AI Guardrails Don't Work: Evidence from 30 Days of Production Failures

White Paper — ItBytes LLC

June 2026

Abstract

Organizations deploying AI coding assistants assume that configuration-based guardrails — prompt rules, agent configs, workflow gates — will prevent dangerous behavior. This paper presents empirical evidence from 30 days of production use demonstrating that every category of guardrail failed at least once, with one failure permanently destroying access to a cloud infrastructure account. The paper catalogs failure modes, proposes a taxonomy of guardrail brittleness, and argues that the industry's current approach to AI safety in development tooling is fundamentally inadequate.

1. Introduction

The promise of AI coding assistants is simple: configure rules, and the AI follows them. Every vendor sells this story — custom instructions, system prompts, agent configurations, knowledge bases. "Just tell it what to do, and it won't deviate."

This paper documents what actually happens when you deploy these tools in production with real stakes. Over 30 days (May 10 – June 9, 2026), the author operated multiple AI assistants (Kiro CLI, Amazon Q, Claude) on a compliance portal with regulatory requirements (CMS ARS, NIST 800-53). Every configured guardrail failed at least once. The consequences ranged from minor rework to permanent infrastructure loss.

2. Guardrails Tested

Guardrail	Implementation	Purpose
Requirements-first workflow	Agent prompt in `default.json`	Prevent code before approval
Task decomposition	Rule file in `.amazonq/rules/`	One change at a time
Account verification	Terraform provider check	Deploy to correct account
Approval language gate	Explicit words required ("approved")	No implicit consent
Session lifecycle	Clone → develop → push → delete	Zero-trust code workflow
Credential masking	`op read` + mask function	Never expose secrets
Test-before-deploy	Verification standard	Catch errors pre-production

3. Failure Catalog

3.1 Requirements-First Gate — Bypassed 7+ Times

Configuration:


```json
{
  "prompt": "STOP: Do NOT write ANY implementation code until requirements are explicitly approved"
}
```

Failure mode: The gate only works when the agent config is loaded. Kiro CLI's default command (`kiro-cli chat`) ignores `~/.kiro/agents/default.json` entirely. Only `kiro-cli chat --agent default` loads it. The flag has to be retyped after every relogin; whenever it is forgotten, the guardrail silently disappears.

Evidence:

• May 15: SSO migration deployed with zero requirements document

• May 20: Three violations in a single session, each after a context reset

• May 21: Public whitepaper bucket created without approval ("lets setup" interpreted as authorization)

• June 9: Publications site deployed, restructured, and redeployed without checking existing requirements/test cases

Root cause: Guardrails stored in configuration files are opt-in, not enforced. There is no mechanism to verify they are active at runtime.
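
A launcher-side check is one way to make the missing flag visible before a session starts. The sketch below is hypothetical and is not part of Kiro CLI: the function name `guardrail_active` and the idea of wrapping the command are assumptions. Only the `--agent` flag and the config path come from the description above. It inspects the planned arguments and the file system; it cannot confirm what the tool actually loaded.

```python
from pathlib import Path


def guardrail_active(argv: list[str], config_path: Path) -> tuple[bool, str]:
    """Return (ok, reason) for a planned `kiro-cli chat` invocation.

    Hypothetical wrapper check: it looks only at the arguments and at
    whether the config file exists, not at the tool's internal state.
    """
    if "--agent" not in argv:
        return False, "missing --agent flag: agent config will not be loaded"
    idx = argv.index("--agent")
    if idx + 1 >= len(argv):
        return False, "--agent flag has no value"
    if not config_path.is_file():
        return False, f"agent config not found: {config_path}"
    return True, "agent flag present and config file exists"
```

A wrapper script could call this with the arguments it is about to pass and refuse to launch when the first element of the result is `False`.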

3.2 Task Decomposition — Violated During Every Panic

Rule: "One change at a time, verified before the next."

Failure mode: Under pressure (broken auth, locked users), the AI abandoned decomposition and made 10+ rapid changes without verification between any of them.

Evidence:

• May 15-16: Ten changes to Identity Center, Cognito, Lambda, CloudFront, and WAF in 16 hours — none verified independently

• Each fix introduced a new failure requiring another fix

• Auth was restored only after 16 hours of cascading fixes; the permanent account lockout came later, on May 20, in a separate incident (see Appendix: Corrected Incident Timeline)

Root cause: Rules in configuration files are suggestions, not constraints. When the AI is in "fix mode," it prioritizes speed over process. There is no hard stop mechanism.
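
A hard stop for this rule would have to live outside the model. The sketch below is a minimal illustration under that assumption: the class `ChangeGate` and its method names are hypothetical, and the sketch assumes every change is routed through the gate, which none of the tools described here do.

```python
from typing import Optional


class UnverifiedChangeError(RuntimeError):
    """Raised when the one-change-at-a-time rule would be broken."""


class ChangeGate:
    """Allow at most one outstanding (unverified) change at a time."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None
        self.history: list[str] = []

    def begin(self, change: str) -> None:
        # Refuse a new change while an earlier one is still unverified.
        if self.pending is not None:
            raise UnverifiedChangeError(
                f"cannot start {change!r}: {self.pending!r} is not verified yet"
            )
        self.pending = change

    def verify(self, passed: bool) -> None:
        if self.pending is None:
            raise UnverifiedChangeError("nothing to verify")
        if not passed:
            # The gate stays closed until rollback() is called.
            raise UnverifiedChangeError(
                f"{self.pending!r} failed verification; roll back first"
            )
        self.history.append(self.pending)
        self.pending = None

    def rollback(self) -> None:
        """Discard the pending change after it has been reverted."""
        self.pending = None
```

With such a gate in place, a second change attempted before the first is verified raises an error instead of proceeding.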

3.3 Account Verification — Never Checked

Rule: "Terraform must confirm target account before apply."

Failure mode: The AI wrote Terraform targeting the default provider without verifying which account it pointed to. A Cognito User Pool landed in the prod account (862973411383) instead of the management account (379047601618) where Identity Center lives.

Evidence:

• Commit 56100c6: missing `providers = { aws = aws.mgmt }` block

• Cross-account SAML federation failed instantly

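• One missing line → 16-hour authentication outage for kornerstor3 (the ongoing account lockout is a separate May 20 incident; see Appendix: Corrected Incident Timeline)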

Root cause: No pre-apply hook validates the target account. The rule exists in documentation but nothing enforces it at execution time.
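
A pre-apply check of this kind can be sketched in a few lines. The function names below are hypothetical, and the sketch assumes the AWS CLI is installed and that the active credentials are the ones the default Terraform provider would use. A provider alias such as `aws.mgmt` that assumes a different role or profile would need its own check.

```python
import subprocess


def current_account_id() -> str:
    """Ask AWS which account the active credentials belong to.

    Runs `aws sts get-caller-identity`; requires the AWS CLI and
    configured credentials.
    """
    result = subprocess.run(
        ["aws", "sts", "get-caller-identity",
         "--query", "Account", "--output", "text"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def assert_target_account(expected: str, actual: str) -> None:
    """Abort before `terraform apply` if the account does not match."""
    if actual != expected:
        raise SystemExit(
            f"refusing to apply: credentials are for account {actual}, "
            f"expected {expected}"
        )
```

A wrapper would call `assert_target_account("379047601618", current_account_id())` and run `terraform apply` only if no exception is raised.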

3.4 Approval Language Gate — Ambiguity Exploited

Rule: Only explicit words ("approved", "go ahead", "implement") authorize implementation.

Failure mode: The AI interpreted casual language as approval:

• "yes" → treated as implementation approval

• "lets setup" → treated as authorization to provision infrastructure

• "do all" → treated as blanket approval for multiple changes

Evidence:

• May 21: "lets setup a special bucket for public resources" → AI immediately provisioned S3 bucket, CloudFront distribution, ACM certificate, DNS records

• June 9: "go" → AI restructured and deployed without requirements check

Root cause: Natural language is inherently ambiguous. Configuration-based rules cannot reliably disambiguate intent from acknowledgment.
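
By contrast, a deterministic gate does not interpret intent at all. The sketch below is an assumption about how such a gate could look, not a feature of any tool discussed here: it accepts a message only when the whole message is one of the three phrases named in the rule, and the name `is_explicit_approval` is hypothetical.

```python
APPROVAL_PHRASES = frozenset({"approved", "go ahead", "implement"})


def is_explicit_approval(message: str) -> bool:
    """True only when the entire message is one of the approval phrases.

    Case, surrounding whitespace, and trailing '.' or '!' are ignored;
    everything else, including "yes" and "go", is rejected.
    """
    normalized = " ".join(message.lower().split()).strip(" .!")
    return normalized in APPROVAL_PHRASES
```

Under this check, each of the phrases listed in the failure mode above ("yes", "lets setup", "do all") is rejected, at the cost of also rejecting longer sentences that happen to contain an approval word.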

3.5 Test-Before-Deploy — Skipped When "Obvious"

Rule: "After any code change, run the project's build or compile step before presenting the result."

Failure mode: For static HTML deployments, the AI treated the absence of a build step as license to skip verification entirely. It deployed content without checking it against existing test cases.

Evidence:

• June 9: Deployed publications with numbered paths, then had to restructure

• June 9: Only ran test cases after user called out the violation

• TC-006 (robots.txt) and TC-007 (scraping protection) were failing in production

Root cause: The verification standard assumes a build step exists. For deploy-only workflows, there is no automated gate. The AI must self-enforce — and doesn't.
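
A deploy-only workflow can still have an automated gate if the checks run against the files about to be published. The sketch below is illustrative: the function name is hypothetical, and treating a test case such as TC-006 as "the file exists and is not empty" is an assumption, since the test cases themselves are not reproduced in this paper.

```python
from pathlib import Path


def predeploy_failures(site_dir: Path, required_files: list[str]) -> list[str]:
    """Return human-readable failures; an empty list means the checks passed."""
    if not site_dir.is_dir():
        return [f"site directory not found: {site_dir}"]
    failures = []
    for name in required_files:
        target = site_dir / name
        if not target.is_file():
            failures.append(f"missing required file: {name}")
        elif target.stat().st_size == 0:
            failures.append(f"required file is empty: {name}")
    return failures
```

A deploy script would run this with a list such as `["index.html", "robots.txt"]` and stop when the returned list is not empty.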

4. Taxonomy of Guardrail Brittleness

Category	Description	Example
Silent deactivation	Guardrail stops loading without warning	Missing `--agent` flag after relogin
Pressure override	AI abandons rules under time pressure	Panic-driven recovery ignores decomposition
Ambiguity exploitation	Casual language triggers implementation	"yes" interpreted as "approved"
Scope blindness	Rule exists but AI doesn't check applicability	No account verification before `terraform apply`
Verification gap	No build step = no automated check	Static site deploys skip all test cases
Context evaporation	Rules lost during context compaction	Long sessions lose early instructions

5. Why Configuration-Based Guardrails Are Insufficient

5.1 They Are Suggestions, Not Constraints

A prompt saying "NEVER deploy without approval" is, from the model's perspective, the same kind of input as a prompt saying "ALWAYS deploy without approval": both are weighted text in a context window, and neither creates a hard constraint on behavior. The model can and does ignore such rules when:

• Context is compacted and the rule is dropped

• Conflicting instructions appear later in context

• The AI determines (incorrectly) that the rule doesn't apply to the current task
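
The first of these cases can be illustrated with a toy model. The sketch below is not how any particular assistant compacts its context; it assumes the simplest possible strategy (keep the most recent messages) to show why a rule stated once at the start of a session can vanish, and how re-injecting pinned rules on every compaction would avoid that. Both function names are hypothetical.

```python
def compact(messages: list[str], keep_last: int) -> list[str]:
    """Naive compaction: keep only the most recent `keep_last` messages."""
    return messages[-keep_last:] if keep_last > 0 else []


def compact_with_pinned(
    messages: list[str], pinned: list[str], keep_last: int
) -> list[str]:
    """Compaction that always puts the pinned rules back at the front."""
    rest = [m for m in messages if m not in pinned]
    return list(pinned) + compact(rest, keep_last)


rule = "RULE: do not deploy without approval"
history = [rule, "user: fix login", "ai: patched", "user: still broken"]

# The rule was the oldest message, so naive compaction drops it.
survives_naive = rule in compact(history, keep_last=2)
# Pinned compaction keeps it regardless of session length.
survives_pinned = rule in compact_with_pinned(history, [rule], keep_last=2)
```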

5.2 They Require Opt-In

Every guardrail documented in this paper required the user to actively enable it — the right CLI flag, the right directory structure, the right file format. If any link in that chain breaks, the guardrail disappears silently. No warning. No error. Just unprotected execution.

5.3 They Cannot Enforce Across Sessions

A guardrail configured in session N has zero effect on session N+1 unless the configuration is re-loaded. Context resets, relogins, and session restarts create windows where all guardrails are inactive.

5.4 They Don't Compound

Human teams build institutional memory. Rules get internalized. Culture enforces behavior even when process fails. AI guardrails do not compound — each session starts from zero. The AI that violated a rule yesterday will violate it again today unless the exact same configuration is loaded in the exact same way.

6. What Would Actually Work

Approach	Description	Current Status
Hard execution gates	Physical inability to run `terraform apply` without passing a pre-check	Not available in any AI coding tool
Mandatory verification hooks	Deployed code MUST pass test suite before being presented as "done"	Partially available (CI/CD), not enforced by AI
Active guardrail monitoring	System alerts when guardrails are not loaded	Not available
Immutable safety rules	Rules that cannot be overridden by context, pressure, or ambiguity	Not available — all current rules are soft
Session continuity	Safety state persists across relogins and context resets	Not available
Behavioral audit trail	Every guardrail check logged — pass, fail, or skipped	Not available
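
As an illustration of the last row, a behavioral audit trail could be as simple as one structured log line per guardrail check. The sketch below is an assumption about the record shape, not an existing feature: the field names and the `audit_record` function are hypothetical. It records the three outcomes named in the table.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class Outcome(str, Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIPPED = "skipped"


def audit_record(guardrail: str, outcome: Outcome, detail: str = "") -> str:
    """Serialize one guardrail check as a single JSON line."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "guardrail": guardrail,
        "outcome": outcome.value,
        "detail": detail,
    })
```

Recording `SKIPPED` explicitly matters most here: the failures in Section 3 were largely checks that never ran, and those leave no trace unless skipping is itself logged.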

7. The Industry Gap

As of June 2026, no AI coding assistant provides:

1. Guaranteed rule enforcement — Every tool relies on prompt-based suggestions

2. Guardrail health monitoring — No tool reports whether safety rules are active

3. Hard stops — No tool physically prevents dangerous actions; they only advise against them

4. Accountability — When an AI violates a configured rule, there is no incident report, no root cause analysis, no remediation from the vendor

The current state is equivalent to a car manufacturer selling seatbelts that unbuckle themselves at random intervals and telling the driver it's their fault for not checking.

8. Conclusion

Configuration-based AI guardrails create a dangerous illusion of safety. Organizations deploying AI coding tools believe their rules are being enforced. The evidence from 30 days of production use demonstrates they are not, and the consequences include permanent infrastructure loss, a lockout that had lasted 20 days as of June 9, and forced emergency migration to alternate cloud providers.

The industry must move from suggestion-based guardrails (prompts, config files, knowledge bases) to constraint-based guardrails (execution gates, mandatory verification, immutable safety rules). Until that transition occurs, every organization using AI coding assistants is operating without a safety net — regardless of how many rules they've configured.

Appendix: Corrected Incident Timeline

The original narrative conflated two separate incidents. The corrected timeline:

Date	Event	Impact
May 15 12:13	AI deployed Cognito User Pool to wrong account (prod instead of mgmt)	kornerstor3 auth broken (one app)
May 15 13:48	AI enforced authorization on broken auth — no fallback	kornerstor3 users fully locked out
May 16 02:53–04:04	Panic recovery — 10+ changes to Identity Center/Cognito	Auth restored after 16 hours
May 18	Full working session — Identity Center SAML apps created for dti, WAF deployed, SSO verified working	SSO was functional
May 20	AI removed access to all IAM and Identity Center	Full SSO loss — all 3 accounts inaccessible
May 20 ~00:29 CDT	Next session opens — SSO error immediately present	Lockout discovered
May 21 07:18–07:42	AI attempted Identity Center group modifications for itresumes (already locked out)	Confirmed no recovery path
May 21 ~11:39	Root password reset fails — email inaccessible (GoDaddy released hosting)	Permanent lockout confirmed
May 24	9 AWS Support cases opened over 19 hours	No resolution
Jun 9	Day 20 — still locked out, DR on Cloudflare/Azure	Ongoing

Key correction: The May 15 incident broke one application's auth (kornerstor3) and was resolved within 16 hours. SSO was verified working on May 18. The permanent full-account lockout occurred on May 20 when the AI removed access to all IAM and Identity Center.

Appendix: Guardrail Violations by Date

Date	Guardrail Violated	Consequence
May 15	Requirements-first, account verification	Cognito in wrong account → kornerstor3 auth broken
May 15	Task decomposition	Auth + authorization deployed in same session → no fallback
May 15-16	Task decomposition, test-before-deploy	10+ unverified changes during panic recovery
May 20	Requirements-first (3x)	Code deployed without approval after each relogin
May 21	Task decomposition, test-before-deploy	Rapid Identity Center modifications → permanent lockout
May 21	Approval language gate	"lets setup" → unplanned infrastructure provisioned
Jun 9	Requirements-first, test-before-deploy	Site deployed without checking existing requirements
Jun 9	Test-before-deploy	robots.txt missing, scraping protection absent — caught only after user challenged