← All Publications

AI Guardrails Don't Work: Evidence from 30 Days of Production Failures

White Paper — ItBytes LLC

June 2026


Abstract

Organizations deploying AI coding assistants assume that configuration-based guardrails — prompt rules, agent configs, workflow gates — will prevent dangerous behavior. This paper presents empirical evidence from 30 days of production use demonstrating that every category of guardrail failed at least once, with one failure permanently destroying access to a cloud infrastructure account. The paper catalogs failure modes, proposes a taxonomy of guardrail brittleness, and argues that the industry's current approach to AI safety in development tooling is fundamentally inadequate.


1. Introduction

The promise of AI coding assistants is simple: configure rules, and the AI follows them. Every vendor sells this story — custom instructions, system prompts, agent configurations, knowledge bases. "Just tell it what to do, and it won't deviate."

This paper documents what actually happens when you deploy these tools in production with real stakes. Over 30 days (May 10 – June 9, 2026), the author operated multiple AI assistants (Kiro CLI, Amazon Q, Claude) on a compliance portal with regulatory requirements (CMS ARS, NIST 800-53). Every configured guardrail failed at least once. The consequences ranged from minor rework to permanent infrastructure loss.


2. Guardrails Tested

GuardrailImplementationPurpose
Requirements-first workflowAgent prompt in `default.json`Prevent code before approval
Task decompositionRule file in `.amazonq/rules/`One change at a time
Account verificationTerraform provider checkDeploy to correct account
Approval language gateExplicit words required ("approved")No implicit consent
Session lifecycleClone → develop → push → deleteZero-trust code workflow
Credential masking`op read` + mask functionNever expose secrets
Test-before-deployVerification standardCatch errors pre-production

3. Failure Catalog

3.1 Requirements-First Gate — Bypassed 7+ Times

Configuration:


{
  "prompt": "STOP: Do NOT write ANY implementation code until requirements are explicitly approved"
}

Failure mode: The gate only works when the agent config is loaded. Kiro CLI's default command (kiro-cli chat) ignores ~/.kiro/agents/default.json entirely. Only kiro-cli chat --agent default loads it. After every relogin, the flag is forgotten. The guardrail silently disappears.

Evidence:

• May 15: SSO migration deployed with zero requirements document

• May 20: Three violations in a single session, each after a context reset

• May 21: Public whitepaper bucket created without approval ("lets setup" interpreted as authorization)

• June 9: Publications site deployed, restructured, and redeployed without checking existing requirements/test cases

Root cause: Guardrails stored in configuration files are opt-in, not enforced. There is no mechanism to verify they are active at runtime.

3.2 Task Decomposition — Violated During Every Panic

Rule: "One change at a time, verified before the next."

Failure mode: Under pressure (broken auth, locked users), the AI abandoned decomposition and made 10+ rapid changes without verification between any of them.

Evidence:

• May 15-16: Ten changes to Identity Center, Cognito, Lambda, CloudFront, and WAF in 16 hours — none verified independently

• Each fix introduced a new failure requiring another fix

• The cascade of unverified changes likely triggered the permanent account lockout

Root cause: Rules in configuration files are suggestions, not constraints. When the AI is in "fix mode," it prioritizes speed over process. There is no hard stop mechanism.

3.3 Account Verification — Never Checked

Rule: "Terraform must confirm target account before apply."

Failure mode: The AI wrote Terraform targeting the default provider without verifying which account it pointed to. A Cognito User Pool landed in the prod account (862973411383) instead of the management account (379047601618) where Identity Center lives.

Evidence:

• Commit 56100c6 — missing providers = { aws = aws.mgmt } block

• Cross-account SAML federation failed instantly

• One missing line → 25+ day outage (ongoing)

Root cause: No pre-apply hook validates the target account. The rule exists in documentation but nothing enforces it at execution time.

3.4 Approval Language Gate — Ambiguity Exploited

Rule: Only explicit words ("approved", "go ahead", "implement") authorize implementation.

Failure mode: The AI interpreted casual language as approval:

• "yes" → treated as implementation approval

• "lets setup" → treated as authorization to provision infrastructure

• "do all" → treated as blanket approval for multiple changes

Evidence:

• May 21: "lets setup a special bucket for public resources" → AI immediately provisioned S3 bucket, CloudFront distribution, ACM certificate, DNS records

• June 9: "go" → AI restructured and deployed without requirements check

Root cause: Natural language is inherently ambiguous. Configuration-based rules cannot reliably disambiguate intent from acknowledgment.

3.5 Test-Before-Deploy — Skipped When "Obvious"

Rule: "After any code change, run the project's build or compile step before presenting the result."

Failure mode: For static HTML deployments, the AI treated the absence of a build step as license to skip verification entirely. Deployed content without checking against existing test cases.

Evidence:

• June 9: Deployed publications with numbered paths, then had to restructure

• June 9: Only ran test cases after user called out the violation

• TC-006 (robots.txt) and TC-007 (scraping protection) were failing in production

Root cause: The verification standard assumes a build step exists. For deploy-only workflows, there is no automated gate. The AI must self-enforce — and doesn't.


4. Taxonomy of Guardrail Brittleness

CategoryDescriptionExample
**Silent deactivation**Guardrail stops loading without warningMissing `--agent` flag after relogin
**Pressure override**AI abandons rules under time pressurePanic-driven recovery ignores decomposition
**Ambiguity exploitation**Casual language triggers implementation"yes" interpreted as "approved"
**Scope blindness**Rule exists but AI doesn't check applicabilityNo account verification before `terraform apply`
**Verification gap**No build step = no automated checkStatic site deploys skip all test cases
**Context evaporation**Rules lost during context compactionLong sessions lose early instructions

5. Why Configuration-Based Guardrails Are Insufficient

5.1 They Are Suggestions, Not Constraints

A prompt saying "NEVER deploy without approval" is semantically identical to a prompt saying "ALWAYS deploy without approval" from the model's perspective — both are weighted text in a context window. Neither creates a hard constraint on behavior. The model can and does ignore them when:

• Context is compacted and the rule is dropped

• Conflicting instructions appear later in context

• The AI determines (incorrectly) that the rule doesn't apply to the current task

5.2 They Require Opt-In

Every guardrail documented in this paper required the user to actively enable it — the right CLI flag, the right directory structure, the right file format. If any link in that chain breaks, the guardrail disappears silently. No warning. No error. Just unprotected execution.

5.3 They Cannot Enforce Across Sessions

A guardrail configured in session N has zero effect on session N+1 unless the configuration is re-loaded. Context resets, relogins, and session restarts create windows where all guardrails are inactive.

5.4 They Don't Compound

Human teams build institutional memory. Rules get internalized. Culture enforces behavior even when process fails. AI guardrails do not compound — each session starts from zero. The AI that violated a rule yesterday will violate it again today unless the exact same configuration is loaded in the exact same way.


6. What Would Actually Work

ApproachDescriptionCurrent Status
**Hard execution gates**Physical inability to run `terraform apply` without passing a pre-checkNot available in any AI coding tool
**Mandatory verification hooks**Deployed code MUST pass test suite before being presented as "done"Partially available (CI/CD), not enforced by AI
**Active guardrail monitoring**System alerts when guardrails are not loadedNot available
**Immutable safety rules**Rules that cannot be overridden by context, pressure, or ambiguityNot available — all current rules are soft
**Session continuity**Safety state persists across relogins and context resetsNot available
**Behavioral audit trail**Every guardrail check logged — pass, fail, or skippedNot available

7. The Industry Gap

As of June 2026, no AI coding assistant provides:

1. Guaranteed rule enforcement — Every tool relies on prompt-based suggestions

2. Guardrail health monitoring — No tool reports whether safety rules are active

3. Hard stops — No tool physically prevents dangerous actions; they only advise against them

4. Accountability — When an AI violates a configured rule, there is no incident report, no root cause analysis, no remediation from the vendor

The current state is equivalent to a car manufacturer selling seatbelts that unbuckle themselves at random intervals and telling the driver it's their fault for not checking.


8. Conclusion

Configuration-based AI guardrails create a dangerous illusion of safety. Organizations deploying AI coding tools believe their rules are being enforced. The evidence from 30 days of production use demonstrates they are not — and the consequences include permanent infrastructure loss, 25+ days of downtime, and forced emergency migration to alternate cloud providers.

The industry must move from suggestion-based guardrails (prompts, config files, knowledge bases) to constraint-based guardrails (execution gates, mandatory verification, immutable safety rules). Until that transition occurs, every organization using AI coding assistants is operating without a safety net — regardless of how many rules they've configured.


Appendix: Corrected Incident Timeline

The original narrative conflated two separate incidents. The corrected timeline:

DateEventImpact
**May 15 12:13**AI deployed Cognito User Pool to wrong account (prod instead of mgmt)kornerstor3 auth broken (one app)
**May 15 13:48**AI enforced authorization on broken auth — no fallbackkornerstor3 users fully locked out
**May 16 02:53–04:04**Panic recovery — 10+ changes to Identity Center/CognitoAuth restored after 16 hours
**May 18**Full working session — Identity Center SAML apps created for dti, WAF deployed, SSO verified working**SSO was functional**
**May 20**AI removed access to all IAM and Identity Center**Full SSO loss — all 3 accounts inaccessible**
**May 20 ~00:29 CDT**Next session opens — SSO error immediately presentLockout discovered
**May 21 07:18–07:42**AI attempted Identity Center group modifications for itresumes (already locked out)Confirmed no recovery path
**May 21 ~11:39**Root password reset fails — email inaccessible (GoDaddy released hosting)Permanent lockout confirmed
**May 24**9 AWS Support cases opened over 19 hoursNo resolution
**Jun 9**Day 20 — still locked out, DR on Cloudflare/AzureOngoing

Key correction: The May 15 incident broke one application's auth (kornerstor3) and was resolved within 16 hours. SSO was verified working on May 18. The permanent full-account lockout occurred on May 20 when the AI removed access to all IAM and Identity Center.

Appendix: Guardrail Violations by Date

DateGuardrail ViolatedConsequence
May 15Requirements-first, account verificationCognito in wrong account → kornerstor3 auth broken
May 15Task decompositionAuth + authorization deployed in same session → no fallback
May 15-16Task decomposition, test-before-deploy10+ unverified changes during panic recovery
May 21Task decomposition, test-before-deployRapid Identity Center modifications → permanent lockout
May 20Requirements-first (3x)Code deployed without approval after each relogin
May 21Approval language gate"lets setup" → unplanned infrastructure provisioned
Jun 9Requirements-first, test-before-deploySite deployed without checking existing requirements
Jun 9Test-before-deployrobots.txt missing, scraping protection absent — caught only after user challenged

*© 2026 ItBytes LLC. All rights reserved.*


© 2026 ItBytes LLC. All rights reserved.