Agent QA Gates

A field-tested validation system for AI agent output. Born from production failures, not theory.

Quick Start

Before any agent delivers output, run the Pre-Ship Checklist:

Accurate? — every number/date/metric has a source. Unsourced → prefix "estimated"
Complete? — no missing pieces, no "I'll do that next"
Actionable? — ends with clear next step or decision point
Fits the channel? — check character limits for your delivery surface
No leaks? — no internal context, private data, or secrets
Not a duplicate? — verify no recent identical send
Would the human be embarrassed? — if yes, don't ship
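
Two of the checklist items ("Fits the channel?" and "Not a duplicate?") are mechanical enough to automate. The following is a minimal sketch, not a reference implementation: the channel names, the limits in `CHANNEL_LIMITS`, and the function names are all hypothetical and should be replaced with values for your own delivery surfaces.

```python
from __future__ import annotations

import hashlib

# Hypothetical per-channel character limits; substitute your real surfaces.
CHANNEL_LIMITS = {"sms": 160, "chat": 2000}


def _digest(text: str) -> str:
    """Stable fingerprint of a message, used for exact-duplicate detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fits_channel(text: str, channel: str) -> bool:
    """Return True if the text is within the channel's character limit."""
    return len(text) <= CHANNEL_LIMITS[channel]


def is_duplicate(text: str, recent_hashes: set[str]) -> bool:
    """Return True if an identical message was already recorded as sent."""
    return _digest(text) in recent_hashes


def record_send(text: str, recent_hashes: set[str]) -> None:
    """Remember a sent message so a later identical send is caught."""
    recent_hashes.add(_digest(text))
```

This only catches byte-identical sends. Near-duplicates (same content, different timestamp) would need a normalization step before hashing.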

Gate Tiers

Four ascending tiers by risk level:

Gate	Scope	Key Checks
Gate 0	Internal (files, config, memory)	Mechanism changed (not just text), no placeholders, file exists
Gate 1	Human-facing (briefings, summaries)	Key info in first 2 lines, ≤3-line paragraphs, channel length limits
Gate 2	External (email, public content, client materials)	No internal context leaked, recipient-appropriate tone, dedup check
Gate 3	Code & technical	Builds clean, no secrets in code, error handling, tests pass

See references/gates-detail.md for full gate checklists.

Severity Classification

Not all failures are equal:

🔴 BLOCK — cannot ship (secrets, privacy, hallucinated data, wrong recipient)
🟡 FIX — fix before shipping, <2 min (formatting, too long, missing citation)
🟢 NOTE — log and ship (style preference, minor optimization)
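
One way to make the three severity levels enforceable is to reduce a list of findings to a single ship decision. This is a sketch under assumptions: the `Severity` enum, the `decide` function, and the returned strings are illustrative names, not part of the skill itself.

```python
from __future__ import annotations

from enum import Enum


class Severity(Enum):
    BLOCK = "block"  # cannot ship
    FIX = "fix"      # fix before shipping
    NOTE = "note"    # log and ship


def decide(findings: list[Severity]) -> str:
    """Reduce all findings to one decision; the worst finding wins."""
    if Severity.BLOCK in findings:
        return "block"
    if Severity.FIX in findings:
        return "fix-then-ship"
    # Only NOTE findings (or none at all) remain: log them and ship.
    return "ship"
```

For example, `decide([Severity.NOTE, Severity.FIX])` yields `"fix-then-ship"`, while a single `Severity.BLOCK` anywhere in the list yields `"block"`.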

Protocol Gates

Recurring failure modes need dedicated gates. These are the most common:

Heartbeat / Periodic Check Output

Binary output: alert text ONLY or status-OK ONLY. Never mixed.
Every data point verified by current-session tool call. No hallucinated metrics.
No stale data from previous cycles or pre-compaction sessions.
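
The "never mixed" rule can be checked mechanically. The sketch below assumes the status-OK signal is a single literal token; `STATUS_OK` is a hypothetical placeholder for whatever token your system uses. It does not verify that data points come from current-session tool calls, which has to be checked separately.

```python
# Hypothetical status-OK token; replace with the one your system uses.
OK_TOKEN = "STATUS_OK"


def is_valid_heartbeat(output: str) -> bool:
    """Accept either the bare OK token or alert text, never a mix."""
    text = output.strip()
    if not text:
        # Empty output is neither an alert nor an explicit OK.
        return False
    if text == OK_TOKEN:
        return True
    # Alert text must not contain the OK token anywhere.
    return OK_TOKEN not in text
```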

Post-Compaction / Context Reset

Do not trust facts from the pre-reset session — verify from files and tools.
Rerun pending checks from scratch.
Zero carryover for periodic checks.

Scheduled Job / Cron Changes

Explicit timeout set
Explicit model set
Verify schedule after creation
Output fits destination channel limits
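
The first two items can be enforced before a job is ever created. The sketch below assumes a job is described by a plain dictionary; the field names `schedule`, `timeout_seconds`, and `model` are hypothetical and will differ between schedulers. Verifying the schedule after creation still requires reading it back from the scheduler itself.

```python
from __future__ import annotations

# Hypothetical field names; adapt to your scheduler's job format.
REQUIRED_FIELDS = ("schedule", "timeout_seconds", "model")


def missing_fields(job: dict) -> list[str]:
    """List required fields that are absent or empty in the job definition.

    A value of 0, "" or None counts as missing, so a zero timeout is rejected.
    """
    return [name for name in REQUIRED_FIELDS if not job.get(name)]
```

A job passes this gate only when `missing_fields(job)` returns an empty list.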

Sub-Agent Output Review

Does output match the brief's success criteria?
Any uncertainty flags unresolved?
Is the reasoning (not just the conclusion) sound?

Gate Evolution

Gates should evolve based on real failures, not imagination:

When a failure occurs → log it with root cause
Same failure class occurs 2+ times → add a gate item
Monthly: prune gates that haven't caught anything in 60 days
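
The two thresholds above (2+ occurrences to add, 60 idle days to prune) are simple to compute from a failure log. This is a minimal sketch assuming the log is a list of failure-class labels and that each gate records the date it last caught something; the function names and data shapes are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, timedelta


def classes_needing_gate(failure_log: list[str], threshold: int = 2) -> list[str]:
    """Failure classes seen at least `threshold` times deserve a gate item."""
    counts = Counter(failure_log)
    return sorted(cls for cls, n in counts.items() if n >= threshold)


def gates_to_prune(
    last_catch: dict[str, date], today: date, max_idle_days: int = 60
) -> list[str]:
    """Gates whose last catch is more than `max_idle_days` old are prune candidates."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(gate for gate, caught in last_catch.items() if caught < cutoff)
```

Treat the prune list as candidates for review rather than automatic deletions: a gate guarding against a rare BLOCK-level failure may be worth keeping even if it rarely fires.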

Anti-Patterns

Gates that sound good but never catch anything → kill them
Per-agent checklists that duplicate general gates → merge or reference
"ADHD-friendly" or "high-quality" as gate items → not testable, replace with mechanical checks
Aspirational gates nobody runs → either automate or cut

Adapting to Your System

This skill provides the pattern. Adapt it:

Start with the Pre-Ship Checklist — it works for any agent system
Add Protocol Gates for your top 3 recurring failure modes
Set channel limits for your delivery surfaces
Map real failures to gates — if a failure isn't gated, add the gate
Kill gates that never fire — a shorter, sharper checklist wins

For the full reference implementation, see references/gates-detail.md. For automation scripts, see scripts/qa-check.sh.

