Guide creation of AI-friendly project structures with progressive disclosure indexing, modular architecture, and intent-oriented code. Use when starting a ne...
Create codebases that AI agents can navigate efficiently through progressive disclosure — revealing only what's needed, when it's needed.
Apply this skill when the user:
Language-agnostic: applies to any language. See language-adaptation.md for per-language mappings.
When NOT to apply: Minimal single-file projects, quick prototypes, or codebases where indexing overhead outweighs benefit. Skip full L1/L2 when the project has fewer than ~5 source files; a lightweight AGENTS.md may suffice.
Like Ariadne's thread through the labyrinth, a well-indexed codebase gives AI agents a clear path from project overview to implementation detail. An agent should understand your project structure, locate any function, and assess the impact of any modification — without reading the entire codebase.
Two optimization targets:
L0 PROJECT ROOT ← AGENTS.md (80–150 lines, prefer brevity)
│ Project map, navigation table, build commands, cross-cutting patterns
│
L1 MODULE INDEX ← Each directory gets an INDEX.md (~50 lines)
│ Purpose, public API, bidirectional dependencies, contracts,
│ task routing, modification risk, test mapping
│
L2 FILE INTENT ← File headers (~5-10 lines per file)
│ What this file does, public interface, contracts, side effects
│
L3 INLINE DETAIL ← Comments on non-obvious logic only
Why, not what; tricky edge cases; business rules
AI agents start at L0 and drill down only as needed.
Create an AGENTS.md at the project root with:
- AI Agent Instructions: read the module INDEX.md and check Dependents before modifying public API, confirm Modification Risk for breaking changes, run Tests before finishing
- .cursorignore (same syntax as .gitignore) to exclude build artifacts, dependencies, generated code; common examples: `node_modules/`, `dist/`, `vendor/`, `__pycache__/`, `target/`, `*.generated.*`; adapt to your ecosystem

The cross-cutting patterns section is critical: it prevents the AI from reading 3-4 existing files just to infer "how do we handle errors here?". One 30-line declaration replaces ~1500 tokens of pattern inference.
Navigation priority: Prefer navigating via INDEX.md Dependents and Dependencies over semantic/keyword search. Structural dependency paths surface architecturally critical files that retrieval often misses. When modifying a module's public API, always consult its Dependents list first to assess blast radius.
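The Dependents lookup can be sketched as a small parser. This is a minimal illustration, not part of the skill itself: it assumes INDEX.md uses a `## Dependents` heading followed by bullets of the form `` - `path` → symbol ``, and the name `list_dependents` is hypothetical.

```python
import re

def list_dependents(index_md: str) -> list[tuple[str, str]]:
    """Return (path, symbol) pairs from the '## Dependents' section of an INDEX.md."""
    dependents = []
    in_section = False
    for line in index_md.splitlines():
        if line.startswith("## "):
            # Section headers may carry a suffix, e.g. "## Dependents (Fan-in: 5)".
            in_section = line[3:].strip().startswith("Dependents")
            continue
        if not in_section:
            continue
        # Accept either the Unicode arrow or an ASCII "->" between path and symbol.
        match = re.match(r"-\s+`([^`]+)`\s*(?:→|->)\s*(.+)", line.strip())
        if match:
            dependents.append((match.group(1), match.group(2).strip()))
    return dependents

SAMPLE_INDEX = """\
# Module: auth
## Dependents (Fan-in: 2)
<!-- Use repo-root-relative paths. -->
- `src/api/routes/user` → authenticate()
- `src/api/middleware/auth-guard` → authorize()
## Interface Contract
- authenticate(): returns Session or throws AuthError
"""

for path, symbol in list_dependents(SAMPLE_INDEX):
    print(f"{path} uses {symbol}")
```

The length of the returned list gives a rough blast-radius estimate before a public API change.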
Information placement (Lost in the Middle): LLMs attend more to context start and end. Place high-priority content (Build, Quick Navigation, Agent Workflow) at the beginning or end of AGENTS.md; keep secondary sections toward the middle.
## Architectural Constraints
<!-- Typical Web app layers; adapt to project: app/, packages/, layers/, etc. -->
- Dependency direction: presentation → domain → infrastructure (never reverse)
Example: ui/ → core/ → infra/ — or app/features/, packages/core/, etc.
- All external calls (HTTP, DB, cache) go through infrastructure layer — e.g. `src/infra/` or `[project-path]/infra/`
- No direct file I/O outside designated storage module
## Cross-Cutting Patterns
### Error Handling
- Throw typed errors (e.g. [ErrorType], [ValidationError]), never raw strings
- Log errors before re-throwing across module boundaries
- Public functions document all throwable error types in file header
### Logging
- Use structured logger from [project logger location]
- Include [trace_id or request_id] in all log entries
- Levels: ERROR (failures), WARN (degradation), INFO (operations)
### Concurrency (if applicable)
- Document thread-safety per module (single-threaded vs thread-safe)
- Define lock ordering strategy (e.g. by resource name, by hierarchy) and record in AGENTS.md
- Apply consistently to avoid deadlocks
For the full AGENTS.md template, see index-templates.md.
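The dependency-direction constraint in the template above lends itself to an automated check. The sketch below is one possible approach under stated assumptions: layer names are taken from the template (`presentation`, `domain`, `infrastructure`), imports are supplied as already-extracted `(importer, imported)` path pairs, and `find_layer_violations` is a hypothetical helper name.

```python
from typing import Optional

# Outermost layer first; a file may only import from its own layer or a later one.
LAYER_ORDER = ["presentation", "domain", "infrastructure"]

def layer_of(path: str) -> Optional[str]:
    """Return the first path segment that names a known layer, if any."""
    for segment in path.split("/"):
        if segment in LAYER_ORDER:
            return segment
    return None

def find_layer_violations(imports):
    """Return (importer, imported) pairs that point against the allowed direction."""
    violations = []
    for importer, imported in imports:
        src, dst = layer_of(importer), layer_of(imported)
        if src is None or dst is None:
            continue  # e.g. shared/ code, which sits outside the layer chain
        # Note: this sketch allows skipping a layer (presentation -> infrastructure).
        if LAYER_ORDER.index(dst) < LAYER_ORDER.index(src):
            violations.append((importer, imported))
    return violations
```

Violations found this way could be recorded in Architectural Constraints rather than fixed immediately, depending on how the project handles existing debt.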
Every module directory gets an INDEX.md — any directory that has a public API, a barrel/__init__, or 3+ source files. This is the most information-dense layer — it must answer not just "what does this module do?" but also "who depends on it?", "what will break if I change it?", and "which file should I edit for task X?".
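The "which directories need an INDEX.md" rule can be approximated with a directory walk. This sketch covers only two of the three criteria (a barrel file or 3+ source files), since detecting a public API needs language-specific analysis. The extension list, barrel names, ignored directories, and the function name `directories_needing_index` are all assumptions for illustration.

```python
import os

SOURCE_EXTENSIONS = {".py", ".ts", ".cpp", ".hpp", ".go", ".rs"}  # assumed set
BARREL_NAMES = {"__init__.py", "index.ts", "mod.rs"}              # assumed set
IGNORED_DIRS = {"node_modules", ".git", "dist", "__pycache__", "target", "vendor"}

def directories_needing_index(root: str) -> list[str]:
    """List directories under root that qualify for an INDEX.md but lack one."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        sources = [f for f in filenames if os.path.splitext(f)[1] in SOURCE_EXTENSIONS]
        has_barrel = any(f in BARREL_NAMES for f in filenames)
        if (has_barrel or len(sources) >= 3) and "INDEX.md" not in filenames:
            missing.append(os.path.relpath(dirpath, root).replace(os.sep, "/"))
    return sorted(missing)
```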
Example below uses domain auth; replace with your domain — billing, reporting, catalog, etc.
# Module: auth
Handle user authentication and session management.
Status: stable | Fan-in: 5 | Fan-out: 2
## Public API
| Export | File | Description |
|--------|------|-------------|
| `authenticate()` | `authenticate.*` | Verify credentials, return session |
| `authorize()` | `authorize.*` | Check permissions for resource |
| `SessionStore` | `session_store.*` | Session lifecycle management |
## Dependencies (Fan-out: 2)
<!-- Use repo-root-relative paths. Optional edge type: [imports] | [inherits] | [instantiates] -->
- `src/infra/db` — user record lookups
- `src/shared/crypto` — password hashing
## Dependents (Fan-in: 5)
<!-- Prefer structural navigation via this list over semantic search. Use repo-root-relative paths. -->
- `src/api/routes/user` → authenticate()
- `src/api/middleware/auth-guard` → authorize()
- `src/core/billing` → get_session()
- `src/core/notification` → get_user_from_session()
- `src/api/routes/admin` → authorize()
## Interface Contract
- authenticate(): returns Session (token non-empty, expires_at > now) or throws AuthError
- All failed auth attempts → audit log written before exception propagates
- Thread-safe: all public functions can be called concurrently
- Side effect: authenticate() writes to audit_log on failure
## Modification Risk
- Add field to Session → compatible, no consumer changes needed
- Change authenticate() signature → BREAKING, update 5 dependents
- Change error types → BREAKING for all specific catch blocks
## Task Routing
- Add login method → modify authenticate.*, possibly add new file
- Change session TTL → modify session_store.*, update config
- Add permission type → modify authorize.*, update Permission enum
- Fix token validation bug → modify token_validator.*
## Tests
- Unit: `tests/unit/auth/`
- Integration: `tests/integration/auth_flow_test.*`
- Critical: auth_flow_test (must pass before merge)
- Untested: token_refresh() — extra caution when modifying
Voice Coding: L1 Task Routing supports "say task → get file path" — map natural-language tasks to specific files for hands-free navigation.
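A "say task, get file path" lookup might look like the following. It is a deliberately naive sketch: it assumes Task Routing bullets use the `- task → action` form from the example above, matches by shared words only, and the names `parse_task_routing` and `route_task` are hypothetical.

```python
def parse_task_routing(index_md: str) -> dict[str, str]:
    """Map each task phrase in the '## Task Routing' section to its action text."""
    routes = {}
    in_section = False
    for line in index_md.splitlines():
        if line.startswith("## "):
            in_section = line[3:].strip() == "Task Routing"
            continue
        if in_section and line.startswith("- ") and "→" in line:
            task, action = line[2:].split("→", 1)
            routes[task.strip()] = action.strip()
    return routes

def route_task(index_md: str, request: str):
    """Return the (task, action) entry sharing the most words with the request, or None."""
    words = set(request.lower().split())
    best, best_score = None, 0
    for task, action in parse_task_routing(index_md).items():
        score = len(words & set(task.lower().split()))
        if score > best_score:
            best, best_score = (task, action), score
    return best

ROUTING_SAMPLE = """\
## Task Routing
- Add login method → modify authenticate.*, possibly add new file
- Change session TTL → modify session_store.*, update config
## Tests
- Unit: `tests/unit/auth/`
"""

print(route_task(ROUTING_SAMPLE, "change the session ttl to 30 minutes"))
```

Word overlap is only a placeholder for whatever matching the agent or voice front end actually performs.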
Why each section matters for AI:
| Section | AI Question It Answers | Token Savings |
|---|---|---|
| Dependents | "Who calls me? What breaks if I change?" | Avoids full-project grep (~800 tokens) |
| Interface Contract | "What invariants must I preserve?" | Avoids re-reading callers to infer constraints (~400 tokens) |
| Modification Risk | "Is this change safe? How many files to update?" | Avoids under-estimating blast radius |
| Task Routing | "Which file should I edit for this task?" | Avoids reading wrong files (~600 tokens) |
| Tests | "What should I run to verify?" | Avoids searching for test files (~200 tokens) |
For the full INDEX.md template, see index-templates.md.
Every source file starts with an intent header. For public API files (called by other modules), extend the header with contract annotations:
C++ (public header with contract):
/**
* @brief Verify user credentials against stored records.
*
* @pre password is non-empty, username has passed format validation
* @post On success: returned Session.token is non-empty and unique
* @post On failure: audit log entry written before exception
* @throws AuthError if credentials are invalid
* @throws RateLimitError if attempts exceed MAX_ATTEMPTS
* @sideeffect Writes to audit_log table on failure
* @threadsafe
*
* @module auth
* @public authenticate
*/
TypeScript (public API with contract):
/**
* Verify user credentials against stored records.
*
* @pre password is non-empty, username passed format validation
* @post Success: Session.token is non-empty and unique
* @post Failure: audit log entry written before throwing
* @throws {AuthError} credentials invalid
* @throws {RateLimitError} attempts exceed MAX_ATTEMPTS
* @sideeffect Writes audit_log on failure
*
* @module auth
* @public authenticate
*/
Internal files (not called across module boundaries) keep the simple 3-5 line header. Only public API files need full contracts.
Contract annotation reference:
| Annotation | Purpose | When to Use |
|---|---|---|
| `@pre` | What must be true before calling | Input validation assumptions |
| `@post` | What is guaranteed after return | Return value constraints |
| `@throws` | What errors can be thrown | All error conditions |
| `@sideeffect` | What state changes beyond return value | DB writes, events, cache invalidation |
| `@threadsafe` | Concurrency guarantee | Multi-threaded systems |
| `@performance` | Latency/throughput constraints | Hot-path functions |
| `@lines` / `@range` | Key line ranges for large files | When file exceeds ~300 LOC; e.g. `@lines 45-89 main logic` |
Path convention: In L1, use repo-root-relative paths (e.g. src/auth/authenticate.ts) — avoid ./, ../ for clarity and Voice Coding compatibility.
See index-templates.md for all language templates.
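Because the contract annotations follow a regular `@name text` shape, a tool can read them without parsing the source language. The sketch below assumes C-style doc comments as in the C++ and TypeScript examples above; `extract_contract` is a hypothetical helper, not something the skill prescribes.

```python
import re

def extract_contract(header: str) -> dict[str, list[str]]:
    """Group the text of each @annotation found in a doc-comment header."""
    contract = {}
    for line in header.splitlines():
        # Skip leading whitespace and comment decoration ("/", "*") before the "@".
        match = re.match(r"[\s/*]*@(\w+)\s*(.*)", line)
        if match:
            contract.setdefault(match.group(1), []).append(match.group(2).strip())
    return contract

HEADER_SAMPLE = """/**
 * Verify user credentials against stored records.
 *
 * @pre password is non-empty, username passed format validation
 * @post Success: Session.token is non-empty and unique
 * @post Failure: audit log entry written before throwing
 * @throws {AuthError} credentials invalid
 * @sideeffect Writes audit_log on failure
 */"""

contract = extract_contract(HEADER_SAMPLE)
print(sorted(contract))
```

A check built on this could, for example, warn when a public API file has no `@throws` entry.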
Comment ONLY non-obvious logic:
// Rate limit: 5 attempts per IP per minute (compliance FR-201)
if (attempts > MAX_ATTEMPTS) throw RateLimitError();
// Must resolve parent before children — circular refs in legacy schema
auto resolved = resolve_parent_first(node);
Never comment obvious code. AI reads code natively — comments that restate the code are pure noise.
✅ src/billing/calculate_invoice.* — does one thing
✅ src/billing/apply_discount.* — does one thing
❌ src/billing/utils.* — does many things
❌ src/helpers.* — grab bag
Typical layering (example; adapt names to your project — e.g. app/, packages/, layers/):
presentation/ → domain/ → infrastructure/
↘ shared/ (cross-cutting) ↙
| Metric | Target | Reason |
|---|---|---|
| Lines per file | < 300 (< 500 for verbose languages) | AI reads full files; smaller = less wasted context |
| Public symbols per file | 1-5 | Clear public surface |
| Dependencies per file | < 5-8 imports/includes | Limits blast radius |
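The size targets in the table above are easy to check mechanically. This is a rough sketch under assumptions: it counts only `import`, `from ... import`, and `#include` lines, reads the "< 5-8" target as an upper bound of 8, and uses a hypothetical function name `check_file_metrics`.

```python
import re

# Matches Python/TypeScript imports and C/C++ includes at the start of a line.
IMPORT_PATTERN = re.compile(r"^\s*(import\b|from\s+\S+\s+import\b|#include\b)")

def check_file_metrics(source: str, line_limit: int = 300, import_limit: int = 8) -> list[str]:
    """Return warnings for a file whose line or import count reaches the given limits."""
    lines = source.splitlines()
    warnings = []
    if len(lines) >= line_limit:
        warnings.append(f"{len(lines)} lines, needs split")
    import_count = sum(1 for line in lines if IMPORT_PATTERN.match(line))
    if import_count >= import_limit:
        warnings.append(f"{import_count} imports, consider narrowing dependencies")
    return warnings
```

For verbose languages, passing `line_limit=500` follows the table's relaxed target.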
The index structure works for both applications and libraries, but libraries differ in several key areas:
| Aspect | Application | Library |
|---|---|---|
| Build & Test | Has build + run commands | May have no build step (header-only); tests often in separate projects |
| Public API | Internal modules calling each other | Consumers are external; public API = exported headers / package interface |
| Dependents (Fan-in) | Other modules in the same project | Primarily external consumers (note "External: N/A — library" in Dependents) |
| Dependency direction | presentation → domain → infra (example) | Typically flat or layered by abstraction (e.g., core ← filtering ← testing) |
| File size targets | Strict (you control the code) | Guideline only for upstream/third-party code you do not own |
Header-only libraries (C++): no include/ + src/ split — all code is in headers. The entry header (e.g., library.hpp) acts as the barrel/__init__. There is no independent build step; document how consumers include and link.
Upstream / third-party code: When indexing a library you do not maintain (e.g., a vendored dependency), create indexes (L0 + L1) for navigation but do NOT modify source files (no L2 header changes, no refactoring oversized files). Flag issues in indexes with ⚠ UPSTREAM instead of ⚠ needs split.
Write code that expresses what it accomplishes, not how.
| Layer | Principle | C++ Example | TypeScript Example | Python Example |
|---|---|---|---|---|
| Files | descriptive, one concept | invoice_calculator.cpp | calculate-invoice.ts | invoice_calculator.py |
| Dirs | group by domain | billing/, auth/ | billing/, auth/ | billing/, auth/ |
| Functions | verb-first, outcome | calculate_total() | calculateTotal() | calculate_total() |
| Classes | PascalCase noun | InvoiceCalculator | InvoiceCalculator | InvoiceCalculator |
| Constants | SCREAMING_SNAKE | MAX_RETRY_COUNT | MAX_RETRY_COUNT | MAX_RETRY_COUNT |
| Booleans | is/has/should | is_authenticated | isAuthenticated | is_authenticated |
// ✅ Intent-oriented: tells you WHAT
MonetaryAmount calculate_invoice_total(
const std::vector<LineItem>& line_items,
const std::vector<Discount>& discounts);
// ❌ Implementation-oriented: tells you HOW
int process_items(void* items, int count, int flags);
Applies to any external interface (REST, gRPC, library headers, CLI):
For detailed conventions, see naming-api-conventions.md.
Not all documentation serves the same audience or requires the same update frequency. This skill distinguishes two tiers:
| Tier | Files | Audience | Update Timing | Staleness Impact |
|---|---|---|---|---|
| A — AI Runtime Index | AGENTS.md, INDEX.md, file headers (L2), inline comments (L3) | AI agents | Real-time — atomic with code changes | Critical: stale → wrong navigation, missed dependencies, broken contracts |
| B — Human Communication | README.md, architecture.md, ADRs, API docs, guides, changelogs | Human developers | Batch — at iteration end | Moderate: stale → human confusion, but AI unaffected |
Key insight: Tier A indexes are the AI's "working memory" — they must be updated in the same operation as the code change. Tier B documents are "reports" derived from Tier A — they can be regenerated at iteration boundaries.
During active development: AI agents only maintain Tier A files. Tier B documents are invisible to the AI: they are neither read nor written until the iteration-end sync.
Token economics: This separation reduces per-change AI overhead by ~60%, while actually improving Tier B document quality (batch updates are more coherent than incremental patches).
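The "atomic with code changes" rule for Tier A can be backed by a simple reminder check over a change set. The sketch below is a heuristic only: it assumes changed paths are repo-root-relative with forward slashes, treats a fixed list of suffixes as source files, and flags any module whose source changed without its INDEX.md. Not every code change alters an index, so a flagged module is a prompt to review, not proof of staleness. The name `modules_missing_index_update` is hypothetical.

```python
import posixpath

def modules_missing_index_update(changed_paths, source_suffixes=(".py", ".ts", ".cpp", ".hpp")):
    """Return module directories with changed source files but no changed INDEX.md."""
    changed = set(changed_paths)
    stale = set()
    for path in changed:
        if path.endswith(source_suffixes):
            module_dir = posixpath.dirname(path)
            if posixpath.join(module_dir, "INDEX.md") not in changed:
                stale.add(module_dir)
    return sorted(stale)
```

Tier B files such as README.md are ignored here on purpose, matching the batch update timing described above.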
Human-facing documentation follows the same progressive disclosure structure, but is maintained at iteration boundaries rather than in real-time. Common structure (adapt to project; ADR is optional — omit adr/ if you don't use ADRs):
docs/
├── README.md # What is this, how to start (< 1 page)
├── architecture.md # System overview, component diagram
├── adr/ # Architecture Decision Records (optional)
│ ├── INDEX.md # ADR list with one-line summaries
│ └── 001-database.md
└── api/ # API reference (or equivalent)
├── INDEX.md # Endpoint / header summary table
└── users.md # Detailed docs
Rules:
- INDEX.md) as source of truth

For doc templates and ADR format, see doc-standards.md.
The 4-level index and design principles are universal. Per-language conventions (C++, TypeScript, Python, Go, Rust, Java/Kotlin) — directory layout, file naming, L2 contract annotations, build commands — are in language-adaptation.md. Load that file when indexing a project in a specific language.
- docs/ directory structure but fill content only at first iteration end

Principle: Document first, refactor later. Create indexes for the codebase AS-IS. Flag issues (oversized files, violations) in the indexes but do not block on fixing them.
Monorepo / sub-project: If the target is a sub-directory within a larger repo, place AGENTS.md at the sub-project root (e.g., lib/AGENTS.md). Build commands should cover this sub-project only (or note the parent command). See the monorepo variant in templates.
Priority: Start with the highest Fan-in module (most depended-on) — it yields the most navigation savings.
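Picking the highest Fan-in module first can be done from a plain dependency map. The sketch below assumes the map has already been gathered (for instance from the Dependencies sections or from import scanning); the module names and the function `rank_by_fan_in` are illustrative only.

```python
def rank_by_fan_in(dependencies: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank modules by how many other modules depend on them, highest first.

    `dependencies` maps each module to the modules it depends on (its fan-out).
    """
    fan_in = {module: 0 for module in dependencies}
    for module, targets in dependencies.items():
        for target in set(targets):  # count each dependent once per target
            if target != module:
                fan_in[target] = fan_in.get(target, 0) + 1
    # Sort by descending fan-in, then by name for a stable order.
    return sorted(fan_in.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_fan_in({
    "api": ["auth", "billing"],
    "billing": ["auth", "db"],
    "auth": ["db"],
    "db": [],
})
print(ranking[0][0])  # module to index first
```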
Partial migration: If migrating a single module (not the whole project), start at L1 — create INDEX.md for the target module directly. Skip L0; assume the parent AGENTS.md will be created later or already exists.
Library projects: If the target is a library (not an application), see Library vs Application Projects for differences in Build & Test, Public API, and Dependents. For header-only C++ libraries, the entry header acts as the barrel — document it in AGENTS.md and treat the main header directory as a single module.
Upstream / third-party code: If you do not own the source code (vendored dependency, upstream library), create L0 + L1 indexes for navigation but skip L2 (do not modify source file headers) and L3 (do not modify inline comments). Flag oversized files with ⚠ UPSTREAM instead of suggesting splits.
- AGENTS.md with directory map, navigation table, cross-cutting patterns, and AI Agent Instructions. Mark any discovered architectural violations as ⚠ KNOWN VIOLATION in Architectural Constraints
- INDEX.md in each major module: start with Public API + Dependencies + Dependents. Use `grep -rn "from [module]" .` or IDE search to discover Dependents. For flat directories with many files, use section headers within INDEX.md to organize logical groups (see L1 Retrofit Tips)
- ⚠ 1066 lines, needs split or ⚠ UPSTREAM for code you do not own)

Directive: Every code change MUST include atomic updates to relevant Tier A indexes. See index-maintenance.md for the full trigger→action table, anti-patterns, and Tier B sync workflow. When the user says "documentation sync" or "iteration end", execute the Tier B batch process.