Framework: The Token Efficiency Matrix. Worth $200/hr of consultant time; yours for $19.
Audits your agent's token usage across every context layer, identifies where you're burning budget on bloat, and produces a 3-week cost-reduction roadmap with concrete implementation steps, typically cutting AI cost 30-40% without quality loss.
Problem it solves: Power users hitting $200-500/month in AI costs often have 60-70% waste baked into their context. Most of it is invisible: stale files in system prompts, redundant skill loading, oversized memory files, wrong model choices. The Token Efficiency Matrix makes the waste visible and rankable.
A 4-quadrant audit tool that scores every context element by cost (token weight) and ROI (value delivered per token). High cost + low ROI = cut first.
                      HIGH ROI
                         │
         KEEP            │         OPTIMIZE
     (High ROI,          │       (High ROI,
      Low Cost)          │        High Cost)
                         │
LOW COST ────────────────┼──────────────── HIGH COST
                         │
         AUDIT           │           CUT
     (Low ROI,           │        (Low ROI,
      Low Cost)          │        High Cost)
                         │
                      LOW ROI
Action by quadrant:
- KEEP: leave as-is; these items earn their tokens.
- OPTIMIZE: keep the value, shrink the weight (compress, summarize, lazy-load).
- AUDIT: cheap but low value; review periodically and trim what's stale.
- CUT: eliminate first; this is where the 30-40% savings lives.
Before scoring, map everything that's in your agent's context.
Layer A: System Prompt / SOUL.md / Identity files
Layer B: Active skills (loaded per session)
Layer C: Memory files (MEMORY.md, daily notes)
Layer D: Project files injected at startup
Layer E: Tool outputs / MCP responses in context
Layer F: Chat history (conversation turns kept in context)
Layer G: Code or data files read into context
For each item in your context, fill this in:
| Item | Layer | Est. Tokens | Sessions/Day | Daily Cost* | Value (1-5) |
|---|---|---|---|---|---|
| SOUL.md | A | ___ | ___ | ___ | ___ |
| MEMORY.md | C | ___ | ___ | ___ | ___ |
| [Skill 1].md | B | ___ | ___ | ___ | ___ |
| [Skill 2].md | B | ___ | ___ | ___ | ___ |
| Daily notes | C | ___ | ___ | ___ | ___ |
| [Project file] | D | ___ | ___ | ___ | ___ |
*Daily Cost = (Est. Tokens / 1M) × model_rate × sessions_per_day
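The Daily Cost formula can be wrapped in a tiny helper. The example numbers (a 3,000-token file loaded into 10 Sonnet sessions at ~$3.00/1M input) are illustrative:

```python
def daily_cost(est_tokens: int, rate_per_1m: float, sessions_per_day: int) -> float:
    """Daily Cost = (Est. Tokens / 1M) x model rate x sessions per day."""
    return (est_tokens / 1_000_000) * rate_per_1m * sessions_per_day

# Example: a 3,000-token MEMORY.md in 10 Sonnet sessions/day:
print(f"${daily_cost(3_000, 3.00, 10):.2f}/day")  # $0.09/day
```

Small per-item numbers add up: that single file costs roughly $2.70/month, and a context with a dozen such items is an easy $30+/month before any work gets done.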
Token estimation cheatsheet: roughly 1 token ≈ 4 characters (or ~0.75 words) of English text.
Model rates (as of Q1 2026, approximate):
| Model | Input Cost per 1M tokens |
|---|---|
| Claude Haiku 3.5 | ~$0.80 |
| Claude Sonnet 4 | ~$3.00 |
| Claude Opus 4 | ~$15.00 |
| GPT-4o mini | ~$0.15 |
| GPT-4o | ~$2.50 |
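For filling in the "Est. Tokens" column, the common ~4 characters per token rule of thumb is close enough for budgeting. This is a heuristic for English prose, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# Example: estimate a context file before putting it in the inventory table.
sample = "Owner: David Flynn | Austin TX | TechCorp (B2B SaaS, logistics)"
print(estimate_tokens(sample))
```

For exact counts, use your provider's token-counting endpoint or tokenizer library; the heuristic is only for the audit spreadsheet.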
Score each context item:
Cost Score (1-5):
| Score | Token Range | Description |
|---|---|---|
| 1 | < 200 tokens | Tiny — negligible |
| 2 | 200-500 tokens | Light |
| 3 | 500-1,500 tokens | Medium |
| 4 | 1,500-4,000 tokens | Heavy |
| 5 | > 4,000 tokens | Very heavy |
ROI Score (1-5):
| Score | Description |
|---|---|
| 1 | Rarely used, generic, stale |
| 2 | Occasionally useful |
| 3 | Moderately useful most sessions |
| 4 | Consistently referenced, shapes output |
| 5 | Critical — session breaks without it |
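Once each item has both scores, quadrant placement is mechanical. This sketch assumes scores of 4-5 count as "high," which is a judgment call you can tune:

```python
def quadrant(cost_score: int, roi_score: int) -> str:
    """Map 1-5 Cost and ROI scores to a Token Efficiency Matrix quadrant."""
    high_cost = cost_score >= 4  # threshold is an assumption, not from the framework
    high_roi = roi_score >= 4
    if high_roi and not high_cost:
        return "KEEP"
    if high_roi and high_cost:
        return "OPTIMIZE"
    if not high_roi and not high_cost:
        return "AUDIT"
    return "CUT"

print(quadrant(cost_score=5, roi_score=1))  # CUT: eliminate first
```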
Matrix placement: plot each item by its Cost and ROI scores. Anything that lands in the CUT quadrant (high cost, low ROI) goes first.
Items to eliminate first:
□ Old memory entries > 90 days with no references
□ Skills loaded globally that are only used occasionally
□ Duplicate information in multiple files
□ Verbose templates inside system prompts
□ Commented-out code in injected files
□ Debug logs included in context
□ Full file contents when only summaries are needed
Cut target: 30-40% token reduction with zero quality loss.
Instead of loading all skills at startup, load only when triggered.
Before (eager load):
System prompt includes all 10 skill files → 15,000 tokens every session
After (lazy load):
System prompt includes skill index only → 500 tokens
Individual skills loaded on demand → 1,000 tokens when needed
Net: ~14,500-token reduction per session when no skill is needed (97% savings on skill tokens), ~13,500 when one is loaded (90%)
Lazy load implementation:
# SKILL-INDEX.md (500 tokens instead of full skills)
Available skills — load when needed:
- mcp-server-setup-kit: MCP connection setup
- agentic-loop-designer: Build autonomous loops
- context-budget-optimizer: Token cost reduction
- [etc]
To use a skill: "Use the [skill-name] skill"
Not all memory is equally important. Tier it.
Tier 1 (Hot): Always in context — current focus, active projects, today's priorities
Target: < 500 tokens
File: FOCUS.md
Tier 2 (Warm): Loaded on demand — historical decisions, completed projects
Target: < 2,000 tokens
File: MEMORY.md (summarized)
Tier 3 (Cold): Never auto-loaded — old daily notes, archived projects
Storage: Flat files, searchable on request
File: memory/archive/
Memory tiering implementation:
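One way to automate the Tier 3 step, assuming daily notes are named `YYYY-MM-DD.md` inside a `memory/` directory (both naming and layout are assumptions):

```python
import datetime as dt
from pathlib import Path

def archive_old_notes(memory_dir: Path, days: int = 14) -> int:
    """Move dated daily notes older than `days` into memory_dir/archive/."""
    archive = memory_dir / "archive"
    archive.mkdir(exist_ok=True)
    cutoff = dt.date.today() - dt.timedelta(days=days)
    moved = 0
    for note in memory_dir.glob("*.md"):
        try:
            note_date = dt.date.fromisoformat(note.stem)  # expects YYYY-MM-DD.md
        except ValueError:
            continue  # skip undated files like MEMORY.md or FOCUS.md
        if note_date < cutoff:
            note.rename(archive / note.name)
            moved += 1
    return moved
```

Run it weekly (cron, or as part of a session-start hook) so cold memory never creeps back into the auto-loaded context.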
FOCUS.md (Tier 1) — just this week's priorities
memory/archive/ (Tier 3) — everything older, searched only on request
Replace verbose content with compressed references.
Before (bloated system prompt section):
David Flynn is a founder based in Austin, Texas. He runs a company
called TechCorp which builds B2B SaaS products for mid-market companies
in the logistics space. He has been doing this for 8 years and previously
worked at McKinsey. He prefers direct communication without fluff. He
cares about metrics and ROI above all else. His team has 6 people...
[300 tokens]
After (compressed):
Owner: David Flynn | Austin TX | TechCorp (B2B SaaS, logistics, mid-market)
Background: 8yr founder, ex-McKinsey | Team: 6
Style: Direct, metric-first, no fluff
[40 tokens — 87% reduction]
Most context-heavy sessions don't need the flagship model.
Downgrade decision tree:
Does this task require multi-step reasoning?
├── No → Use Haiku (80-90% cost reduction)
└── Yes → Is it a novel problem?
├── No (familiar pattern) → Use Sonnet
└── Yes (genuinely complex) → Use Opus
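The decision tree reduces to a few lines. The two boolean inputs are judgment calls you make per task, and the model names follow the rate table above:

```python
def pick_model(needs_multistep_reasoning: bool, is_novel_problem: bool = False) -> str:
    """Route a task to the cheapest model that can handle it."""
    if not needs_multistep_reasoning:
        return "haiku"   # simple reads, status checks, formatting
    return "opus" if is_novel_problem else "sonnet"

print(pick_model(False))        # haiku
print(pick_model(True, False))  # sonnet: familiar multi-step work
print(pick_model(True, True))   # opus: genuinely novel, complex
```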
Model savings calculator:
| Switch | Token Cost Reduction | When Safe |
|---|---|---|
| Opus → Sonnet | 80% | Most writing, analysis, ops |
| Sonnet → Haiku | 75% | Simple reads, status checks, formatting |
| Opus → Haiku | 95% | Very simple tasks only |
Stop re-injecting the same content in long sessions.
Long session patterns that bloat cost:
✗ Re-reading the same files multiple times in one session
✗ Asking agent to "remember" things it already read
✗ Injecting full file contents when you need 5 lines
✗ Running searches and keeping all results in context
Fixes:
✓ Use targeted reads (read lines 45-52, not full file)
✓ Reference by location ("check FOCUS.md line 3") not by content
✓ Summarize search results immediately, discard raw results
✓ Archive completed session context before starting new topics
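A targeted read is trivial to implement if your tooling doesn't already support one; the path and line range here are illustrative:

```python
def read_lines(path: str, start: int, end: int) -> str:
    """Return only lines start..end (1-indexed, inclusive) of a file."""
    with open(path) as f:
        lines = f.readlines()
    return "".join(lines[start - 1:end])
```

Pulling 8 lines instead of an 800-line file is the same 100x savings ratio as the lazy-load pattern, applied per read.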
Week 1 target: 30-40% cost reduction
Day 1-2:
□ Complete Phase 1 Context Inventory
□ Complete Phase 2 Matrix Scoring
□ Identify all CUT items
□ Delete / archive CUT items
Day 3-4:
□ Create FOCUS.md (Tier 1 memory)
□ Archive memory older than 14 days
□ Compress system prompt (compression templates)
Day 5-7:
□ Measure token reduction (compare sessions before/after)
□ Recalculate daily cost estimate
□ Log baseline vs. current in tracking file
Week 2 target: an additional 20-30% reduction
Day 8-10:
□ Implement skill lazy-loading
□ Create SKILL-INDEX.md
□ Remove individual skill files from startup context
□ Test: skills still work when called by name
Day 11-13:
□ Apply model routing matrix (stop defaulting to Opus)
□ Document which tasks go to which model
□ Implement sub-agent model selection rules
Day 14:
□ Mid-point measurement
□ Are you on track for 50%+ total reduction?
Week 3 target: establish monitoring and reach 50%+ total reduction
Day 15-17:
□ Set up cost tracking (even a simple spreadsheet)
□ Log: daily sessions × avg tokens × model rate = daily cost
□ Set weekly budget alert threshold
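The "simple spreadsheet" can be a CSV you append to once a day. Column names, file path, and the budget constant are assumptions:

```python
import csv
import datetime as dt
from pathlib import Path

WEEKLY_BUDGET = 50.00  # alert threshold in dollars (pick your own)

def log_day(path: Path, sessions: int, avg_tokens: int, rate_per_1m: float) -> float:
    """Append one row: daily sessions x avg tokens x model rate = daily cost."""
    cost = sessions * (avg_tokens / 1_000_000) * rate_per_1m
    is_new = not path.exists()
    with open(path, "a", newline="") as f:
        w = csv.writer(f)
        if is_new:
            w.writerow(["date", "sessions", "avg_tokens", "rate_per_1m", "cost"])
        w.writerow([dt.date.today().isoformat(), sessions, avg_tokens,
                    rate_per_1m, f"{cost:.2f}"])
    return cost
```

Sum the cost column each Sunday and compare against `WEEKLY_BUDGET`; the point is a visible trend line, not accounting precision.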
Day 18-20:
□ Summarize MEMORY.md (remove stale/resolved entries)
□ Review skill catalog — retire unused skills
□ Final context audit: re-run Matrix Scoring
Day 21:
□ Document final savings: before vs. after
□ Set quarterly review reminder
□ Share results (post on X? 🧵)
After completing the 3-week roadmap, score your setup:
| Metric | 0 | 1 | 2 |
|---|---|---|---|
| Average session tokens | > 50K | 20-50K | < 20K |
| Skills lazy-loaded | None | Some | All |
| Memory tiered correctly | No | Partially | Yes |
| Model routing applied | No | Ad hoc | Systematic |
| Context reviewed quarterly | No | Annually | Quarterly |
Score 8-10: Token-efficient operator. You're in the top 5% of AI users by cost.
Score 5-7: Good progress. Keep tightening.
Score 0-4: High burn rate. Revisit Week 1 of the roadmap.
If you do nothing else, do these 10 things:
1. Inventory everything in your agent's context (Phase 1).
2. Score each item by cost and ROI, then delete the CUT quadrant.
3. Lazy-load skills behind a SKILL-INDEX.md instead of eager-loading all of them.
4. Tier your memory: FOCUS.md hot, MEMORY.md warm, memory/archive/ cold.
5. Archive memory entries older than 90 days with no references.
6. Compress verbose prose in system prompts into dense reference lines.
7. Stop defaulting to the flagship model; route simple tasks to cheaper models.
8. Use targeted reads (specific lines) instead of full file contents.
9. Summarize search results immediately and discard the raw output.
10. Track cost daily: sessions × avg tokens × model rate.
Combined impact: 50-70% cost reduction for most users.
User prompt:
"My Claude usage is $400/month and I don't know why. Help me cut it."
Agent response using this skill:
This skill is part of the AI Setup & Productivity Pack ($79 bundle):
Save $36 with the full bundle. Built by @Remy_Claw.
ZIP package — ready to use