My setup loads 687 lines and roughly 7,500 tokens of CLAUDE.md before I type a single character. That is the floor. Every conversation starts there. Every cache invalidation re-bills it. Every "look how comprehensive my agent config is" Twitter screenshot makes it worse.
~/.claude/CLAUDE.md 551 lines 23 KB ~5,500 tokens
~/.claude/RTK.md 29 lines 1 KB ~300 tokens
~/CLAUDE.md 28 lines 1 KB ~250 tokens
project CLAUDE.md 79 lines 3 KB ~1,000 tokens
-----------------------------------------------------------
Total preloaded 687 lines ~7,000-8,000 tokens That is before MCP server descriptions, before tool definitions, before system prompts, before the user message. On a 200K context window it sounds rounding-error small. It is not. The cost compounds across every turn, every cache miss, every parallel agent spawn, every subagent that inherits the parent context. The numbers people quote when bragging about token economy ignore the rent they are paying every time they boot.
Why this matters
Three reasons, in order of how much they cost.
Cache invalidation. Anthropic's prompt cache has a five-minute TTL. If you wait six minutes between turns, your full CLAUDE.md gets re-tokenized and re-billed at the uncached rate. A 7,500-token preamble at $3 per million input tokens is $0.0225 per cold turn. That is fine for one developer. Multiply by a team of ten doing twenty cold turns a day and you are at $4.50 daily, $135 monthly, $1,620 yearly — for the privilege of preloading instructions the model reads and ignores. Most of those tokens describe slash commands and skill triggers that fire only when matched.
Cost per turn, even cached. Cached tokens are not free. They are billed at 10% of the input rate. Your "free" CLAUDE.md still costs $0.00225 per warm turn. That is invisible per request but it is the floor under your usage chart. Every prompt you write is paying rent on instructions that will never apply to it.
Signal-to-noise in the working memory. This is the one nobody measures. Models do not weight every token equally. A 23 KB CLAUDE.md full of project routing tables, MCP examples, and environment variable docs trains the model to scan past instructions to find the actual question. When I cut my global CLAUDE.md from 551 lines to 80, my agent stopped helpfully reminding me about Linear project IDs in the middle of CSS questions. The signal got louder because the noise got quieter.
Audit my own CLAUDE.md
I went through my 551-line global config line by line and asked one question for each block: would the model perform worse if this were behind a slash command instead of preloaded? Here is the breakdown.
Section Lines Used per turn?
---------------------------------------------------
Slash commands table 45 No - triggered by `/`
Skills table 57 No - auto-activated by description
Project routing 52 No - only when working on project
Linear project mapping 45 No - only on Linear ops
Output locations 17 Maybe - 1-2 per session
Task routing logic 15 Yes - shapes every response
MCP servers + Rube examples 80 No - only when calling MCP
Post Bridge accounts table 18 No - only on social posts
Linear via Rube examples 50 No - only on Linear ops
Telegram via Rube examples 25 No - only on Telegram ops
Required env vars 45 No - only on deploy
Hetzner localhost bypass 24 Yes - shapes curl calls
Airtable rule 4 No - only on Airtable
Deployment / debugging rules 12 Yes - shapes behavior
Verification 5 Yes - shapes behavior
gstack rules 14 No - only on browse tasks Roughly 40 lines out of 551 actively shape responses on a generic prompt. The other 511 lines are reference material. They are documentation pretending to be configuration.
The Linear project mapping table is the worst offender. It is forty-five lines of static reference that gets loaded into every conversation, including conversations that have nothing to do with Linear. The model does not need to memorize that biztools.cz lives in `biztools_cz` while multivendor lives in `multivendor` and `multivendor-web`. It needs to know that Linear project mappings exist somewhere and how to read them when needed. A single line — "Linear project mappings live in `~/.claude/linear-projects.md`, read via `@`" — replaces the entire table.
Same story for the MCP server examples. Twenty Rube tool slugs, three Linear examples, two Telegram examples, the Cloudflare env var section. Useful when calling those tools. Cargo when not. The model already knows how to use MCP — the system prompt teaches it. The extra examples teach it the same thing again, badly, and it pays for the lesson on every turn.
The lazy-load pattern
The fix is not "write a shorter CLAUDE.md." The fix is to invert the loading model. Default to nothing. Load specific knowledge when specific work demands it.
Four mechanisms, in order of preference.
Slash commands. A slash command is a prompt template you invoke with `/name`. The body of the command lives in `~/.claude/commands/name.md` and is loaded only when you invoke it. My `/today` command is 60 lines describing how to fetch Linear issues, calendar events, and emails. Those 60 lines used to live in the global CLAUDE.md as "Linear Integration Commands" and "Daily routine." Moving them to a slash command saved 60 lines per cold turn for the 95% of conversations that are not "what is on my plate today."
Skills. Skills are auto-activated based on a one-line description. The full skill body — instructions, examples, code patterns — loads only when a user request matches the description. My `seo-writing` skill is 200 lines. It used to be a section of CLAUDE.md. Now it is a directory under `.agents/skills/seo-writing/` with a manifest that says "use when writing SEO content for managed sites." If I am not writing SEO content, those 200 lines never enter the context window. The model still knows the skill exists because the skill manifest is a single line in the registry.
`@file` references. Inside a command, skill, or CLAUDE.md, `@path/to/file` tells the harness to inline the file contents. The reference itself is cheap. The file loads when the surrounding instruction loads. That means the Linear project table can live in `~/.claude/linear-projects.md`, be referenced by `@~/.claude/linear-projects.md` inside the `/linear-next` command, and never enter the context window for any other command. Same file, same content, fraction of the cost.
Per-project CLAUDE.md, kept small. A 79-line project CLAUDE.md is fine. It loads once when you `cd` into the project and it describes things genuinely specific to the project: the deploy command, the schema location, the test runner, the rules that differ from defaults. It is the only level where memorization pays off, because every conversation in that project will use those facts. The global CLAUDE.md should not duplicate anything from a project CLAUDE.md, and a project CLAUDE.md should not duplicate anything the model can learn by reading the codebase.
The hierarchy goes:
ALWAYS LOADED:
~/.claude/CLAUDE.md — 80 lines max, behaviors and global rules
project CLAUDE.md — 100 lines max, project-specific facts
LAZY LOADED (on demand):
~/.claude/commands/*.md — slash commands, ~50 lines each
~/.claude/skills/*/ — skills, hundreds of lines each
@file references — reference material, lookup tables Treat the always-loaded layer as a budget. Mine is 180 lines combined. If I want to add a rule, something else has to leave. If something is loaded on every turn, it has to earn its rent on every turn. The Linear project table cannot earn rent on a conversation about CSS specificity. Therefore it cannot live in the always-loaded layer.
What gets harder
The lazy-load pattern has costs the bragging-rights pattern hides. Slash commands need to be discovered — I keep a `/help` command that lists them, because otherwise I forget what I have built. Skills need accurate descriptions — a skill with a vague trigger fires too often or never. `@file` references require knowing the file exists, which means writing a registry somewhere. And per-project CLAUDE.md duplicates work across projects — fine for the deploy command, less fine for genuinely shared rules.
The honest tradeoff: a giant global CLAUDE.md is convenient for the writer and expensive for the reader. The reader is the model and the bill-payer is you. The writer is also you, but you only pay the writing cost once. You pay the reading cost on every turn forever.
What I'd change
I am cutting my global CLAUDE.md from 551 lines to under 80. The Linear table, MCP examples, env var docs, project routing, and post-bridge accounts all become `@file` references inside slash commands. I will keep only behaviors that apply to every conversation: tone rules, debugging discipline, verification requirements, the Hetzner localhost bypass, and the workflow sync line. Everything else moves to lazy-load. The 7,500-token floor drops to roughly 1,200, and the 511 lines of reference material stop billing me to sit in cache and not be read.