Fable is back, it's brilliant, and it will happily drain your wallet. One creator spent over $1,400 in the first four hours after release — deliberately — to stress-test which token-reduction strategies actually work. I went through the whole breakdown, and the good news is most of the savings come from a handful of simple habits. Here's what stuck with me.
The core insight running through all of it: token management is context management. Almost every dollar wasted goes to Claude reading things it didn't need to read.
1. Minify tool output (RTK)
Claude Code runs tool calls constantly — hundreds or thousands per session — and the raw output is full of repeated noise: stdout, stdout, stdout, fixture setup lines, boilerplate. A tool called RTK (Rust Token Killer) strips and reformats tool inputs/outputs down to what the model actually needs. In the demo, one tool call went from 612 lines (36,700 characters) to 4 lines (177 characters) — a 99% reduction on that call. Not every call is that bloated, but across a real session you can expect 30–50% savings with zero quality loss.
2. Semantically compress your CLAUDE.md
Most system prompts and memory files are padded with pleasantries that carry zero information. "Hello, thank you so much for helping out with this project, we really appreciate..." compresses to "Project instructions and guidelines." Same meaning, fraction of the tokens. The demo took a CLAUDE.md from 865 words to 211 — roughly 1,125 tokens down to 274. Run this pass on every project's system prompt and memory files. Bonus: tighter instructions often produce better output, not just cheaper output.
3. Never let Claude read logs raw
If Claude needs to dig through a 5,000-line log file, don't let it read the file. Load the log into SQLite and give Claude a small query script instead. It finds the root cause — with exact line numbers — by running a query rather than eating thousands of lines of text. The same principle applies to CSVs and Google Sheets: query, don't read.
4. Block huge reads generally
Some files are just too big to read start to finish, and most never needed to be. A good pattern: have Claude sample the beginning and end of a large file, infer the structure, then use sed, grep, or a search function to jump straight to the relevant region. Instead of 20,000 lines, it reads 20 or 30. On large-resource tool calls, that's ~99% saved.
5. Prompt in English
English is unusually information-dense per token. The same query cost 51 tokens in English, 87 in Italian, 118 in German, 74 in Japanese. (Mandarin is a notable exception — symbolic characters are token-efficient.) Most models are also trained predominantly on English, so you're not trading quality away. If you work in another language, this alone can be a 20–80% swing.
6. Add context frugality rules
Bake frugality directly into your CLAUDE.md: read only files directly relevant to the task, ask before expanding beyond three files, prefer glob/grep to locate then read the region, never read generated files or fixtures unless asked. One caveat — this is the only tactic on the list that can hurt quality, because you're constraining where the model looks. The trade: be more specific in your prompts, pay less.
7. Audit your context regularly
Run /context periodically. Things creep in silently — the creator discovered he was running a dozen Chrome MCP instances simultaneously, each loaded with full context, bloating every request and confusing the model. System prompt, tools, memory files, and skills can quietly eat 8%+ of your window before you've typed a word. A neat trick: set up a looping watcher instance that snapshots your context daily and flags anything new.
8. Cap your thinking
Adaptive thinking tends to overspend. In a head-to-head bug-hunt test, low thinking found the bug in 7 turns and 1,028 output tokens; extra-high found the same bug in 9 turns and 1,363 tokens — typically closer to 1.5x over repeated runs. With a model as capable as Fable, default to low thinking and opt into big budgets only when a task demonstrably needs them. That's a 30–40% saving on its own.