June 4, 2026·Vibe Coding·3 min read·Developers using Claude Code

8 ways to cut your token usage in Claude Code

Claude Code bills by the token, and almost all of that cost is context you keep re-sending. Eight habits that shrink the bill without slowing you down.

Claude Code is fast and capable, but every turn re-sends your whole conversation to the model — and you pay for those tokens each time. The single biggest lever on cost isn't the prompt you type; it's how much context you drag along behind it. Here are eight habits I've settled on that cut token usage without making me any slower.

Eight ways to reduce token usage in Claude Code

1. Clear context between unrelated tasks

A long session accumulates files, command output, and back-and-forth that the model re-reads on every turn. When I finish one task and start something unrelated, I run /clear. It resets the conversation so I'm not paying to carry a debugging session into a docs edit. This is the highest-impact habit on the list.
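To see why clearing matters, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model: each turn adds a fixed number of tokens, and the whole context is re-sent as input on every turn. It ignores prompt caching, and the turn sizes are made-up numbers rather than measurements.

```python
def cumulative_input_tokens(turn_sizes, clear_before=()):
    """Total input tokens sent when every turn re-sends the history so far.

    turn_sizes:   tokens each turn adds to the context (hypothetical values).
    clear_before: indices of turns that are preceded by a /clear.
    """
    total = 0
    context = 0
    for i, added in enumerate(turn_sizes):
        if i in clear_before:
            context = 0  # /clear drops the accumulated history
        context += added
        total += context  # the whole context goes out as input on this turn
    return total


# Hypothetical session: two unrelated tasks of 10 turns each, 2,000 tokens per turn.
turns = [2_000] * 20
no_clear = cumulative_input_tokens(turns)
with_clear = cumulative_input_tokens(turns, clear_before={10})
print(no_clear, with_clear)  # 420000 220000
```

Under these assumptions, one /clear at the task boundary roughly halves the input tokens, because the total grows with the square of the number of turns you carry.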

2. Compact instead of scrolling forever

When a single task genuinely needs a long history, use /compact rather than letting the window grow unbounded. It replaces the conversation so far with a summary, keeping the important decisions while dropping the raw transcript. You keep continuity at a fraction of the token weight.

3. Put conventions in CLAUDE.md once

If you find yourself re-explaining your stack, naming conventions, or test commands every session, move that into a CLAUDE.md file. It's loaded automatically, so the model knows your rules without you spending tokens repeating them — and the guidance stays consistent across sessions.

4. Protect your prompt cache

Claude Code uses prompt caching: the stable context at the start of your session is cached, and later turns that reuse it are billed at a steep discount. The catch is that the cache only matches an unchanged prefix, so editing early context mid-session (your system prompt or CLAUDE.md) invalidates the cache from that point on and forces a full re-read. Set your project context up front, then leave it alone so cache hits do the work.
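A rough sketch of what a broken cache costs, in Python. The multipliers are assumptions for illustration (a cache read at 0.1x the base input price and a cache write at 1.25x); check current pricing before relying on them. The function name and the session numbers are hypothetical.

```python
def prefix_cost(prefix_tokens, turns, invalidate_at=(),
                cache_read_mult=0.1, cache_write_mult=1.25):
    """Cost of the stable prefix over a session, in base-price input tokens.

    invalidate_at: turn indices where early context was edited, so the
    prefix has to be written to the cache again instead of read from it.
    Both multipliers are assumed values, not quoted prices.
    """
    cost = 0.0
    for turn in range(turns):
        if turn == 0 or turn in invalidate_at:
            cost += prefix_tokens * cache_write_mult  # cache miss: rewrite
        else:
            cost += prefix_tokens * cache_read_mult   # cache hit: discounted
    return cost


# Hypothetical session: a 10,000-token prefix over 20 turns.
stable = prefix_cost(10_000, 20)
churned = prefix_cost(10_000, 20, invalidate_at={5, 10, 15})
uncached = 10_000 * 20
print(round(stable), round(churned), uncached)  # 31500 66000 200000
```

With these assumed numbers, three mid-session edits to early context roughly double what the prefix costs, although even the churned session is far cheaper than having no cache at all.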

5. Be specific with file references

Vague asks like "fix the auth bug" push Claude to read broadly to find its bearings. Point it straight at the file and symbol — name the path, the function, the line. Precise references mean fewer exploratory file reads, and file reads are often the largest single chunk of tokens in a session.

6. Use subagents for wide searches

When you need to sweep many files to answer one question, delegate it to a subagent (the Task tool). The subagent does the reading in its own context and returns just the conclusion. The file dumps never land in your main thread, so your primary context — the expensive one you re-send every turn — stays lean.
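The saving can be sketched with simple arithmetic in Python. This assumes the file contents are read once, the main thread then re-sends whatever it holds on every remaining turn, and caching is ignored. It also treats the subagent's reading as a single pass, which understates its real cost. All numbers and function names are hypothetical.

```python
def main_thread_search(file_tokens, remaining_turns):
    """Files land in the main context: read once, then re-sent every later turn."""
    return file_tokens * (1 + remaining_turns)


def subagent_search(file_tokens, summary_tokens, remaining_turns):
    """Subagent reads the files in its own context; only the summary is carried."""
    return file_tokens + summary_tokens * remaining_turns


# Hypothetical sweep: 40,000 tokens of files, a 500-token answer, 15 turns left.
in_main = main_thread_search(40_000, 15)
delegated = subagent_search(40_000, 500, 15)
print(in_main, delegated)  # 640000 47500
```

The reading itself costs the same either way. What changes is how many times those tokens are sent again afterwards.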

7. Match the model to the task

You don't need the most powerful model for every step. Routine edits, renames, and simple lookups run well on a lighter, cheaper model; save the heavyweight reasoning model for genuinely hard architecture and debugging work. Switching models by task, for example with the /model command, is one of the easiest wins available.

8. Trim unused MCP servers

Every connected MCP server injects its tool definitions into your context on every request. Three or four chatty servers you never use can quietly add thousands of input tokens per turn. Disconnect the ones you aren't actively using and reconnect them when you need them.
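A quick way to estimate that overhead, sketched in Python. The server names and token counts are invented for illustration; the real size of a server's tool definitions depends on how many tools it exposes and how verbose their descriptions are.

```python
# Hypothetical tool-definition sizes per connected MCP server, in tokens.
servers = {"github": 4_000, "browser": 3_000, "database": 2_500, "slack": 1_500}


def per_turn_overhead(servers, keep=None):
    """Tokens of tool definitions sent on each request.

    keep: names of servers left connected; None means all of them.
    """
    if keep is None:
        return sum(servers.values())
    return sum(tokens for name, tokens in servers.items() if name in keep)


turns = 30  # hypothetical session length
all_connected = per_turn_overhead(servers) * turns
trimmed = per_turn_overhead(servers, keep={"github"}) * turns
print(all_connected, trimmed, all_connected - trimmed)  # 330000 120000 210000
```

In this made-up session, disconnecting the three unused servers removes 7,000 tokens from every request, or 210,000 input tokens over 30 turns.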

The pattern underneath all of this

Notice the through-line: nearly every tip is about controlling context, not writing cleverer prompts. Token cost in an agentic tool scales with how much state you carry, not how much you type. Clear aggressively, compact deliberately, keep your stable context stable, and push wide reads out to subagents. Do that and the bill drops sharply while your workflow gets, if anything, faster.
