Your OpenClaw Bill Is Higher Than It Should Be
You asked a simple question. OpenClaw answered it. And somewhere in the background, Claude's context window silently consumed thousands of tokens to do it — costing you real money, every time you hit send.
Here's something most AI tool vendors don't advertise: a large portion of your token spend isn't coming from your questions or the answers. It's coming from setup overhead — the invisible configuration loaded into the prompt automatically before Claude ever reads a word you've written.
Let's change that.
The Hidden Cost You're Not Watching
Every time OpenClaw runs — every single message — it builds a system prompt from scratch. That means loading a stack of workspace and startup files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md.
All of them. Every time. Regardless of what you're asking.
Depending on your configuration, this creates a baseline overhead of 3,000 to 14,000 tokens per call before you've typed a single character — your total will vary based on which startup files are enabled and how large they are. At current Claude Opus pricing (check anthropic.com/pricing for the latest rates), that overhead can add real cost per message, just in configuration setup.
Do the math on 100 messages per day and you've got a significant line item hiding in your API bill. For files you didn't ask to load.
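To see how fast that overhead compounds, here is a back-of-the-envelope sketch. The per-million-token price below is an assumed placeholder for illustration, not a quote of current rates:

```python
# Back-of-the-envelope estimate of startup-file overhead cost.
# PRICE_PER_MTOK_INPUT is an ASSUMPTION for illustration only;
# check anthropic.com/pricing for current Opus rates.

OVERHEAD_TOKENS = 8_000        # midpoint of the 3,000-14,000 range above
MESSAGES_PER_DAY = 100
PRICE_PER_MTOK_INPUT = 15.00   # assumed $ per million input tokens

daily_overhead = OVERHEAD_TOKENS * MESSAGES_PER_DAY
monthly_overhead = daily_overhead * 30
monthly_cost = monthly_overhead / 1_000_000 * PRICE_PER_MTOK_INPUT

print(f"{monthly_overhead:,} overhead tokens/month = ${monthly_cost:.2f}")
# → 24,000,000 overhead tokens/month = $360.00
```

Swap in your own message volume and the current price and the shape of the problem becomes obvious: the overhead scales linearly with every message you send.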
This is an intentional design built around Claude's large context window, not an oversight. The community has flagged it, and as of this writing the behavior remains unchanged — check the OpenClaw issue tracker for current status. Which means if you want to fix it, you have to fix it yourself.
The Fix: Specialized Agents for Each Task
The core idea is simple. Instead of one large agent that loads everything for every request, you build smaller, focused agents that only carry what they actually need.
Think of it like packing for a trip. Right now, OpenClaw is sending a moving truck every time you need a toothbrush. The fix is to pack a carry-on.
Step 1: Give Each Agent Its Own Workspace
In OpenClaw's multi-agent setup, each agent has its own workspace directory, and that agent's tools and instructions live inside it.
Use that separation intentionally:
~/.openclaw/agents/email-agent/workspace/ — AGENTS.md (minimal, email-only), SOUL.md (stripped down), MEMORY.md (email-specific only)
~/.openclaw/agents/code-agent/workspace/ — AGENTS.md (code-relevant only), minimal startup overhead
~/.openclaw/agents/research-agent/workspace/ — AGENTS.md, targeted context
Each agent loads only its own .md files — targeted context instead of everything at once.
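As a sketch, the layout above could be scaffolded with a short script. The agent names and file lists here simply mirror the examples in this section; adjust them to match your own setup:

```python
# Sketch: scaffold one workspace per specialized agent so each
# loads only its own startup files. Names and file lists mirror
# the example layout above.
from pathlib import Path

AGENTS = {
    "email-agent": ["AGENTS.md", "SOUL.md", "MEMORY.md"],
    "code-agent": ["AGENTS.md"],
    "research-agent": ["AGENTS.md"],
}

def scaffold(root: Path) -> list[Path]:
    """Create each agent's workspace and seed its startup files."""
    created = []
    for name, files in AGENTS.items():
        workspace = root / name / "workspace"
        workspace.mkdir(parents=True, exist_ok=True)
        for filename in files:
            f = workspace / filename
            if not f.exists():
                # Seed an empty stub; fill in real content per agent.
                f.write_text(f"# {filename} for {name}\n")
            created.append(f)
    return created

# Point this at your real agents directory:
# scaffold(Path.home() / ".openclaw" / "agents")
```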
Step 2: Cut Your Startup Files Down
Your AGENTS.md is probably doing the work of five documents. It doesn't have to.
Aim to cut startup files to roughly 800 tokens or fewer — the exact threshold depends on your use case, but leaner is almost always better. Remove sections for features your agents don't actually use. At 100 Opus calls per day, eliminating 1,000 tokens from startup files can save meaningfully on your monthly bill — the exact amount will depend on current Opus pricing and your actual usage mix.
Go through every .md file and ask: does every agent need this, every time? If the answer is no — cut it.
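A quick way to run that audit is to estimate each file's token weight. The four-characters-per-token ratio is a rough heuristic for English prose, so use a real tokenizer if you need exact numbers:

```python
# Rough audit of startup-file weight, largest files first.
# CHARS_PER_TOKEN is an approximation, not an exact tokenizer.
from pathlib import Path

TOKEN_BUDGET = 800      # per-file target suggested above
CHARS_PER_TOKEN = 4     # rough heuristic for English prose

def audit_startup_files(workspace: Path) -> dict[str, int]:
    """Estimate tokens per startup .md file, sorted largest first."""
    estimates = {
        f.name: len(f.read_text()) // CHARS_PER_TOKEN
        for f in workspace.glob("*.md")
    }
    return dict(sorted(estimates.items(), key=lambda kv: -kv[1]))

def over_budget(estimates: dict[str, int]) -> list[str]:
    """Names of files exceeding the token budget — trim these first."""
    return [name for name, tokens in estimates.items() if tokens > TOKEN_BUDGET]
```

Run it against each agent's workspace and start trimming from the top of the list.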
Step 3: Move Instructions Into Skills, Not .md Files
This is the highest-leverage change you can make.
.md files load into every call automatically. Skills only load when they're triggered. That difference matters.
Move your personality instructions, workflow logic, and task-specific context into skills. They sit idle until needed. Your startup files shrink. Your token costs follow.
You can even ask your OpenClaw instance to audit itself — have it analyze the startup files and reorganize bloated sections into skills. It's exactly the kind of task these tools are built for.
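If you'd rather eyeball the candidates yourself first, a sketch like this can flag the heaviest sections of a startup file as likely skill material. The 200-token cutoff is an arbitrary threshold, and the chars-per-token ratio is the same rough heuristic as above:

```python
# Sketch: find the heaviest sections of a startup file as candidates
# to move into on-demand skills. A section is the text under a "## "
# heading; the 200-token cutoff is an arbitrary example threshold.
import re

def skill_candidates(markdown: str, min_tokens: int = 200) -> list[tuple[str, int]]:
    """Return (heading, estimated_tokens) for sections worth extracting."""
    sections = re.split(r"^## ", markdown, flags=re.MULTILINE)[1:]
    results = []
    for section in sections:
        heading, _, body = section.partition("\n")
        est_tokens = len(body) // 4   # rough chars-per-token heuristic
        if est_tokens >= min_tokens:
            results.append((heading.strip(), est_tokens))
    return sorted(results, key=lambda kv: -kv[1])
```

Anything this surfaces is content that loads on every call today but could sit idle in a skill until triggered.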
Step 4: Use Sub-Agents for Heavy Tasks
When you're running something intensive — a research pass, a bulk document review, a long-form generation task — don't run it in your main agent's context.
Spawn a sub-agent instead. In current versions, sub-agents typically load only AGENTS.md and TOOLS.md, bypassing the full conversation history. You can also configure them to use lower-cost models for routine work — check the OpenClaw docs for your version to confirm current behavior. Results report back to your main chat when they're done.
You get cleaner isolation, lower per-message cost, and a system that's easier to maintain.
Step 5: Trim Your Tool List
Tool definitions aren't free. Every tool you've defined is sent as part of the prompt on every call — whether the agent uses it or not.
If your agent has 20 tools defined but regularly uses 3, those 17 unused definitions add dead weight to every call. Give each agent only the tools it actually needs.
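One way to put a number on that dead weight is to measure how large the serialized tool schemas are. The schemas below are hypothetical stand-ins for your real tool list, and the chars-per-token ratio is the same rough heuristic used earlier:

```python
# Sketch: estimate the per-call token weight of tool definitions.
# The schemas here are HYPOTHETICAL placeholders; substitute your
# agent's actual tool list to get a real number.
import json

def tool_overhead(tools: list[dict], chars_per_token: int = 4) -> int:
    """Rough token cost of sending these tool schemas on every call."""
    return len(json.dumps(tools)) // chars_per_token

all_tools = [
    {"name": f"tool_{i}", "description": "placeholder", "parameters": {}}
    for i in range(20)
]
used_tools = all_tools[:3]

# Dead weight: tokens spent on definitions the agent rarely uses.
savings = tool_overhead(all_tools) - tool_overhead(used_tools)
```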
Step 6: Use Search Instead of Loading Entire Files
If your agents need to reference large documents or knowledge bases, stop loading entire files into the context. Use a local search tool that builds a searchable index of your content.
Instead of loading the whole file, the agent retrieves only the specific sections it needs. In internal testing, this approach reduced token usage from roughly 15,000 to 1,500 per lookup — about a 90% reduction. Results will vary based on content size, query specificity, and how retrieval is configured.
That's not a minor optimization. That's a structural fix.
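The mechanism can be sketched in a few lines: index a document by chunk, then hand the model only the chunks that match the query. A real deployment would use a proper search tool or embeddings; naive keyword overlap here is just to make the idea concrete:

```python
# Minimal retrieval sketch: split a document into chunks, then
# return only the chunks relevant to a query instead of loading
# the whole file into context. Keyword overlap is a stand-in for
# a real search index or embedding-based retrieval.
import re

def build_index(text: str) -> list[str]:
    """Split a document into paragraph-level chunks."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]

def retrieve(chunks: list[str], query: str, top_k: int = 3) -> list[str]:
    """Return the chunks sharing the most words with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def score(chunk: str) -> int:
        return len(query_words & set(re.findall(r"\w+", chunk.lower())))

    ranked = sorted(chunks, key=score, reverse=True)
    return [c for c in ranked[:top_k] if score(c) > 0]
```

The agent's context then carries a few relevant paragraphs instead of the full document, which is where the order-of-magnitude reductions come from.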
The Bottom Line
Do these things and you're not just cutting costs — you're building a leaner, faster, more focused AI setup. One where each agent knows its job, carries only what it needs, and doesn't drag thousands of tokens of setup overhead into a simple question.
That's what a well-optimized AI stack actually looks like.
Unchained AI Solutions builds custom AI systems, workflow automation, and end-to-end marketing for businesses that are done settling. If your AI stack is costing more than it should — or doing less than it could — let's talk.
Written by Shay Owensby
Founder of Unchained AI Solutions. Building AI-powered systems that deliver real business results.