OpenClaw is free. Running it is not.
OpenClaw has 247,000 GitHub stars. It’s free, open-source, and runs locally. You install it, point it at an LLM, and it writes code, browses the web, queries databases, and runs commands on your behalf.
The agent is free. The inference is not.
Every time OpenClaw calls a model, it re-sends the entire conversation history — every tool output, every file it read, every intermediate result. By iteration 20 of a typical task, the input context is 30,000+ tokens. By iteration 40, it’s past 100,000. And it sends this every single request.
This is not a bug. It’s how agents work. And it’s why running OpenClaw on pay-per-token APIs costs $300–600/month for active users — sometimes more.
Where the tokens go
We broke down token consumption for a typical OpenClaw coding task: “add authentication to an Express API.” The agent completed it in 38 tool calls.
Total: ~525,000 tokens for a single task. The agent’s actual output — the code it wrote — was 19K tokens. The other 96% is overhead.
On Claude Opus at $15/M input + $75/M output, that single task costs $9.18. Run five tasks a day and you’re at $1,377/month.
On DeepSeek V3.2 via a pay-per-token provider at $0.27/M input + $1.10/M output, the same task costs $0.16. Better — but 20 tasks a day is still $96/month, and that’s one agent.
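Those per-task figures fall out of a one-line formula. A quick sketch (the 506K-in / 19K-out split is inferred from the ~525K total and 19K output quoted above, so treat it as approximate):

```python
def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost in dollars of one agent task at pay-per-token pricing."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# ~525K total tokens per task, of which ~19K is output (the code actually written)
input_tok, output_tok = 506_000, 19_000

opus = task_cost(input_tok, output_tok, 15.00, 75.00)    # Claude Opus rates
deepseek = task_cost(input_tok, output_tok, 0.27, 1.10)  # DeepSeek V3.2 rates

print(f"Opus: ${opus:.2f}/task, ${opus * 5 * 30:,.0f}/mo at 5 tasks/day")
print(f"DeepSeek: ${deepseek:.2f}/task, ${deepseek * 20 * 30:.0f}/mo at 20 tasks/day")
```

The input side dominates at both price points, which is why the rest of this post is about context overhead rather than output quality.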
The three cost traps
We covered these in depth in Why your AI agent needs a budget, but here’s the OpenClaw-specific version:
1. Context grows quadratically
OpenClaw reads files into context. If it reads a 2,000-token file at step 5, that file gets re-sent at steps 6, 7, 8… all the way to 38. That single file read costs 2,000 × 33 remaining steps = 66,000 tokens in re-transmission alone.
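The same arithmetic explains the “quadratic” in the heading: if context grows roughly linearly per step, the cumulative input tokens across a task grow with the square of the step count. A sketch (the constant per-step growth is an illustrative simplification; real growth is lumpier):

```python
def retransmission_cost(read_step, file_tokens, total_steps):
    """Tokens re-sent for one file read at `read_step`: the file rides
    along in context for every remaining request."""
    return file_tokens * (total_steps - read_step)

print(retransmission_cost(5, 2_000, 38))  # 2,000 x 33 = 66,000 tokens

def total_input_tokens(system_prompt, per_step_growth, steps):
    """Cumulative input tokens when context grows linearly per step:
    every request re-sends the system prompt plus everything so far.
    The sum of 1..n is n(n+1)/2, hence quadratic in step count."""
    return sum(system_prompt + per_step_growth * i for i in range(1, steps + 1))
```

Doubling the number of tool calls roughly quadruples the input-token bill, which is why long tasks are disproportionately expensive.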
Users report session contexts at 56–58% of the 400K context window during normal use. This isn’t a failure mode — it’s the architecture working as designed.
2. System prompt is a fixed tax
OpenClaw’s system prompt is ~9,600 tokens. It gets sent with every request. Over 38 tool calls, that’s 365K tokens just in system prompts. You pay this whether the agent does useful work or not.
3. Wrong model for the job
OpenClaw defaults to a single model for everything. But not every tool call needs the same intelligence:
- Reading a file and deciding what to edit? Llama 3.1 8B handles this at 200 tokens/sec.
- Writing complex authentication logic? DeepSeek V3.2 or Kimi K2.5 is the right call.
- Formatting a config file? Any 8B model is overkill but still cheaper than Opus.
We wrote a full guide on this pattern: Building a multi-model architecture. Routing agent requests to the right model can cut costs by 60–80% without reducing output quality.
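A complexity router can be as simple as a heuristic in front of the model call. The sketch below is illustrative only: the tier names and the `classify()` keyword heuristic are our assumptions, not OpenClaw’s routing API (a production router would use a small classifier model, as the guide describes).

```python
# Hypothetical two-tier router; model names are examples from this post.
CHEAP = "llama-3.1-8b"    # file reads, formatting, simple decisions
STRONG = "deepseek-v3.2"  # code generation, multi-step reasoning

def classify(request: str) -> str:
    """Crude keyword heuristic standing in for a real router model."""
    hard_signals = ("implement", "refactor", "debug", "design", "auth")
    return STRONG if any(s in request.lower() for s in hard_signals) else CHEAP

print(classify("read src/routes/users.js and summarize it"))  # cheap tier
print(classify("implement JWT auth middleware for Express"))  # strong tier
```

Even a heuristic this crude captures most of the savings, because the bulk of tool calls in a long task are reads and small edits, not fresh code generation.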
The math on flat-rate vs. pay-per-token
Here’s the comparison for an OpenClaw user running ~20 tasks/day:
| Provider | Cost/task | 20 tasks/day | Monthly |
|---|---|---|---|
| Claude Opus (direct) | $9.18 | $183.60 | $5,508 |
| GPT-5.4 (direct) | $4.73 | $94.60 | $2,838 |
| DeepSeek V3.2 (per-token) | $0.16 | $3.20 | $96 |
| CheapestInference Pro | — | — | $50/mo flat |
Flat-rate means you don’t care about context accumulation. The hundreds of thousands of context-overhead tokens that make each task expensive on pay-per-token? Irrelevant. The system prompt tax? Doesn’t matter. Your agent can call models 24/7 and the bill is the same.
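The breakeven between the two pricing models is a single division. Using the table’s DeepSeek per-task cost and the $50/mo flat rate:

```python
FLAT_MONTHLY = 50.00  # CheapestInference Pro, from the table above
PER_TASK = 0.16       # DeepSeek V3.2 pay-per-token, per task

breakeven_tasks = FLAT_MONTHLY / PER_TASK   # tasks/month where the plans cost the same
print(breakeven_tasks, breakeven_tasks / 30)  # ~312 tasks/month, ~10.4 tasks/day
```

Above roughly ten tasks a day, flat-rate wins even against the cheapest per-token option in the table; against Opus pricing it wins at any realistic volume.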
What we’d actually recommend
If you’re running OpenClaw, here’s the setup we see working best:
1. Use open-source models. DeepSeek V3.2 and Kimi K2.5 score within 4 points of proprietary models on coding benchmarks (the data). The gap doesn’t justify a 50x cost difference.
2. Route by complexity. Don’t send file reads and simple decisions to the same model as complex code generation. A router model costs fractions of a cent per classification. Full guide: Multi-model architecture.
3. Set per-key budgets. One API key per agent, each with a dollar-denominated budget that resets every few hours. When the budget runs out, the agent pauses instead of burning through your allocation. We built this into every key: Agent budgets explained.
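Mechanically, a dollar-denominated budget with a periodic reset looks something like this sketch (illustrative only, not CheapestInference’s actual implementation):

```python
import time

class BudgetedKey:
    """Sketch of a per-key dollar budget that resets on a fixed window.
    Hypothetical class, for illustration of the mechanism only."""

    def __init__(self, budget_usd: float, window_hours: float = 5.0):
        self.budget = budget_usd
        self.window = window_hours * 3600
        self.spent = 0.0
        self.window_start = time.monotonic()

    def charge(self, cost_usd: float) -> bool:
        """Record a request's cost; False means serve HTTP 429 and pause."""
        now = time.monotonic()
        if now - self.window_start >= self.window:  # budget window rolled over
            self.spent, self.window_start = 0.0, now
        if self.spent + cost_usd > self.budget:
            return False                            # paused until the reset
        self.spent += cost_usd
        return True

key = BudgetedKey(budget_usd=2.00)
print(key.charge(1.50))  # True: within budget
print(key.charge(1.00))  # False: would exceed $2.00, agent pauses
```

The point of the boolean is that overspend becomes a pause, not a surprise invoice: the key keeps working again once the window rolls over.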
4. Handle rate limits automatically. Budget caps mean your agent will hit 429s. That’s the point — the cap is working. But OpenClaw kills the conversation when it gets a 429. The agent stops, and if you close the dashboard, that conversation is gone.
We built an OpenClaw plugin that fixes this: openclaw-ratelimit-retry. It hooks into agent_end, detects retriable 429s, parks the session on disk, and waits for the budget window to reset. Then it sends chat.send to the original session — resuming the conversation with its full transcript, as if you had typed a message.
```
openclaw plugins install @cheapestinference/openclaw-ratelimit-retry
```

```yaml
plugins:
  ratelimit-retry:
    budgetWindowHours: 5      # matches your CheapestInference budget reset
    maxRetryAttempts: 3       # give up after 3 consecutive 429s
    checkIntervalMinutes: 5   # check every 5 min for ready retries
```

The plugin is zero-dependency, persists across server restarts, deduplicates by session, and handles edge cases like sub-agents, queue overflow, and corrupted state files. If the retry itself hits a 429, it re-queues automatically. No tokens wasted on re-sending from scratch — the agent picks up exactly where it left off.
This turns budget caps from “your agent crashes” into “your agent naps and wakes up.” Set it up once and forget about it.
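Under the hood, the park-and-resume pattern needs only a small persistent queue. A minimal sketch of the idea (the file name and helper functions are hypothetical, not the plugin’s real internals):

```python
import json
import pathlib

QUEUE = pathlib.Path("parked_sessions.json")  # hypothetical on-disk queue

def park(session_id: str, resume_at: float) -> None:
    """Persist a 429'd session so a server restart doesn't lose it."""
    queue = json.loads(QUEUE.read_text()) if QUEUE.exists() else {}
    queue[session_id] = resume_at  # one entry per session: re-parking dedupes
    QUEUE.write_text(json.dumps(queue))

def due_sessions(now: float) -> list[str]:
    """Sessions whose budget window has reset and are ready to resume."""
    queue = json.loads(QUEUE.read_text()) if QUEUE.exists() else {}
    return [sid for sid, t in queue.items() if now >= t]
```

A periodic check (every `checkIntervalMinutes`) resumes each due session; if the resume itself hits a 429, the session is simply parked again with a later timestamp.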
5. Consider flat-rate. If your agent runs more than a few tasks per day, per-token pricing works against you. Every token of context overhead is money. On flat-rate, context overhead is free — use the full context window, re-send everything, let the agent work without constraint.
The irony
OpenClaw is free because the code runs on your machine. But the valuable part — the intelligence — runs on someone else’s GPUs. The agent framework is the cheap part. Inference is the expensive part.
Open-source models on flat-rate infrastructure flip this equation. The models are free. The inference is flat. The only variable cost left is your time.
Point your OpenClaw base_url at https://api.cheapestinference.com/v1 and find out what unconstrained agents actually cost: nothing more than you already budgeted.