
The real cost of running AI agents in production

Chatbots are cheap. Agents are not.

A chatbot sends a user message, gets a response, displays it. Maybe 2,000 tokens per exchange. An agent reads files, calls tools, retries on errors, re-sends the entire conversation every step, and does this 20–60 times per task. Same API, completely different economics.

If you’re budgeting for AI agents the same way you budget for a chatbot, you’re underestimating by 10–50x.


We measured token consumption across four workload types, each running for one hour:

Workload                  Tokens/hour
Coding agent (OpenClaw)   ~2.1M
Research agent (CrewAI)   ~1.2M
RAG chatbot               ~200K
Simple chatbot            ~40K

The coding agent consumed 52x more tokens than a simple chatbot in the same time period. And this is normal — the agent was doing useful work the entire time.


Three architectural properties of agents make them expensive:

Every agent step appends tool outputs to the conversation. The LLM re-processes the entire conversation on each step. If the agent reads a 3,000-token file at step 5, that file gets re-sent at steps 6, 7, 8… all the way to the end.

For a 40-step task, one file read costs: 3,000 tokens × 35 remaining steps = 105,000 tokens in re-transmission.

This is why agent token consumption grows quadratically, not linearly.
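The quadratic growth falls out of a few lines of arithmetic. A minimal sketch, assuming a 9,600-token system prompt and ~1,500 tokens of new tool output appended per step (illustrative numbers, not measured values):

```python
def total_tokens(steps, system_prompt=9_600, output_per_step=1_500):
    """Total tokens billed across an agent run, where the full
    conversation is re-sent to the LLM at every step."""
    total = 0
    context = system_prompt
    for _ in range(steps):
        total += context            # whole conversation re-sent this step
        context += output_per_step  # each step appends its tool output
    return total

print(total_tokens(10))  # 163,500 tokens
print(total_tokens(40))  # 1,554,000 tokens: 4x the steps, ~9.5x the tokens
```

Quadrupling the step count roughly 9.5x's the bill, which is the quadratic term at work.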

Agent frameworks use large system prompts — OpenClaw’s is ~9,600 tokens, CrewAI’s varies by agent configuration. This prompt is sent with every request. Over 40 steps, the system prompt alone costs 384,000 tokens.

When a tool call fails, the agent retries. Each retry sends the full context plus the error message. Three retries on a 30K-token context wastes 90K tokens with no productive output.

Without a retry cap, this can run indefinitely. We covered this in detail in Why your AI agent needs a budget.
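A retry cap is only a few lines. This is a sketch, not any framework's actual API: `step_fn` and the error handling are placeholders.

```python
def call_with_retry(step_fn, context, max_retries=3):
    """Retry a failing tool call, but give up after max_retries
    instead of re-sending the full context indefinitely."""
    last_err = None
    for _ in range(max_retries):
        try:
            return step_fn(context)
        except RuntimeError as err:
            last_err = err
            # each retry re-sends everything, plus the error message
            context = context + [f"error: {err}"]
    raise RuntimeError(f"gave up after {max_retries} retries: {last_err}")
```

The key design choice is that exhaustion raises instead of looping: the agent stops burning tokens and surfaces the failure.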


Assuming one developer running 15 agent tasks per day, 22 working days per month, ~500K tokens per task:

Model                  Cost/task   Daily (×15)   Monthly
Claude Opus 4.6        $9.18       $137.70       $3,029
Claude Sonnet 4.6      $2.25       $33.75        $743
GPT-5.4                $4.73       $70.95        $1,561
DeepSeek V3.2          $0.16       $2.40         $53
Qwen 3.5 35B           $0.04       $0.60         $13
CheapestInference Pro  —           —             $50 (flat)

A team of 5 developers each running 15 tasks/day on Claude Opus spends $15,145/month. The same team on DeepSeek V3.2 via CheapestInference pays $250/month (5 × $50 Pro plan). That’s a 60x reduction.
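The per-model figures above reduce to a single multiplication. A quick check of the arithmetic:

```python
def monthly_cost(cost_per_task, tasks_per_day=15, working_days=22):
    """Monthly spend for one developer running agent tasks daily."""
    return cost_per_task * tasks_per_day * working_days

# Per-task costs from the table above
opus = monthly_cost(9.18)      # ~$3,029/month per developer
deepseek = monthly_cost(0.16)  # ~$53/month per developer
team_opus = 5 * opus           # ~$15,145/month for a team of 5
```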


Four strategies to cut agent inference costs


1. Use open-source models

DeepSeek V3.2 and Qwen 3.5 score within 4 points of GPT-5.4 and Opus on most benchmarks. For coding tasks specifically, DeepSeek V3.2 matches Opus on HumanEval and SWE-bench. Full data: Open-source models are production-ready.

2. Build a multi-model architecture

Not every agent step needs a frontier model. File reads, simple classifications, and formatting don’t need 685B parameters. Use a small model for easy steps and a large model for hard ones. Full guide: Building a multi-model architecture.

3. Set per-key budgets with automatic reset


Give each agent its own API key with a dollar-denominated budget that resets every few hours. When the budget is exhausted, the agent pauses instead of burning through your allocation. We built this into every key: Agent budgets explained.
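The mechanics of a budget with an automatic reset window can be sketched in a few lines. This is an illustrative model, not CheapestInference's actual implementation:

```python
import time

class KeyBudget:
    """Dollar-denominated budget for one API key, resetting on a
    fixed window (a sketch; real systems track spend server-side)."""

    def __init__(self, limit_usd, reset_seconds=4 * 3600, clock=time.monotonic):
        self.limit = limit_usd
        self.reset_seconds = reset_seconds
        self.clock = clock          # injectable for testing
        self.spent = 0.0
        self.window_start = clock()

    def charge(self, cost_usd):
        """Return True and record the spend, or False if the agent
        should pause until the window resets."""
        now = self.clock()
        if now - self.window_start >= self.reset_seconds:
            self.spent = 0.0        # automatic reset: fresh window
            self.window_start = now
        if self.spent + cost_usd > self.limit:
            return False            # pause instead of overspending
        self.spent += cost_usd
        return True
```

The agent loop checks `charge()` before each step; a `False` means sleep until the next window rather than erroring out.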

4. Choose flat-rate pricing

Per-token pricing penalizes the exact patterns agents use: large contexts, many steps, retries. Flat-rate pricing makes all of that free. Your agent can use the full context window, retry freely, and run 24/7 without increasing the bill.


Here’s the equation most teams miss:

Agent cost = tokens_per_step × steps × cost_per_token

Most optimization focuses on cost_per_token — switching to a cheaper model. But tokens_per_step grows as context accumulates (which is what makes total consumption quadratic), and steps is unpredictable. Optimizing only one variable leaves the other two working against you.
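The interplay is easy to see with illustrative numbers (hypothetical rates, not the table's figures):

```python
def agent_cost(tokens_per_step, steps, usd_per_million_tokens):
    """Total cost of an agent run from the three variables."""
    return tokens_per_step * steps * usd_per_million_tokens / 1_000_000

base = agent_cost(25_000, 40, 3.0)    # $3.00 per task
cheap = agent_cost(25_000, 40, 1.5)   # $1.50: halving the rate halves the bill
longer = agent_cost(50_000, 80, 1.5)  # $6.00: cheaper model, bigger bill
```

Halving the per-token rate is erased the moment context and step count double, which is exactly what long-running agents do.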

Flat-rate pricing eliminates all three variables from your bill. The cost is the subscription. Period.


We serve many models with flat-rate pricing and per-key budget caps. One subscription, unlimited keys, and the guarantee that your agent’s token consumption never becomes your problem. Get started or see plans.