The real cost of running AI agents in production
Chatbots are cheap. Agents are not.
A chatbot sends a user message, gets a response, displays it. Maybe 2,000 tokens per exchange. An agent reads files, calls tools, retries on errors, re-sends the entire conversation every step, and does this 20–60 times per task. Same API, completely different economics.
If you’re budgeting for AI agents the same way you budget for a chatbot, you’re underestimating by 10–50x.
Token consumption: chatbot vs. agent
We measured token consumption across three workload types, each running for one hour.
The coding agent consumed 52x more tokens than a simple chatbot in the same time period. And this is normal — the agent was doing useful work the entire time.
Why agents cost so much
Three architectural properties of agents make them expensive:
1. Context accumulation
Every agent step appends tool outputs to the conversation. The LLM re-processes the entire conversation on each step. If the agent reads a 3,000-token file at step 5, that file gets re-sent at steps 6, 7, 8… all the way to the end.
For a 40-step task, one file read costs: 3,000 tokens × 35 remaining steps = 105,000 tokens in re-transmission.
This is why agent token consumption grows quadratically, not linearly.
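A small sketch makes the quadratic growth concrete. The numbers here are illustrative assumptions (a fixed 3,000 tokens of tool output appended per step, the whole conversation re-sent every step):

```python
def total_tokens(steps, tokens_added_per_step, base_context=0):
    """Sum the context re-processed across every step of an agent run,
    assuming each step appends output and re-sends the full conversation."""
    total = 0
    context = base_context
    for _ in range(steps):
        context += tokens_added_per_step  # tool output appended this step
        total += context                  # entire conversation re-sent
    return total

# Doubling the step count roughly quadruples total tokens processed.
print(total_tokens(20, 3_000))  # 630000
print(total_tokens(40, 3_000))  # 2460000
```

Going from 20 to 40 steps multiplies total tokens by about 3.9x, not 2x, which is the quadratic effect described above.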
2. System prompt overhead
Agent frameworks use large system prompts — OpenClaw’s is ~9,600 tokens, CrewAI’s varies by agent configuration. This prompt is sent with every request. Over 40 steps, the system prompt alone costs 384,000 tokens.
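The overhead figure is plain multiplication, using the ~9,600-token prompt size quoted above:

```python
# System prompt re-sent on every one of the 40 steps (figures from the text).
SYSTEM_PROMPT_TOKENS = 9_600
STEPS = 40
print(SYSTEM_PROMPT_TOKENS * STEPS)  # 384000
```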
3. Error retry loops
When a tool call fails, the agent retries. Each retry re-sends the full context plus the error message. Three retries on a 30K-token context waste 90K tokens with no productive output.
Without a retry cap, this can run indefinitely. We covered this in detail in Why your AI agent needs a budget.
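A retry cap can be as simple as a bounded loop. This is a minimal sketch with hypothetical names (`call_tool_with_cap` and the `RuntimeError` stand-in for a tool failure are illustrative, not any framework's real API):

```python
MAX_RETRIES = 3

def call_tool_with_cap(call_tool, args):
    """Retry a failing tool call, but never more than MAX_RETRIES times."""
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return call_tool(args)
        except RuntimeError as err:  # stand-in for a tool-call failure
            last_error = err
    # Surface the failure instead of looping (and paying) indefinitely.
    raise RuntimeError(f"tool failed after {MAX_RETRIES} attempts: {last_error}")
```

With a cap in place, the worst case for a 30K-token context is bounded at 90K wasted tokens rather than open-ended.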
Monthly cost by model and framework
Assuming one developer running 15 agent tasks per day, 22 working days per month, ~500K tokens per task:
| Model | Cost/task | Daily (×15) | Monthly |
|---|---|---|---|
| Claude Opus 4.6 | $9.18 | $137.70 | $3,029 |
| Claude Sonnet 4.6 | $2.25 | $33.75 | $743 |
| GPT-5.4 | $4.73 | $70.95 | $1,561 |
| DeepSeek V3.2 | $0.16 | $2.40 | $53 |
| Qwen 3.5 35B | $0.04 | $0.60 | $13 |
| CheapestInference Pro | — | — | $50 flat |
A team of 5 developers each running 15 tasks/day on Claude Opus spends $15,145/month. The same team on DeepSeek V3.2 via CheapestInference pays $250/month (5 × $50 Pro plan). That’s a 60x reduction.
Four strategies to cut agent inference costs
1. Switch to open-source models
DeepSeek V3.2 and Qwen 3.5 score within 4 points of GPT-5.4 and Opus on most benchmarks. For coding tasks specifically, DeepSeek V3.2 matches Opus on HumanEval and SWE-bench. Full data: Open-source models are production-ready.
2. Route by task complexity
Not every agent step needs a frontier model. File reads, simple classifications, and formatting don’t need 685B parameters. Use a small model for easy steps and a large model for hard ones. Full guide: Building a multi-model architecture.
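The routing idea above can be sketched in a few lines. Everything here is an illustrative assumption: the model names, the step-type labels, and the rule that classifies a step as "easy" are placeholders, not a real routing API.

```python
# Hypothetical router: mechanical steps go to a small model,
# reasoning-heavy steps go to the frontier model.
CHEAP_MODEL = "qwen-3.5-35b"
FRONTIER_MODEL = "claude-opus-4.6"

EASY_STEP_TYPES = {"read_file", "format_output", "classify_label"}

def pick_model(step_type: str) -> str:
    """Route a step to the cheapest model that can handle it."""
    return CHEAP_MODEL if step_type in EASY_STEP_TYPES else FRONTIER_MODEL

print(pick_model("read_file"))      # qwen-3.5-35b
print(pick_model("plan_refactor"))  # claude-opus-4.6
```

In practice the classification rule is the hard part; some teams use a small model to grade step difficulty before routing.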
3. Set per-key budgets with automatic reset
Give each agent its own API key with a dollar-denominated budget that resets every few hours. When the budget is exhausted, the agent pauses instead of burning through your allocation. We built this into every key: Agent budgets explained.
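The semantics of such a budget can be sketched as a small class. This is an assumed implementation, not how any particular provider enforces it: a dollar cap per key, an automatic reset on a fixed interval, and a refusal (rather than an overspend) when the cap would be exceeded.

```python
import time

class KeyBudget:
    """Dollar-denominated budget for one API key, resetting on an interval."""

    def __init__(self, cap_usd: float, reset_seconds: float):
        self.cap_usd = cap_usd
        self.reset_seconds = reset_seconds
        self.spent = 0.0
        self.window_start = time.monotonic()

    def try_spend(self, cost_usd: float) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.reset_seconds:
            self.spent = 0.0          # automatic reset of the window
            self.window_start = now
        if self.spent + cost_usd > self.cap_usd:
            return False              # agent pauses instead of overspending
        self.spent += cost_usd
        return True

budget = KeyBudget(cap_usd=5.00, reset_seconds=3 * 3600)
print(budget.try_spend(4.50))  # True
print(budget.try_spend(1.00))  # False (would exceed the $5 cap)
```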
4. Use flat-rate pricing
Per-token pricing penalizes the exact patterns agents use: large contexts, many steps, retries. Flat-rate pricing makes all of that free. Your agent can use the full context window, retry freely, and run 24/7 without increasing the bill.
The math that matters
Here’s the equation most teams miss:
Agent cost = tokens_per_step × steps × cost_per_token

Most optimization focuses on cost_per_token — switching to a cheaper model. But tokens_per_step grows quadratically with the accumulating context, and steps is unpredictable. Optimizing only one variable leaves the other two working against you.
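Working the equation with illustrative numbers shows the trap. All figures below are assumptions chosen to make the point, not measurements:

```python
def agent_cost(avg_tokens_per_step, steps, cost_per_million_tokens):
    """Cost of one agent run: tokens_per_step x steps x cost_per_token."""
    return avg_tokens_per_step * steps * cost_per_million_tokens / 1_000_000

# Baseline: 30K average context, 40 steps, $5 per million tokens.
print(agent_cost(30_000, 40, 5.0))   # 6.0 dollars
# Half the per-token price, but a weaker model that needs more steps
# and so carries more accumulated context per step.
print(agent_cost(45_000, 60, 2.5))   # 6.75 dollars: cheaper per token, costlier overall
```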
Flat-rate pricing eliminates all three variables from your bill. The cost is the subscription. Period.
We serve many models with flat-rate pricing and per-key budget caps. One subscription, unlimited keys, and the guarantee that your agent’s token consumption never becomes your problem. Get started or see plans.