Qwen 3.5 vs GPT-5.4 vs Claude Opus 4.6 — same quality, fraction of the price
You asked for this. After our first benchmark post, the most requested model was Qwen 3.5. Here it is — 4 models across 5 metrics, same models in every chart:
- Open-source: Qwen3.5-397B-A17B (flagship), Qwen3.5-35B-A3B (efficient)
- Proprietary: GPT-5.4, Claude Opus 4.6
Knowledge: MMLU-Pro (%)
GPT-5.4 leads at 88.5%, but Qwen3.5-397B is 0.7 points behind — statistical noise. The 35B with only 3B active parameters scores 85.3%, beating Opus by 3.3 points. The total spread across all four models is just 6.5 points.
Qwen3.5-397B matches GPT-5.4 at roughly a fifth of the cost. The 35B beats Opus at 1/23rd the price.
Reasoning: GPQA Diamond (%)
Proprietary models lead on graduate-level reasoning. GPT-5.4 at 92% and Opus at 91.3% are strong. But Qwen3.5-397B at 88.4% is within 4 points — and costs $0.54/M vs $2.50 and $5.00. The 35B at 84.2% is still PhD-level performance for $0.22/M input.
Code: LiveCodeBench v6 (%)
The 397B essentially ties GPT-5.4 on competitive coding — 0.4 points apart. Both beat Opus by roughly 8 points. The 35B at 74.6% is within 2 points of Opus, at 1/23rd the price.
For dedicated coding workloads, we also serve Qwen3-Coder-480B (SWE-bench Verified: 69.6%, comparable to Claude Sonnet 4).
Speed: output tokens per second
The 35B’s MoE architecture pays off — 178 tok/s is 2.3x faster than GPT-5.4 and 3.9x faster than Opus. Even the 397B flagship at 84 tok/s outpaces both proprietary models. This is what happens when only 3B (for the 35B) or 17B (for the 397B) parameters activate per token instead of the full model.
Speed data from Artificial Analysis. Actual speeds on our infrastructure may differ.
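If mixture-of-experts routing is new to you, here's the idea in a few lines. This is a toy sketch, not Qwen's actual architecture: the dimensions are made up, and real routers add load balancing and full FFN experts. But it shows why a model with 397B total parameters can do per-token work closer to a 17B one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only -- not Qwen's real config.
d_model, n_experts, top_k = 64, 32, 2

x = rng.standard_normal(d_model)                    # one token's hidden state
router = rng.standard_normal((n_experts, d_model))  # routing weights
experts = rng.standard_normal((n_experts, d_model, d_model))  # one toy "expert" each

# The router scores every expert, but only the top-k actually run.
scores = router @ x
top = np.argsort(scores)[-top_k:]
gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over chosen experts

# Output is the gate-weighted sum of just the selected experts' outputs.
y = sum(g * (experts[i] @ x) for g, i in zip(gates, top))

print(f"ran {top_k}/{n_experts} experts -> ~{top_k / n_experts:.0%} of expert FLOPs per token")
```

Every parameter still has to sit in memory, but per-token compute scales with the active experts, which is where the throughput numbers above come from.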
Price: input cost per million tokens
This is the chart that matters. Opus costs 23x more than the 35B and 9x more than the 397B. GPT-5.4 costs 4.6x more than the 397B. The quality difference? Single-digit percentage points on every benchmark.
The full picture
Quality only — no price axis. GPT-5.4 (gray) has the largest shape. Opus (dashed) is strong on reasoning and code. The 397B (indigo) nearly overlaps GPT-5.4 on code and knowledge. The 35B (teal) pulls hard left on speed — at 178 tok/s, it's more than twice as fast as anything else here. Price tells its own story in the chart above.
The scorecard
| Metric | Winner | Qwen3.5 397B | GPT-5.4 | Claude Opus 4.6 | Gap (397B vs best) |
|---|---|---|---|---|---|
| Knowledge (MMLU-Pro) | GPT-5.4 | 87.8% | 88.5% | 82.0% | -0.7 pts |
| Reasoning (GPQA) | GPT-5.4 | 88.4% | 92.0% | 91.3% | -3.6 pts |
| Code (LiveCodeBench) | GPT-5.4 | 83.6% | 84.0% | 76.0% | -0.4 pts |
| Speed (tok/s) | Qwen3.5 397B | 84 | ~78 | 46 | 1.1x faster |
| Price ($/M input) | Qwen3.5 397B | $0.54 | $2.50 | $5.00 | 4.6x cheaper |
Same weight class, different price tag. The 397B trades 0.4–3.6 points on quality for 4.6x lower price and faster speed. It beats Opus on 4 out of 5 metrics outright.
Note: The Qwen3.5-35B-A3B ($0.22/M) scores 85.3% MMLU-Pro, 84.2% GPQA, 74.6% LiveCodeBench at 178 tok/s — beating Opus on knowledge and speed at 23x less cost. A different weight class, but worth considering if speed and price matter more than the last few quality points.
The real question: what are you paying for?
The quality gap between Qwen3.5-397B and GPT-5.4 is 0.7 points on knowledge, 0.4 points on code. The price gap is 4.6x.
Put differently:
| Model | MMLU-Pro | Cost per quality point |
|---|---|---|
| Qwen3.5 35B | 85.3% | $0.003 per point per M tokens |
| Qwen3.5 397B | 87.8% | $0.006 per point per M tokens |
| GPT-5.4 | 88.5% | $0.028 per point per M tokens |
| Claude Opus 4.6 | 82.0% | $0.061 per point per M tokens |
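The cost-per-point column is just input price divided by MMLU-Pro score. Here's the whole calculation, if you want to plug in your own candidates:

```python
# Reproduce the cost-per-quality-point column: input price / MMLU-Pro score.
models = {
    "Qwen3.5 35B":     (0.22, 85.3),
    "Qwen3.5 397B":    (0.54, 87.8),
    "GPT-5.4":         (2.50, 88.5),
    "Claude Opus 4.6": (5.00, 82.0),
}

for name, (usd_per_m_input, mmlu_pro) in models.items():
    print(f"{name:<16} ${usd_per_m_input / mmlu_pro:.3f} per point per M tokens")
```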
Opus costs more than 20x as much per quality point as the 35B — and scores lower. GPT-5.4 leads on quality but costs 4.6–11x more than the Qwen models for single-digit advantages.
For most workloads, the last 3% of benchmark performance isn’t worth a 5x price increase. And for workloads where it is — the 397B gets you within 1 point of GPT-5.4 at a fraction of the cost.
Also available: specialized Qwen models
Beyond the general-purpose models, we serve two Qwen specialists:
- Qwen3-Coder-480B — SWE-bench Verified 69.6%, comparable to Claude Sonnet 4. Built for agentic coding.
- Qwen3-235B-Thinking — Chain-of-thought reasoning specialist. When you need the model to show its work.
Both available through the same API, same flat-rate plans.
All Qwen 3.5 models are available now on our API. Flat rate from $20/mo, or pay-as-you-go credits. See pricing and try it →
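If you already use an OpenAI-compatible SDK, switching is a few lines. A minimal sketch (the base URL and model ID here are placeholders; check the docs for the real values):

```python
# Hypothetical sketch: the base URL and model ID below are placeholders.
# Check the pricing/docs page for the real endpoint and model names.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="qwen3.5-397b-a17b",  # placeholder model ID
    messages=[{"role": "user", "content": "One-sentence summary of MoE routing?"}],
)
print(resp.choices[0].message.content)
```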
Sources: Qwen3.5-397B Model Card · Qwen3.5-35B Model Card · Artificial Analysis Leaderboard · GPQA Diamond Leaderboard · OpenAI Pricing · Anthropic Pricing · LiveCodeBench Leaderboard