MiniMax M3 API — unlimited & flat-rate access

MiniMax M3 is MiniMax’s frontier multimodal model for coding and agentic work, with a 1M-token context window. CheapestInference serves it through an OpenAI- and Anthropic-compatible API on flat-rate monthly plans with an unlimited pool, so your cost does not scale with token usage.

Quick facts

| Field | Value |
| --- | --- |
| Model | MiniMax M3 |
| Provider | MiniMax (served direct) |
| Model ID | `MiniMax-M3` |
| Context window | 1M tokens |
| Cost basis | $0.60 / $2.40 per 1M tokens (in / out) |
| Endpoints | `/v1/chat/completions` (OpenAI), `/anthropic/v1/messages` (Anthropic) |
| Pricing | From $39/mo (reserve an 8-hour daily time block, up to full 24/7) |

Call MiniMax M3

# Python, using the official openai package (pip install openai)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.com/v1",
    api_key="sk-...",  # your subscriber key
)

response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[{"role": "user", "content": "Summarize this document..."}],
)
print(response.choices[0].message.content)  # text of the first completion

# The same endpoint with curl
curl https://api.cheapestinference.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "MiniMax-M3", "messages": [{"role": "user", "content": "Hello"}]}'
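The Quick facts also list an Anthropic-style endpoint, `/anthropic/v1/messages`. The sketch below shows what a request to it might look like using only the Python standard library. It rests on several assumptions not confirmed by this page: that the full URL is the API host plus that path, that the endpoint accepts the standard Anthropic headers (`x-api-key`, `anthropic-version`), and that the response follows the Anthropic Messages shape. The helper names `build_messages_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: API host from the OpenAI base URL plus the Anthropic path from Quick facts.
ANTHROPIC_URL = "https://api.cheapestinference.com/anthropic/v1/messages"


def build_messages_request(api_key: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request in the Anthropic Messages format."""
    body = {
        "model": "MiniMax-M3",
        "max_tokens": max_tokens,  # required field in the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "content-type": "application/json",
        # Assumption: the standard Anthropic headers are accepted here.
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        ANTHROPIC_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first text block in the reply."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    # Assumption: the reply follows the Anthropic shape {"content": [{"type": "text", ...}]}.
    return next(block["text"] for block in payload["content"] if block.get("type") == "text")


# Building the request makes no network call; pass it to send() to execute it.
req = build_messages_request("YOUR_API_KEY", "Hello")
print(req.get_method(), req.full_url)
```

If you already use the official `anthropic` Python SDK, pointing its `base_url` at the service may be simpler, but the exact base URL it expects is not stated on this page.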

Why flat-rate MiniMax M3

MiniMax M3 pairs frontier coding and agentic ability with a 1M-token context window, making it well suited to large codebases, long documents, and long-running agent loops. On CheapestInference it is billed at a flat monthly rate, not per token, so heavy long-context workloads have a predictable cost. It has the lowest input cost basis of the three served models and is part of the frontier coding pool alongside Kimi K2.6 and GLM 5.2, with automatic failover. It works in any OpenAI-compatible client.
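To make the flat-rate argument concrete, the arithmetic below compares the $39/month entry price with what the same traffic would cost if it were metered at the listed cost basis ($0.60 in, $2.40 out per 1M tokens). The workload figures are hypothetical, and the comparison assumes the traffic fits inside whatever time block the entry plan covers.

```python
# Figures taken from the Quick facts on this page.
INPUT_USD_PER_M = 0.60     # cost basis, input, per 1M tokens
OUTPUT_USD_PER_M = 2.40    # cost basis, output, per 1M tokens
FLAT_USD_PER_MONTH = 39.0  # entry plan price


def metered_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD if the traffic were billed per token at the cost basis."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_M
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    )


# Hypothetical monthly workload: 200M input tokens and 20M output tokens.
monthly = metered_cost(200_000_000, 20_000_000)
print(f"metered: ${monthly:.2f} vs flat: ${FLAT_USD_PER_MONTH:.2f}")
# prints "metered: $168.00 vs flat: $39.00"
```

Under these assumptions the flat plan is cheaper once monthly usage exceeds the equivalent of $39 at the cost basis, for example 65M input tokens with no output.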

Common questions

Is there a MiniMax M3 API? Yes. Use model ID `MiniMax-M3` against the base URL `https://api.cheapestinference.com/v1`. The API is compatible with both the OpenAI and Anthropic SDKs.

How much does MiniMax M3 cost? From $39/month. You reserve one or more 8-hour daily time blocks (up to full 24/7) and use MiniMax M3 with no usage cap — billed at a flat monthly fee, not per token.

What is the MiniMax M3 context window? 1M tokens, so it handles large codebases, long documents, and extended agent conversations in a single request.

Is MiniMax M3 good for coding? Yes — it is a frontier coding and agentic model, and is served alongside Kimi K2.6 and GLM 5.2 in the frontier coding pool with automatic failover.
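For the 1M-token context window mentioned above, a quick pre-flight check can tell you whether a large document is likely to fit in a single request. This is a rough sketch: the 4-characters-per-token ratio is a common rule of thumb for English text, not MiniMax M3's actual tokenizer, and the function names and the output reserve are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # from the Quick facts on this page
CHARS_PER_TOKEN = 4  # assumption: rough average for English text, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 8_000) -> bool:
    """True if the estimated prompt size leaves room for the reserved output budget."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


document = "def handler(event):\n    return event\n" * 1_000
print(estimate_tokens(document), fits_in_context(document))
```

Code and non-English text often tokenize less efficiently than this ratio suggests, so treat a result near the limit as a signal to split the input.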
