MiniMax M3 API — unlimited & flat-rate access
MiniMax M3 is MiniMax’s frontier multimodal coding and agentic model, with a 1M-token context window. CheapestInference serves it through an OpenAI- and Anthropic-compatible API on flat-rate monthly plans and a truly unlimited pool — so your cost does not scale with tokens.
Quick facts
| Model | MiniMax M3 |
| Provider | MiniMax (served direct) |
| Model ID | MiniMax-M3 |
| Context window | 1M tokens |
| Cost basis | $0.60 / $2.40 per 1M tokens (in / out) |
| Endpoints | /v1/chat/completions (OpenAI), /anthropic/v1/messages (Anthropic) |
| Pricing | From $39/mo — reserve an 8-hour daily time block, up to full 24/7 |
Call MiniMax M3
Python:

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.cheapestinference.com/v1",
        api_key="sk-...",  # your subscriber key
    )

    response = client.chat.completions.create(
        model="MiniMax-M3",
        messages=[{"role": "user", "content": "Summarize this document..."}],
    )

curl:

    curl https://api.cheapestinference.com/v1/chat/completions \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "MiniMax-M3", "messages": [{"role": "user", "content": "Hello"}]}'

Why flat-rate MiniMax M3
MiniMax M3 pairs frontier coding and agentic ability with a 1M-token context window, making it well suited to large codebases, long documents, and long-running agent loops. On CheapestInference it is billed at a flat monthly rate, not per token, so heavy long-context workloads have a predictable cost. It has the lowest input cost basis of the three served models and is part of the frontier coding pool alongside Kimi K2.6 and GLM 5.2, with automatic failover. It works in any OpenAI-compatible client.
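To make the flat-rate comparison concrete, here is a minimal sketch that works out what a month of traffic would cost if it were billed per token at the cost basis from the Quick facts table. The monthly token volumes are hypothetical, and treating the cost basis as a metered price is an assumption made only for comparison. Whether a given volume fits into a single 8-hour daily block depends on throughput, which this page does not state.

```python
# Figures taken from the Quick facts table.
INPUT_USD_PER_M = 0.60      # cost basis per 1M input tokens
OUTPUT_USD_PER_M = 2.40     # cost basis per 1M output tokens
FLAT_USD_PER_MONTH = 39.0   # entry plan: one 8-hour daily time block


def metered_equivalent(input_tokens: int, output_tokens: int) -> float:
    """USD this volume would cost if billed per token at the cost basis."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_M
    return input_cost + output_cost


# Hypothetical month: 200M input tokens and 20M output tokens.
monthly = metered_equivalent(200_000_000, 20_000_000)
print(f"metered equivalent: ${monthly:.2f} vs flat ${FLAT_USD_PER_MONTH:.2f}")
```

For this hypothetical volume the metered equivalent comes to $168.00, so the flat plan is the cheaper option whenever the workload fits in the reserved hours.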
Common questions
Is there a MiniMax M3 API?
Yes. Use model id MiniMax-M3 against https://api.cheapestinference.com/v1. The API is OpenAI- and Anthropic-SDK compatible.
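The Anthropic-compatible route listed under Quick facts can be tried without an SDK. The sketch below only builds the request and does not send it. It assumes the route lives on the same host as the OpenAI-compatible base URL, that the key is accepted in Anthropic's usual `x-api-key` header, and that the standard `anthropic-version` header is expected. None of these are confirmed on this page, so check them against your account's documentation.

```python
import json
import urllib.request

# Assumption: same host as the OpenAI-compatible base URL, with the
# /anthropic/v1/messages path from the Quick facts table.
ANTHROPIC_URL = "https://api.cheapestinference.com/anthropic/v1/messages"


def build_messages_request(
    api_key: str, prompt: str, max_tokens: int = 1024
) -> urllib.request.Request:
    """Build (but do not send) an Anthropic Messages-style POST request."""
    payload = {
        "model": "MiniMax-M3",
        "max_tokens": max_tokens,  # required by the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Content-Type": "application/json",
        # Assumption: the subscriber key goes in Anthropic's x-api-key header.
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        ANTHROPIC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_messages_request("YOUR_API_KEY", "Hello")
# To send it: urllib.request.urlopen(request), then JSON-decode the body.
```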
How much does MiniMax M3 cost?
From $39/month. You reserve one or more 8-hour daily time blocks (up to full 24/7) and use MiniMax M3 with no usage cap, billed at a flat monthly fee rather than per token.
What is the MiniMax M3 context window?
1M tokens, so it handles large codebases, long documents, and extended agent conversations in a single request.
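Before sending a very large prompt, it can help to check roughly whether it fits in the window. The sketch below is an approximation only. The four-characters-per-token ratio is a common rule of thumb for English text, not MiniMax's tokenizer, and both the helper names and the idea of reserving part of the window for the reply are illustrative assumptions.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # from the Quick facts table
CHARS_PER_TOKEN = 4                # rough rule of thumb, NOT MiniMax's tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(documents: list[str], reserve_for_output: int = 8_000) -> bool:
    """True if the estimated prompt still leaves room for the reply."""
    prompt_tokens = sum(estimate_tokens(doc) for doc in documents)
    return prompt_tokens + reserve_for_output <= CONTEXT_WINDOW_TOKENS
```

Calling `fits_in_context(list_of_file_contents)` before a request gives an early warning. Treat a result near the limit as unreliable, since real token counts vary with language and content.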
Is MiniMax M3 good for coding?
Yes. It is a frontier coding and agentic model, served alongside Kimi K2.6 and GLM 5.2 in the frontier coding pool with automatic failover.