Overview
CheapestInference is an AI inference proxy that gives you access to open-source models through a single API with flat monthly pricing. No per-token charges.
Base URL
Section titled “Base URL”https://api.cheapestinference.com/v1How it works
Section titled “How it works”CheapestInference routes your requests to the appropriate open-source model provider (Moonshot, Zhipu / Z.ai, MiniMax).
- You subscribe to a pool by reserving one or more daily time blocks
- You create API keys from the dashboard
- You use those keys with the OpenAI or Anthropic SDK — just change the base URL
- The platform validates your key and forwards the request to the provider
Your API key works exactly like an OpenAI or Anthropic key. All routing and spend tracking is handled automatically.
Supported endpoints
Section titled “Supported endpoints”| Endpoint | Description |
|---|---|
POST /v1/chat/completions | OpenAI-compatible chat (all models) |
POST /v1/completions | OpenAI-compatible legacy completions |
POST /anthropic/v1/messages | Anthropic-compatible messages |
GET /v1/models | List available models |
GET /v1/models/:model_id | Get specific model details |
GET /v1/usage | Check key usage and status |
The response format matches the official OpenAI and Anthropic APIs exactly.
Concurrency
Section titled “Concurrency”Each key handles a limited number of simultaneous requests during your reserved blocks. To run more in parallel — or to isolate clients — create additional keys (one per seat). Keys are independent, so one busy key never affects the others.
Payment options
Section titled “Payment options”| Method | How |
|---|---|
| Card | Visa, Mastercard, etc. via Stripe |
| USDC | Direct transfer on Base L2 (MetaMask, Coinbase Wallet) |
| Credits | Pay-as-you-go top-ups — temporarily unavailable |
Subscriptions last 30 days with no auto-renewal. You renew manually when ready.
x402 protocol
Section titled “x402 protocol”Requests without an API key receive a 402 Payment Required response with a product catalog. AI agents can subscribe or purchase credits autonomously using the x402 protocol with USDC on Base L2 — no human setup needed.
No custom SDK required. Use the official OpenAI or Anthropic SDK in any language:
- Python:
openai,anthropic - Node.js:
openai,@anthropic-ai/sdk - Any OpenAI-compatible client (Go, Rust, Java, etc.)