
Overview

CheapestInference is an AI inference proxy that gives you access to every major model through a single API with flat monthly pricing. No per-token charges.

https://api.cheapestinference.com

CheapestInference routes your requests to the appropriate model provider (OpenAI, Anthropic, Google, Meta, DeepSeek, Qwen, Moonshot).

  1. You subscribe to a plan or top up credits
  2. You create API keys from the dashboard
  3. You use those keys with the OpenAI or Anthropic SDK — just change the base URL
  4. The platform validates your key, enforces rate limits, and forwards the request to the provider

Your API key works exactly like an OpenAI or Anthropic key. All routing, rate limiting, and spend tracking are handled automatically.
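As a sketch of the flow above, here is what an OpenAI-compatible request to the proxy looks like using only the Python standard library. The model name and key placeholder are illustrative; with the official SDK you would pass the same base URL and key to its `base_url` and `api_key` parameters instead.

```python
import json
import urllib.request

BASE_URL = "https://api.cheapestinference.com/v1"
API_KEY = "YOUR_API_KEY"  # created from the dashboard

def chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request (constructed, not sent)."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "gpt-4o" is an illustrative model name, not a documented list entry.
req = chat_request("gpt-4o", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; the proxy validates the key,
# enforces rate limits, and forwards the request to the provider.
```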

| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | OpenAI-compatible chat (all models) |
| `POST /anthropic/v1/messages` | Anthropic-compatible messages |
| `POST /v1/embeddings` | Text embeddings |
| `GET /v1/models` | List available models |

The response format matches the official OpenAI and Anthropic APIs exactly.
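Because the shape matches the official APIs, existing parsing code works unchanged. A sketch against an abbreviated OpenAI-style response (the field values below are made up for illustration):

```python
# Abbreviated OpenAI-style chat completion response (values are illustrative).
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello there!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
}

# The same accessor you would use against api.openai.com:
reply = response["choices"][0]["message"]["content"]
```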

Rate limits are enforced per key and reset every minute:

| Limit | Standard | Pro |
|---|---|---|
| Requests per minute (RPM) | 60 | 200 |
| Tokens per minute (TPM) | 3,333 | 13,333 |

Each API key has its own independent limits — one key hitting its limit does not affect other keys.
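Since limits reset every minute, a simple client-side strategy is to back off until the next window when a request is rate-limited. A hedged sketch: the proxy's exact 429 behavior and headers are not documented here, so this waits a fixed interval rather than parsing a `Retry-After` header.

```python
import time

def with_rate_limit_retry(send, max_retries=3, wait_seconds=60):
    """Call send(); on HTTP 429, wait for the next per-minute window and retry."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(wait_seconds)  # limits reset every minute
    return status, body

# Demo with a stub that is rate-limited once, then succeeds.
responses = iter([(429, "slow down"), (200, "ok")])
status, body = with_rate_limit_retry(lambda: next(responses), wait_seconds=0)
```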

| Method | How |
|---|---|
| Card | Visa, Mastercard, etc. via Stripe |
| USDC | Direct transfer on Base L2 (MetaMask, Coinbase Wallet) |
| Credits | Top up $5–$50, pay as you go, no subscription required |

Subscriptions last 30 days with no auto-renewal. You renew manually when ready.

Requests without an API key receive a 402 Payment Required response with USDC pricing. AI agents can pay per request using the x402 protocol on Base L2 — no account needed.
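A keyless agent would branch on the 402 status to decide whether to pay before retrying. A minimal sketch; the body's field names here are assumptions, not the documented schema — consult the x402 protocol specification for the real payment fields.

```python
import json

def classify_response(status: int, body: str):
    """Distinguish a pay-per-request challenge from a normal response."""
    if status == 402:
        # Keyless request: the body advertises USDC pricing for this call.
        # "price_usdc" is an assumed field name for illustration only.
        quote = json.loads(body)
        return ("payment_required", quote)
    return ("ok", json.loads(body))

kind, payload = classify_response(402, '{"price_usdc": "0.001"}')
```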

No custom SDK required. Use the official OpenAI or Anthropic SDK in any language:

  • Python: openai, anthropic
  • Node.js: openai, @anthropic-ai/sdk
  • Any OpenAI-compatible client (Go, Rust, Java, etc.)
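With either official SDK, switching to the proxy is only a constructor argument. The stand-in class below mirrors the constructor shape of `openai.OpenAI(...)` and `anthropic.Anthropic(...)` without requiring either package; the Anthropic base URL shown is an assumption inferred from the `/anthropic/v1/messages` endpoint path.

```python
# Stand-in mirroring the official SDKs' constructor shape: with the real
# openai or anthropic package, only base_url differs from normal usage.
class Client:
    def __init__(self, api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url

openai_style = Client(api_key="YOUR_API_KEY",
                      base_url="https://api.cheapestinference.com/v1")
# Assumed prefix: the SDK appends /v1/messages to this base URL.
anthropic_style = Client(api_key="YOUR_API_KEY",
                         base_url="https://api.cheapestinference.com/anthropic")
```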