
Overview

CheapestInference is an AI inference proxy that gives you access to every major model through a single API with flat monthly pricing. No per-token charges.

https://api.cheapestinference.com

CheapestInference routes your requests to the appropriate model provider (OpenAI, Anthropic, Google, Meta, DeepSeek, Qwen, Moonshot).

  1. You subscribe to a plan or top up credits
  2. You create API keys from the dashboard
  3. You use those keys with the OpenAI or Anthropic SDK — just change the base URL
  4. The platform validates your key, enforces rate limits, and forwards the request to the provider

Your API key works exactly like an OpenAI or Anthropic key. All routing, rate limiting, and spend tracking are handled automatically.
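As a sketch of the flow above, here is what an OpenAI-compatible request to the proxy looks like using only the Python standard library. The model name and key placeholder are illustrative; with the official SDK you would pass the same base URL and key to its `base_url` and `api_key` parameters instead.

```python
import json
import urllib.request

BASE_URL = "https://api.cheapestinference.com/v1"
API_KEY = "YOUR_API_KEY"  # created from the dashboard

def chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request (constructed, not sent)."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "gpt-4o" is an illustrative model name, not a documented list entry.
req = chat_request("gpt-4o", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; the proxy validates the key,
# enforces rate limits, and forwards the request to the provider.
```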

| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | OpenAI-compatible chat (all models) |
| `POST /anthropic/v1/messages` | Anthropic-compatible messages |
| `POST /v1/embeddings` | Text embeddings |
| `GET /v1/models` | List available models |

The response format matches the official OpenAI and Anthropic APIs exactly.
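Because the shape matches the official APIs, existing parsing code works unchanged. A sketch against an abbreviated OpenAI-style response (the field values below are made up for illustration):

```python
# Abbreviated OpenAI-style chat completion response (values are illustrative).
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello there!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
}

# The same accessor you would use against api.openai.com:
reply = response["choices"][0]["message"]["content"]
```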

Rate limits are enforced per key and reset every minute:

| Limit | Standard | Pro |
|---|---|---|
| Requests per minute (RPM) | 60 | 200 |
| Tokens per minute (TPM) | 3,333 | 13,333 |

Each API key has its own independent limits — one key hitting its limit does not affect other keys.
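Since limits reset every minute, a simple client-side strategy is to back off until the next window when a request is rate-limited. A hedged sketch: the proxy's exact 429 behavior and headers are not documented here, so this waits a fixed interval rather than parsing a `Retry-After` header.

```python
import time

def with_rate_limit_retry(send, max_retries=3, wait_seconds=60):
    """Call send(); on HTTP 429, wait for the next per-minute window and retry."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(wait_seconds)  # limits reset every minute
    return status, body

# Demo with a stub that is rate-limited once, then succeeds.
responses = iter([(429, "slow down"), (200, "ok")])
status, body = with_rate_limit_retry(lambda: next(responses), wait_seconds=0)
```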

| Method | How |
|---|---|
| Card | Visa, Mastercard, etc. via Stripe |
| USDC | Direct transfer on Base L2 (MetaMask, Coinbase Wallet) |
| Credits | Top up $5–$50, pay as you go, no subscription required |

Subscriptions last 30 days with no auto-renewal. You renew manually when ready.

Requests without an API key receive a 402 Payment Required response with USDC pricing. AI agents can pay per request using the x402 protocol on Base L2 — no account needed.
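A keyless agent would branch on the 402 status to decide whether to pay before retrying. A minimal sketch; the body's field names here are assumptions, not the documented schema — consult the x402 protocol specification for the real payment fields.

```python
import json

def classify_response(status: int, body: str):
    """Distinguish a pay-per-request challenge from a normal response."""
    if status == 402:
        # Keyless request: the body advertises USDC pricing for this call.
        # "price_usdc" is an assumed field name for illustration only.
        quote = json.loads(body)
        return ("payment_required", quote)
    return ("ok", json.loads(body))

kind, payload = classify_response(402, '{"price_usdc": "0.001"}')
```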

No custom SDK required. Use the official OpenAI or Anthropic SDK in any language:

  • Python: openai, anthropic
  • Node.js: openai, @anthropic-ai/sdk
  • Any OpenAI-compatible client (Go, Rust, Java, etc.)
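With either official SDK, switching to the proxy is only a constructor argument. The stand-in class below mirrors the constructor shape of `openai.OpenAI(...)` and `anthropic.Anthropic(...)` without requiring either package; the Anthropic base URL shown is an assumption inferred from the `/anthropic/v1/messages` endpoint path.

```python
# Stand-in mirroring the official SDKs' constructor shape: with the real
# openai or anthropic package, only base_url differs from normal usage.
class Client:
    def __init__(self, api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url

openai_style = Client(api_key="YOUR_API_KEY",
                      base_url="https://api.cheapestinference.com/v1")
# Assumed prefix: the SDK appends /v1/messages to this base URL.
anthropic_style = Client(api_key="YOUR_API_KEY",
                         base_url="https://api.cheapestinference.com/anthropic")
```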