Models

All models are available on every plan. Rate limits (RPM and TPM) are set at the key level, not per model.

Query the live model list:

curl https://api.cheapestinference.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

This returns the full list of available models. The response follows the OpenAI /v1/models format.
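As a sketch of what that response looks like, the snippet below parses a payload in the OpenAI /v1/models list format; the payload here is illustrative, not the live catalog:

```python
import json

# Example payload in the OpenAI /v1/models list format
# (illustrative entries; the live endpoint returns the full catalog).
payload = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o-mini", "object": "model", "owned_by": "system"},
    {"id": "deepseek-chat", "object": "model", "owned_by": "system"}
  ]
}
""")

# Collect the model IDs, which are the values passed as "model" in requests.
model_ids = [m["id"] for m in payload["data"]]
print(model_ids)  # ['gpt-4o-mini', 'deepseek-chat']
```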

To get details about a specific model:

curl https://api.cheapestinference.com/v1/models/gpt-4o-mini \
  -H "Authorization: Bearer YOUR_API_KEY"
Model ID                     Name
gpt-4o                       GPT-4o
gpt-4o-mini                  GPT-4o mini
o3-mini                      o3 mini

Model ID                     Name
claude-sonnet-4-20250514     Claude Sonnet 4
claude-3-5-haiku-20241022    Claude 3.5 Haiku

Model ID                     Name
gemini-2.5-flash             Gemini 2.5 Flash
gemini-2.5-pro               Gemini 2.5 Pro

Model ID                     Name
deepseek-chat                DeepSeek V3.2
deepseek-reasoner            DeepSeek R1

Model ID                     Name
qwen3-235b                   Qwen3 235B
qwen3-coder-480b             Qwen3 Coder 480B

Model ID                     Name
llama-3.3-70b                Llama 3.3 70B
llama-4-scout                Llama 4 Scout

Model ID                     Name
kimi-2.5                     Kimi 2.5

Model ID                     Name
text-embedding-3-small       Embedding 3 Small
text-embedding-3-large       Embedding 3 Large

Specify the model ID in your request:

from openai import OpenAI
client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")

# OpenAI SDK — any model
response = client.chat.completions.create(
    model="deepseek-chat",  # or "gpt-4o", "kimi-2.5", etc.
    messages=[{"role": "user", "content": "Hello"}],
)

from anthropic import Anthropic
client = Anthropic(base_url="https://api.cheapestinference.com/anthropic", api_key="YOUR_API_KEY")

# Anthropic SDK — Claude models
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

All models work through both the OpenAI endpoint (/v1/chat/completions) and the Anthropic endpoint (/anthropic/v1/messages). The API handles format translation automatically.
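To make the translation concrete, here is a sketch of the same "Hello" request expressed as each endpoint's wire format; field layouts follow the public OpenAI and Anthropic schemas, and the model ID is interchangeable between the two bodies:

```python
# The same request in each wire format. Either body can name any
# model ID; the API translates between formats as needed.
openai_body = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
}  # POST to /v1/chat/completions

anthropic_body = {
    "model": "deepseek-chat",
    "max_tokens": 1024,  # required by the Anthropic Messages format
    "messages": [{"role": "user", "content": "Hello"}],
}  # POST to /anthropic/v1/messages

# The prompt and model are identical; only the envelope differs.
print(openai_body["model"] == anthropic_body["model"])  # True
```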