Models

All models are available on every plan. Rate limits (RPM and TPM) are set at the key level, not per model.

Query the live model list:

curl https://api.cheapestinference.com/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"

Each model object includes an id, owned_by, and a type field ("chat" or "embedding") so you can filter programmatically:

{
  "id": "deepseek-ai/DeepSeek-V3.2",
  "object": "model",
  "created": 1677610602,
  "owned_by": "cheapestinference",
  "type": "chat"
}
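The type field makes it easy to separate chat and embedding models client-side. A minimal sketch in Python (the sample data below is illustrative; a real response comes from the endpoint shown above):

```python
# Filter the /v1/models list response down to chat models only.
# Sample data inlined for illustration; fetch the real list from the API.
models = {
    "object": "list",
    "data": [
        {"id": "deepseek-ai/DeepSeek-V3.2", "object": "model",
         "created": 1677610602, "owned_by": "cheapestinference", "type": "chat"},
        {"id": "BAAI/bge-m3", "object": "model",
         "created": 1677610602, "owned_by": "cheapestinference", "type": "embedding"},
    ],
}

chat_models = [m["id"] for m in models["data"] if m["type"] == "chat"]
print(chat_models)  # ['deepseek-ai/DeepSeek-V3.2']
```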

To get details about a specific model:

curl https://api.cheapestinference.com/v1/models/deepseek-ai/DeepSeek-V3.2 \
-H "Authorization: Bearer YOUR_API_KEY"
| Model ID | Type |
| --- | --- |
| Qwen/Qwen3.5-397B-A17B | chat |
| Qwen/Qwen3.5-122B-A10B | chat |
| Qwen/Qwen3.5-35B-A3B | chat |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | chat |
| Qwen/Qwen3-Next-80B-A3B-Instruct | chat |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo | chat |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | chat |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | chat |
| Qwen/Qwen3-VL-235B-A22B-Instruct | chat |
| Qwen/Qwen3-VL-30B-A3B-Instruct | chat |

| Model ID | Type |
| --- | --- |
| deepseek-ai/DeepSeek-V3.2 | chat |
| deepseek-ai/DeepSeek-R1-0528 | chat |
| deepseek-ai/DeepSeek-R1-0528-Turbo | chat |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B | chat |
| deepseek-ai/DeepSeek-OCR | chat |

| Model ID | Type |
| --- | --- |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | chat |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | chat |
| meta-llama/Llama-Guard-4-12B | chat |

| Model ID | Type |
| --- | --- |
| moonshotai/Kimi-K2.5 | chat |
| moonshotai/Kimi-K2.5-Turbo | chat |
| moonshotai/Kimi-K2-Thinking | chat |

| Model ID | Type |
| --- | --- |
| MiniMaxAI/MiniMax-M2.5 | chat |

| Model ID | Type |
| --- | --- |
| zai-org/GLM-5 | chat |
| zai-org/GLM-4.7-Flash | chat |

| Model ID | Type |
| --- | --- |
| Qwen/Qwen3-Embedding-8B | embedding |
| Qwen/Qwen3-Embedding-0.6B-batch | embedding |
| BAAI/bge-m3 | embedding |
| intfloat/multilingual-e5-large-instruct | embedding |
| nvidia/llama-nemotron-embed-vl-1b-v2 | embedding |
| google/embeddinggemma-300m | embedding |
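Embedding models are called with an embeddings request body in the OpenAI-compatible format; the exact route (/v1/embeddings) is an assumption based on the API's OpenAI compatibility. A sketch of the request body:

```python
import json

# Hypothetical request body for an embeddings call in the
# OpenAI-compatible format; the /v1/embeddings path is an assumption.
body = {
    "model": "BAAI/bge-m3",
    "input": ["first passage", "second passage"],  # a string or a list of strings
}

payload = json.dumps(body)
```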

Specify the model ID in your request:

from openai import OpenAI

client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")

# OpenAI SDK, works with any model ID from the tables above
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",  # or "Qwen/Qwen3.5-397B-A17B", etc.
    messages=[{"role": "user", "content": "Hello"}],
)

All models work through the OpenAI endpoint (/v1/chat/completions) and the Anthropic-compatible endpoint (/anthropic/v1/messages). The API handles format translation automatically.
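Concretely, the same model ID appears unchanged in both request formats; only the body shape differs. A sketch of the two bodies (the payload shapes follow the public OpenAI and Anthropic request schemas):

```python
model_id = "deepseek-ai/DeepSeek-V3.2"

# Body for POST /v1/chat/completions (OpenAI format)
openai_body = {
    "model": model_id,
    "messages": [{"role": "user", "content": "Hello"}],
}

# Body for POST /anthropic/v1/messages (Anthropic format);
# max_tokens is required in the Anthropic schema
anthropic_body = {
    "model": model_id,
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}
```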