Models

All models are available on every plan. Rate limits (RPM and TPM) are set at the key level, not per model.

Query the live model list:

curl https://api.cheapestinference.com/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"

This returns the full list of available models. The response follows the OpenAI /v1/models format.
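Since the response is a standard OpenAI-style list object, the model IDs can be pulled out of the `data` array. A minimal sketch, using an illustrative two-model payload rather than the live list:

```python
import json

# Illustrative payload in the OpenAI /v1/models list format; the live
# response from /v1/models has the same shape with the full model list.
payload = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "deepseek-chat", "object": "model", "owned_by": "system"},
    {"id": "llama-3.3-70b", "object": "model", "owned_by": "system"}
  ]
}
""")

# Collect the usable model IDs from the "data" array.
model_ids = sorted(m["id"] for m in payload["data"])
print(model_ids)  # ['deepseek-chat', 'llama-3.3-70b']
```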

To get details about a specific model:

curl https://api.cheapestinference.com/v1/models/deepseek-chat \
-H "Authorization: Bearer YOUR_API_KEY"
Llama

Model ID                  Name
llama-3.3-70b             Llama 3.3 70B
llama-3.1-8b              Llama 3.1 8B
llama-3.2-3b              Llama 3.2 3B

DeepSeek

Model ID                  Name
deepseek-chat             DeepSeek V3.2
deepseek-reasoner         DeepSeek R1

Qwen

Model ID                  Name
qwen3-235b                Qwen3 235B
qwen3-30b                 Qwen3 30B
qwen3-coder               Qwen3 Coder

Gemma

Model ID                  Name
gemma-3-27b               Gemma 3 27B
gemma-3-12b               Gemma 3 12B
gemma-3-4b                Gemma 3 4B

Kimi

Model ID                  Name
kimi-2.5                  Kimi 2.5

Embeddings (BGE)

Model ID                  Name
BAAI/bge-large-en-v1.5    BGE Large
BAAI/bge-base-en-v1.5     BGE Base

Specify the model ID in your request:

# OpenAI SDK, any model
from openai import OpenAI

client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="deepseek-chat",  # or "llama-3.3-70b", "kimi-2.5", etc.
    messages=[{"role": "user", "content": "Hello"}],
)

All models work through the OpenAI endpoint (/v1/chat/completions) and the Anthropic-compatible endpoint (/anthropic/v1/messages). The API handles format translation automatically.
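Requests to the Anthropic-compatible endpoint use the Messages API body shape, where max_tokens is required. A minimal sketch of building that body; the helper name here is illustrative, not part of the API, and sending it is a plain POST to /anthropic/v1/messages with your bearer key:

```python
import json

# Hypothetical helper (not part of the API): builds an Anthropic Messages
# API request body for the /anthropic/v1/messages endpoint. This sketch
# only constructs the payload; sending it requires a network call.
def build_messages_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    return {
        "model": model,
        "max_tokens": max_tokens,  # required field in the Messages format
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_messages_request("deepseek-chat", "Hello")
print(json.dumps(body))
```

The same model IDs work on both endpoints; only the request and response shapes differ.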