# Models
All models are available on every plan. Rate limits (requests per minute, RPM, and tokens per minute, TPM) are set at the key level, not per model.
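When a key exceeds its rate limit, the API returns an HTTP 429 response, so clients typically retry with exponential backoff. A minimal sketch of that pattern; the `RateLimitError` class here is a stand-in for whatever error type your HTTP client or SDK raises on 429:

```python
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 error; adapt to your client's error type."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    # Retry the callable on rate-limit errors, sleeping base_delay,
    # 2 * base_delay, 4 * base_delay, ... between attempts.
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            time.sleep(base_delay * (2 ** attempt))
    return call()  # final attempt; let any remaining error propagate
```

Because limits are per key, a single shared backoff wrapper covers every model you call with that key.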
## List models

Query the live model list:

```shell
curl https://api.cheapestinference.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

This returns the full list of available models. The response follows the OpenAI `/v1/models` format.
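Because the response follows the OpenAI format, extracting model IDs is straightforward. A minimal sketch against a trimmed, illustrative response body (the live endpoint returns the full list):

```python
import json

# A trimmed, illustrative response in the OpenAI /v1/models format.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "deepseek-chat", "object": "model"},
    {"id": "llama-3.3-70b", "object": "model"}
  ]
}
""")

# Each entry in "data" describes one model; "id" is what you pass
# as the model parameter in requests.
model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # ['deepseek-chat', 'llama-3.3-70b']
```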
To get details about a specific model:

```shell
curl https://api.cheapestinference.com/v1/models/deepseek-chat \
  -H "Authorization: Bearer YOUR_API_KEY"
```

## Available models

### Meta (Llama)

| Model ID | Name |
|---|---|
| llama-3.3-70b | Llama 3.3 70B |
| llama-3.1-8b | Llama 3.1 8B |
| llama-3.2-3b | Llama 3.2 3B |
### DeepSeek

| Model ID | Name |
|---|---|
| deepseek-chat | DeepSeek V3.2 |
| deepseek-reasoner | DeepSeek R1 |
### Qwen

| Model ID | Name |
|---|---|
| qwen3-235b | Qwen3 235B |
| qwen3-30b | Qwen3 30B |
| qwen3-coder | Qwen3 Coder |
### Google (Gemma)

| Model ID | Name |
|---|---|
| gemma-3-27b | Gemma 3 27B |
| gemma-3-12b | Gemma 3 12B |
| gemma-3-4b | Gemma 3 4B |
### Moonshot

| Model ID | Name |
|---|---|
| kimi-2.5 | Kimi 2.5 |
### Embeddings

| Model ID | Name |
|---|---|
| BAAI/bge-large-en-v1.5 | BGE Large |
| BAAI/bge-base-en-v1.5 | BGE Base |
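Embedding models return vectors (1024 dimensions for BGE Large, 768 for BGE Base) that are typically compared with cosine similarity. A minimal post-processing sketch, assuming you already have two vectors back from the API:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative low-dimensional vectors; real BGE embeddings are
# 1024-dim (Large) or 768-dim (Base).
query_vec = [0.2, 0.1, 0.4]
doc_vec = [0.2, 0.1, 0.4]
print(cosine_similarity(query_vec, doc_vec))  # close to 1.0 for identical vectors
```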
## Using models

Specify the model ID in your request:

```python
# OpenAI SDK — any model
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # or "llama-3.3-70b", "kimi-2.5", etc.
    messages=[{"role": "user", "content": "Hello"}],
)
```

All models work through the OpenAI endpoint (`/v1/chat/completions`) and the Anthropic-compatible endpoint (`/anthropic/v1/messages`). The API handles format translation automatically.
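For the Anthropic-compatible endpoint, the request body follows Anthropic's Messages API shape, where `max_tokens` is a required field (unlike the OpenAI format). A sketch of the body you would POST to `/anthropic/v1/messages`:

```python
import json

# Messages-API-shaped request body for POST /anthropic/v1/messages.
# max_tokens is required in the Anthropic format.
body = {
    "model": "deepseek-chat",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}
payload = json.dumps(body)
print(payload)
```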