Models

All models are available on every plan. Rate limits (RPM and TPM) are set at the key level, not per model.

Query the live model list:

curl https://api.cheapestinference.com/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"

Each model object includes an id, owned_by, and a type field ("chat" or "embedding") so you can filter programmatically:

{
  "id": "deepseek-ai/DeepSeek-V3.2",
  "object": "model",
  "created": 1677610602,
  "owned_by": "cheapestinference",
  "type": "chat"
}
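The type field makes it easy to separate chat and embedding models client-side. A minimal sketch in Python (the sample data below is illustrative; a real response comes from the endpoint shown above):

```python
# Filter the /v1/models list response down to chat models only.
# Sample data inlined for illustration; fetch the real list from the API.
models = {
    "object": "list",
    "data": [
        {"id": "deepseek-ai/DeepSeek-V3.2", "object": "model",
         "created": 1677610602, "owned_by": "cheapestinference", "type": "chat"},
        {"id": "BAAI/bge-m3", "object": "model",
         "created": 1677610602, "owned_by": "cheapestinference", "type": "embedding"},
    ],
}

chat_models = [m["id"] for m in models["data"] if m["type"] == "chat"]
print(chat_models)  # ['deepseek-ai/DeepSeek-V3.2']
```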

To get details about a specific model:

curl https://api.cheapestinference.com/v1/models/deepseek-ai/DeepSeek-V3.2 \
-H "Authorization: Bearer YOUR_API_KEY"
| Model ID | Type |
| --- | --- |
| Qwen/Qwen3.5-397B-A17B | chat |
| Qwen/Qwen3.5-122B-A10B | chat |
| Qwen/Qwen3.5-35B-A3B | chat |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | chat |
| Qwen/Qwen3-Next-80B-A3B-Instruct | chat |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo | chat |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | chat |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | chat |
| Qwen/Qwen3-VL-235B-A22B-Instruct | chat |
| Qwen/Qwen3-VL-30B-A3B-Instruct | chat |

| Model ID | Type |
| --- | --- |
| deepseek-ai/DeepSeek-V3.2 | chat |
| deepseek-ai/DeepSeek-R1-0528 | chat |
| deepseek-ai/DeepSeek-R1-0528-Turbo | chat |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B | chat |
| deepseek-ai/DeepSeek-OCR | chat |

| Model ID | Type |
| --- | --- |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | chat |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | chat |
| meta-llama/Llama-Guard-4-12B | chat |

| Model ID | Type |
| --- | --- |
| moonshotai/Kimi-K2.5 | chat |
| moonshotai/Kimi-K2.5-Turbo | chat |
| moonshotai/Kimi-K2-Thinking | chat |

| Model ID | Type |
| --- | --- |
| MiniMaxAI/MiniMax-M2.5 | chat |

| Model ID | Type |
| --- | --- |
| zai-org/GLM-5 | chat |
| zai-org/GLM-4.7-Flash | chat |

| Model ID | Type |
| --- | --- |
| Qwen/Qwen3-Embedding-8B | embedding |
| Qwen/Qwen3-Embedding-0.6B-batch | embedding |
| BAAI/bge-m3 | embedding |
| intfloat/multilingual-e5-large-instruct | embedding |
| nvidia/llama-nemotron-embed-vl-1b-v2 | embedding |
| google/embeddinggemma-300m | embedding |
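Embedding models are called with an embeddings request body in the OpenAI-compatible format; the exact route (/v1/embeddings) is an assumption based on the API's OpenAI compatibility. A sketch of the request body:

```python
import json

# Hypothetical request body for an embeddings call in the
# OpenAI-compatible format; the /v1/embeddings path is an assumption.
body = {
    "model": "BAAI/bge-m3",
    "input": ["first passage", "second passage"],  # a string or a list of strings
}

payload = json.dumps(body)
```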

Specify the model ID in your request:

from openai import OpenAI

client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")

# OpenAI SDK, works with any model ID from the tables above
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",  # or "Qwen/Qwen3.5-397B-A17B", etc.
    messages=[{"role": "user", "content": "Hello"}],
)

All models work through the OpenAI endpoint (/v1/chat/completions) and the Anthropic-compatible endpoint (/anthropic/v1/messages). The API handles format translation automatically.
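Concretely, the same model ID appears unchanged in both request formats; only the body shape differs. A sketch of the two bodies (the payload shapes follow the public OpenAI and Anthropic request schemas):

```python
model_id = "deepseek-ai/DeepSeek-V3.2"

# Body for POST /v1/chat/completions (OpenAI format)
openai_body = {
    "model": model_id,
    "messages": [{"role": "user", "content": "Hello"}],
}

# Body for POST /anthropic/v1/messages (Anthropic format);
# max_tokens is required in the Anthropic schema
anthropic_body = {
    "model": model_id,
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}
```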