All models are available on every plan. Rate limits (RPM and TPM) are set at the key level, not per model.
Query the live model list:
curl https://api.cheapestinference.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
Each model object includes an id, owned_by, and a type field ("chat" or "embedding") so you can filter programmatically:
"id" : " deepseek-ai/DeepSeek-V3.2 " ,
"owned_by" : " cheapestinference " ,
To get details about a specific model:
curl https://api.cheapestinference.com/v1/models/deepseek-ai/DeepSeek-V3.2 \
  -H "Authorization: Bearer YOUR_API_KEY"
| Model ID | Type |
|---|---|
| Qwen/Qwen3.5-397B-A17B | chat |
| Qwen/Qwen3.5-122B-A10B | chat |
| Qwen/Qwen3.5-35B-A3B | chat |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | chat |
| Qwen/Qwen3-Next-80B-A3B-Instruct | chat |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo | chat |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | chat |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | chat |
| Qwen/Qwen3-VL-235B-A22B-Instruct | chat |
| Qwen/Qwen3-VL-30B-A3B-Instruct | chat |

| Model ID | Type |
|---|---|
| deepseek-ai/DeepSeek-V3.2 | chat |
| deepseek-ai/DeepSeek-R1-0528 | chat |
| deepseek-ai/DeepSeek-R1-0528-Turbo | chat |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B | chat |
| deepseek-ai/DeepSeek-OCR | chat |

| Model ID | Type |
|---|---|
| meta-llama/Llama-4-Scout-17B-16E-Instruct | chat |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | chat |
| meta-llama/Llama-Guard-4-12B | chat |

| Model ID | Type |
|---|---|
| moonshotai/Kimi-K2.5 | chat |
| moonshotai/Kimi-K2.5-Turbo | chat |
| moonshotai/Kimi-K2-Thinking | chat |

| Model ID | Type |
|---|---|
| MiniMaxAI/MiniMax-M2.5 | chat |

| Model ID | Type |
|---|---|
| zai-org/GLM-5 | chat |
| zai-org/GLM-4.7-Flash | chat |

| Model ID | Type |
|---|---|
| Qwen/Qwen3-Embedding-8B | embedding |
| Qwen/Qwen3-Embedding-0.6B-batch | embedding |
| BAAI/bge-m3 | embedding |
| intfloat/multilingual-e5-large-instruct | embedding |
| nvidia/llama-nemotron-embed-vl-1b-v2 | embedding |
| google/embeddinggemma-300m | embedding |
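For the embedding models, a sketch of the request body, assuming the service mirrors the OpenAI `/v1/embeddings` endpoint (the embeddings path itself is not shown in this section, so treat it as an assumption):

```python
import json

# Assumed OpenAI-style embeddings request body; field names follow the
# OpenAI embeddings API, which this service appears to mirror.
payload = {
    "model": "BAAI/bge-m3",                   # any ID from the embedding table above
    "input": ["first text", "second text"],   # a single string or a list of strings
}

# Serialize for the HTTP client of your choice:
body = json.dumps(payload)
```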
Specify the model ID in your request:
from openai import OpenAI

client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",  # or "Qwen/Qwen3.5-397B-A17B", etc.
    messages=[{"role": "user", "content": "Hello"}],
)
All models work through both the OpenAI-compatible endpoint (/v1/chat/completions) and the Anthropic-compatible endpoint (/anthropic/v1/messages). The API translates between formats automatically.