
Chat Models

Reasoning Models

DeepSeek R1

  • Context: 128K tokens
  • Best for: Complex reasoning, math, coding

DeepSeek V3.1

  • Context: 64K tokens
  • Best for: General purpose, reasoning

Large Language Models

Llama 4 Maverick

  • Context: 128K tokens
  • Best for: General purpose, instruction following

Qwen 3 Next 80B

  • Context: 32K tokens
  • Best for: Multilingual, coding

GPT-OSS-120B

  • Context: 8K tokens
  • Best for: Chat, creative writing

Kimi K2 0905

  • Context: 200K tokens
  • Best for: Long context tasks

Fast Models

Llama 3.1 8B Instruct

  • Context: 128K tokens
  • Best for: Quick responses, high throughput

Mistral 7B Instruct

  • Context: 32K tokens
  • Best for: Fast inference, simple tasks
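All of the chat models above can be called through the same chat endpoint. The request shape below follows the OpenAI-compatible convention suggested by the /v1/models endpoint shown under "Using Models"; the /v1/chat/completions path and the model id "deepseek-r1" are assumptions, not confirmed by this page. A minimal sketch:

```python
import os
import requests

CHAT_URL = "https://api.cheapestinference.ai/v1/chat/completions"  # assumed endpoint

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload (shape is an assumption)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# "deepseek-r1" is a hypothetical model id; list /v1/models for real ids.
payload = build_chat_request("deepseek-r1", "Explain big-O notation in one sentence.")

if os.environ.get("CHEAPESTINFERENCE_API_KEY"):
    resp = requests.post(
        CHAT_URL,
        headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

Swap the model id to trade quality for latency: the reasoning models suit math and coding, while the fast models suit high-throughput, low-latency calls.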

Embedding Models

BGE Large EN v1.5

  • Dimensions: 1024
  • Best for: Semantic search, RAG

BGE Small EN v1.5

  • Dimensions: 384
  • Best for: Fast retrieval

E5 Large v2

  • Dimensions: 1024
  • Best for: General purpose

Multilingual E5 Large

  • Dimensions: 1024
  • Best for: 100+ languages
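The embedding models above return fixed-size vectors (1024 or 384 dimensions, per the list). The sketch below assumes an OpenAI-style /v1/embeddings endpoint and the model id "bge-large-en-v1.5"; both are guesses consistent with the /v1/models endpoint shown later, not confirmed here.

```python
import os
import requests

EMBED_URL = "https://api.cheapestinference.ai/v1/embeddings"  # assumed endpoint

def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Build an OpenAI-style embeddings payload (shape is an assumption)."""
    return {"model": model, "input": texts}

# "bge-large-en-v1.5" is a hypothetical id for BGE Large EN v1.5.
payload = build_embedding_request("bge-large-en-v1.5", ["semantic search demo"])

if os.environ.get("CHEAPESTINFERENCE_API_KEY"):
    resp = requests.post(
        EMBED_URL,
        headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
        json=payload,
    )
    resp.raise_for_status()
    vector = resp.json()["data"][0]["embedding"]
    print(len(vector))  # 1024 for BGE Large EN v1.5, per the dimensions listed above
```

For RAG, embed documents and queries with the same model; vectors from different models (or different dimensions) are not comparable.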

Pricing

  • Pay only for what you use
  • Token-based pricing
  • Auto-scaling infrastructure
  • Best for: Variable workloads
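Token-based billing means cost scales with prompt and completion tokens rather than instance hours. A sketch of the arithmetic, with placeholder per-million-token rates (this page lists no actual prices):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are dollars per 1M tokens (placeholders, not real prices)."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# Hypothetical rates -- check the actual pricing page for real numbers.
cost = estimate_cost(10_000, 2_000, input_rate=0.10, output_rate=0.40)
print(f"${cost:.6f}")  # $0.001800
```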

Model Features

Feature             Support
Streaming           ✅ All chat models
Function calling    ✅ Most chat models
Structured outputs  ✅ Most chat models
JSON mode           ✅ Most chat models
Vision              ✅ Vision models
Multi-modal         ✅ Vision, audio models
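Streaming is enabled by setting "stream": true on a chat request; the server then returns incremental deltas instead of one final message. The sketch below assumes the OpenAI-compatible server-sent-events wire format ("data: " lines ending with "[DONE]") and a hypothetical model id; neither is confirmed by this page.

```python
import json
import os
import requests

def build_streaming_request(model: str, prompt: str) -> dict:
    """Chat payload with streaming enabled (payload shape is an assumption)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }

# "llama-3.1-8b-instruct" is a hypothetical id; list /v1/models for real ids.
payload = build_streaming_request("llama-3.1-8b-instruct", "Count to five.")

if os.environ.get("CHEAPESTINFERENCE_API_KEY"):
    with requests.post(
        "https://api.cheapestinference.ai/v1/chat/completions",  # assumed endpoint
        headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
        json=payload,
        stream=True,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # SSE chunks look like: data: {"choices": [{"delta": {...}}]}
            if not line or not line.startswith(b"data: "):
                continue
            chunk = line[len(b"data: "):]
            if chunk == b"[DONE]":
                break
            delta = json.loads(chunk)["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)
        print()
```

Function calling, structured outputs, and JSON mode are typically toggled with additional request fields (e.g. a tools list or a response_format object in OpenAI-style APIs); consult the API reference for the exact field names this service accepts.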

Using Models

List all available models via API:
import os
import requests

resp = requests.get(
    "https://api.cheapestinference.ai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(f"{model['id']}: {model.get('description', '')}")