
Chat Models

Reasoning Models

DeepSeek R1

  • Context: 128K tokens
  • Best for: Complex reasoning, math, coding

DeepSeek V3.1

  • Context: 64K tokens
  • Best for: General purpose, reasoning

Large Language Models

Llama 4 Maverick

  • Context: 128K tokens
  • Best for: General purpose, instruction following

Qwen 3 Next 80B

  • Context: 32K tokens
  • Best for: Multilingual, coding

GPT-OSS-120B

  • Context: 8K tokens
  • Best for: Chat, creative writing

Kimi K2 0905

  • Context: 200K tokens
  • Best for: Long context tasks

Fast Models

Llama 3.1 8B Instruct

  • Context: 128K tokens
  • Best for: Quick responses, high throughput

Mistral 7B Instruct

  • Context: 32K tokens
  • Best for: Fast inference, simple tasks
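All of the chat models above can be called through the same chat endpoint. The request shape below follows the OpenAI-compatible convention suggested by the /v1/models endpoint shown under "Using Models"; the /v1/chat/completions path and the model id "deepseek-r1" are assumptions, not confirmed by this page. A minimal sketch:

```python
import os
import requests

CHAT_URL = "https://api.cheapestinference.ai/v1/chat/completions"  # assumed endpoint

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload (shape is an assumption)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# "deepseek-r1" is a hypothetical model id; list /v1/models for real ids.
payload = build_chat_request("deepseek-r1", "Explain big-O notation in one sentence.")

if os.environ.get("CHEAPESTINFERENCE_API_KEY"):
    resp = requests.post(
        CHAT_URL,
        headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

Swap the model id to trade quality for latency: the reasoning models suit math and coding, while the fast models suit high-throughput, low-latency calls.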

Embedding Models

BGE Large EN v1.5

  • Dimensions: 1024
  • Best for: Semantic search, RAG

BGE Small EN v1.5

  • Dimensions: 384
  • Best for: Fast retrieval

E5 Large v2

  • Dimensions: 1024
  • Best for: General purpose

Multilingual E5 Large

  • Dimensions: 1024
  • Best for: 100+ languages
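The embedding models above return fixed-size vectors (1024 or 384 dimensions, per the list). The sketch below assumes an OpenAI-style /v1/embeddings endpoint and the model id "bge-large-en-v1.5"; both are guesses consistent with the /v1/models endpoint shown later, not confirmed here.

```python
import os
import requests

EMBED_URL = "https://api.cheapestinference.ai/v1/embeddings"  # assumed endpoint

def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Build an OpenAI-style embeddings payload (shape is an assumption)."""
    return {"model": model, "input": texts}

# "bge-large-en-v1.5" is a hypothetical id for BGE Large EN v1.5.
payload = build_embedding_request("bge-large-en-v1.5", ["semantic search demo"])

if os.environ.get("CHEAPESTINFERENCE_API_KEY"):
    resp = requests.post(
        EMBED_URL,
        headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
        json=payload,
    )
    resp.raise_for_status()
    vector = resp.json()["data"][0]["embedding"]
    print(len(vector))  # 1024 for BGE Large EN v1.5, per the dimensions listed above
```

For RAG, embed documents and queries with the same model; vectors from different models (or different dimensions) are not comparable.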

Pricing

  • Pay only for what you use
  • Token-based pricing
  • Auto-scaling infrastructure
  • Best for: Variable workloads
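Token-based billing means cost scales with prompt and completion tokens rather than instance hours. A sketch of the arithmetic, with placeholder per-million-token rates (this page lists no actual prices):

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are dollars per 1M tokens (placeholders, not real prices)."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# Hypothetical rates -- check the actual pricing page for real numbers.
cost = estimate_cost(10_000, 2_000, input_rate=0.10, output_rate=0.40)
print(f"${cost:.6f}")  # $0.001800
```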

Model Features

Feature             Support
Streaming           ✅ All chat models
Function calling    ✅ Most chat models
Structured outputs  ✅ Most chat models
JSON mode           ✅ Most chat models
Vision              ✅ Vision models
Multi-modal         ✅ Vision, audio models
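Streaming is enabled by setting "stream": true on a chat request; the server then returns incremental deltas instead of one final message. The sketch below assumes the OpenAI-compatible server-sent-events wire format ("data: " lines ending with "[DONE]") and a hypothetical model id; neither is confirmed by this page.

```python
import json
import os
import requests

def build_streaming_request(model: str, prompt: str) -> dict:
    """Chat payload with streaming enabled (payload shape is an assumption)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }

# "llama-3.1-8b-instruct" is a hypothetical id; list /v1/models for real ids.
payload = build_streaming_request("llama-3.1-8b-instruct", "Count to five.")

if os.environ.get("CHEAPESTINFERENCE_API_KEY"):
    with requests.post(
        "https://api.cheapestinference.ai/v1/chat/completions",  # assumed endpoint
        headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
        json=payload,
        stream=True,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # SSE chunks look like: data: {"choices": [{"delta": {...}}]}
            if not line or not line.startswith(b"data: "):
                continue
            chunk = line[len(b"data: "):]
            if chunk == b"[DONE]":
                break
            delta = json.loads(chunk)["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)
        print()
```

Function calling, structured outputs, and JSON mode are typically toggled with additional request fields (e.g. a tools list or a response_format object in OpenAI-style APIs); consult the API reference for the exact field names this service accepts.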

Using Models

List all available models via API:
import os
import requests

resp = requests.get(
    "https://api.cheapestinference.ai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}"},
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(f"{model['id']}: {model.get('description', '')}")