Chat Models
Reasoning Models
DeepSeek R1
- Context: 128K tokens
- Best for: Complex reasoning, math, coding
DeepSeek V3.1
- Context: 64K tokens
- Best for: General-purpose tasks, reasoning
Large Language Models
Llama 4 Maverick
- Context: 128K tokens
- Best for: General-purpose tasks, instruction following
Qwen 3 Next 80B
- Context: 32K tokens
- Best for: Multilingual, coding
GPT-OSS-120B
- Context: 8K tokens
- Best for: Chat, creative writing
Kimi K2 0905
- Context: 200K tokens
- Best for: Long-context tasks
Fast Models
Llama 3.1 8B Instruct
- Context: 128K tokens
- Best for: Quick responses, high throughput
Mistral 7B Instruct
- Context: 32K tokens
- Best for: Fast inference, simple tasks
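Every chat model above has a fixed context window, and the prompt plus the requested output must fit inside it. The sketch below lists which models can hold a given request. It assumes "K" means 1,024 tokens and estimates token counts with a rough 4-characters-per-token heuristic instead of each model's real tokenizer. The model names are labels copied from this page, not API identifiers.

```python
# Context windows as listed on this page, in thousands of tokens.
CONTEXT_K = {
    "DeepSeek R1": 128,
    "DeepSeek V3.1": 64,
    "Llama 4 Maverick": 128,
    "Qwen 3 Next 80B": 32,
    "GPT-OSS-120B": 8,
    "Kimi K2 0905": 200,
    "Llama 3.1 8B Instruct": 128,
    "Mistral 7B Instruct": 32,
}

TOKENS_PER_K = 1024   # assumption: "K" means 1,024 tokens
CHARS_PER_TOKEN = 4   # assumption: rough English-text heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate: about 4 characters per token, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(model: str, prompt: str, max_output_tokens: int) -> bool:
    """True if the estimated prompt tokens plus the output budget fit the model's window."""
    window = CONTEXT_K[model] * TOKENS_PER_K
    return estimate_tokens(prompt) + max_output_tokens <= window


def models_that_fit(prompt: str, max_output_tokens: int) -> list[str]:
    """Names of all listed models whose window can hold the request, sorted alphabetically."""
    return sorted(m for m in CONTEXT_K if fits_in_context(m, prompt, max_output_tokens))
```

For example, `models_that_fit(long_document, max_output_tokens=2_000)` narrows the choice before you pick on speed or quality. Because the estimate is approximate, leave some headroom below the window rather than filling it exactly.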
Embedding Models
BGE Large EN v1.5
- Dimensions: 1024
- Best for: Semantic search, RAG
BGE Small EN v1.5
- Dimensions: 384
- Best for: Fast retrieval
E5 Large v2
- Dimensions: 1024
- Best for: General-purpose text embeddings
Multilingual E5 Large
- Dimensions: 1024
- Best for: Multilingual semantic search (100+ languages)
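Semantic search and RAG with these embedding models usually rank documents by cosine similarity between a query vector and stored document vectors. Below is a minimal pure-Python sketch that uses toy vectors in place of real model output. It also rejects vectors of different dimensions, since a 384-dimension BGE Small query cannot be compared against a 1024-dimension BGE Large index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; both must have the same dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In practice a vector database or NumPy would replace the Python loops. Dimension also drives storage: at 4 bytes per float32 value, one 1024-dimension vector takes 4,096 bytes, versus 1,536 bytes at 384 dimensions.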
Pricing
- Pay only for what you use
- Token-based pricing
- Auto-scaling infrastructure
- Well suited to variable workloads
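With token-based pricing, a request's cost is its token counts multiplied by per-token rates. The sketch below assumes input and output tokens are billed separately at a price per million tokens, a common scheme that this page does not confirm, and the rates used with it are hypothetical placeholders, not this platform's prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical USD prices per million tokens (not this platform's real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: each token count times its per-million rate."""
    return (input_tokens * rates.input_per_million
            + output_tokens * rates.output_per_million) / 1_000_000


def estimate_total(rates: TokenRates, usage: list[tuple[int, int]]) -> float:
    """Sum the cost of (input_tokens, output_tokens) pairs; no usage means no cost."""
    return sum(estimate_cost(rates, i, o) for i, o in usage)
```

For example, at hypothetical rates of $0.50 input and $1.50 output per million tokens, 2,000,000 input and 500,000 output tokens cost $1.75. Because the bill is a sum over actual usage, idle periods cost nothing, which is what makes this model fit bursty traffic.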
Model Features
| Feature | Support |
|---|---|
| Streaming | ✅ All chat models |
| Function calling | ✅ Most chat models |
| Structured outputs | ✅ Most chat models |
| JSON mode | ✅ Most chat models |
| Vision | ✅ Vision models |
| Multi-modal | ✅ Vision, audio models |
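Streaming and JSON mode often work together: the model sends its answer as a series of small chunks, and the client joins them before parsing the JSON. The sketch below assumes the platform exposes an OpenAI-compatible Chat Completions endpoint that streams server-sent events (`data: {...}` lines ending with `data: [DONE]`). This page does not say so, so treat the wire format as an assumption, and the HTTP request itself is left out.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style SSE lines into the full reply text."""
    parts: list[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only or null-content chunks carry no text
                parts.append(content)
    return "".join(parts)


def parse_json_reply(lines: Iterable[str]) -> dict:
    """With JSON mode enabled, the joined stream should be a single JSON object."""
    return json.loads(collect_stream(lines))
```

Parsing only after the stream finishes keeps the code simple. Individual chunks are rarely valid JSON on their own, so incremental parsing would need a streaming JSON parser.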