Chat Models
DeepSeek R1 (Reasoning)
Best for complex reasoning, mathematics, and coding tasks.Llama 3.1 70B (General Purpose)
Best all-around model for most tasks.Qwen 3 Next 80B (Multilingual)
Best for multilingual and coding tasks.Embedding Models
BGE Large EN v1.5
High-quality embeddings for RAG and semantic search.Function Calling
Enable models to call external functions.Structured Outputs
Get reliable JSON responses.Streaming Responses
Stream responses for better UX.Batch Processing
Process multiple requests efficiently.Best Practices
Choose the right model
Choose the right model
- Use smaller models (8B) for simple tasks
- Use larger models (70B+) for complex reasoning
Optimize parameters
Optimize parameters
- Lower temperature (0.3-0.7) for factual tasks
- Higher temperature (0.7-1.0) for creative tasks
Handle errors gracefully
Handle errors gracefully
- Implement retry logic with exponential backoff
- Monitor rate limits
- Log errors for debugging
Monitor costs
Monitor costs
- Track token usage
- Use smaller models when possible