Why switch to CheapestInference?
- Lower costs: Save up to 90% on inference costs
- More models: Access to many open-source models
- No vendor lock-in: Open-source models with transparent pricing
- Better privacy: Your data stays private and secure
Migration guide
Using the official OpenAI SDK
You can use the official OpenAI Python or TypeScript SDKs with CheapestInference by simply changing the base URL and API key:
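For example, with the Python SDK (a minimal sketch; the base URL and key placeholder below are illustrative, so substitute the values from your CheapestInference account):

```python
from openai import OpenAI

# Point the official SDK at CheapestInference instead of OpenAI.
# The base URL shown here is illustrative.
client = OpenAI(
    api_key="YOUR_CHEAPESTINFERENCE_API_KEY",
    base_url="https://api.cheapestinference.com/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```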
Environment variables
For even easier migration, use environment variables:
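The official Python SDK reads `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment, so existing code can often run with no changes at all (sketch; the base URL is again illustrative):

```python
# Set these in your shell before running your existing code:
#   export OPENAI_API_KEY="YOUR_CHEAPESTINFERENCE_API_KEY"
#   export OPENAI_BASE_URL="https://api.cheapestinference.com/v1"
from openai import OpenAI

# With the variables above set, no constructor arguments are needed.
client = OpenAI()
```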
Supported endpoints
All major OpenAI API endpoints are supported:
| Endpoint | Support | Notes |
|---|---|---|
| /v1/chat/completions | ✅ Full | Including streaming, function calling, and structured outputs |
| /v1/embeddings | ✅ Full | Multiple embedding models available |
| /v1/models | ✅ Full | List all available models |
| /v1/files | ✅ Full | File upload for fine-tuning |
| /v1/batches | ✅ Full | Batch inference API |
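Since `/v1/chat/completions` supports streaming, the usual `stream=True` pattern works unchanged. A minimal sketch, reusing the `client` configured above:

```python
# Streaming works the same as with OpenAI: pass stream=True and
# iterate over the chunks as they arrive.
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a haiku about migration."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```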
Model mapping
While CheapestInference offers many models, here’s how OpenAI models map to our alternatives:
| OpenAI Model | CheapestInference Alternative | Notes |
|---|---|---|
| gpt-4 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Similar quality, 80% cheaper |
| gpt-4-turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Higher quality, 70% cheaper |
| gpt-3.5-turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Faster and cheaper |
| text-embedding-3-large | BAAI/bge-large-en-v1.5 | Best quality embeddings |
| text-embedding-3-small | BAAI/bge-small-en-v1.5 | Fast and efficient |
| whisper-1 | openai/whisper-large-v3 | Same model, lower cost |
| dall-e-3 | black-forest-labs/FLUX.1-schnell | Better quality, faster |
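In practice, swapping a model is a one-line change. For example, replacing `text-embedding-3-small` with `BAAI/bge-small-en-v1.5` (sketch, same `client` as above):

```python
embedding = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # was: text-embedding-3-small
    input="The quick brown fox jumps over the lazy dog",
)
print(len(embedding.data[0].embedding))  # vector dimensionality
```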
Framework integrations
CheapestInference works seamlessly with popular frameworks, including LangChain and LlamaIndex in Python and the Vercel AI SDK in TypeScript, as sketched below.
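Any framework that accepts an OpenAI-compatible base URL can be pointed at CheapestInference. For example, with LangChain’s `ChatOpenAI` (a sketch; the base URL is illustrative):

```python
from langchain_openai import ChatOpenAI

# ChatOpenAI accepts an OpenAI-compatible base_url, so pointing it at
# CheapestInference is just a constructor change.
llm = ChatOpenAI(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    api_key="YOUR_CHEAPESTINFERENCE_API_KEY",
    base_url="https://api.cheapestinference.com/v1",
)
print(llm.invoke("Hello!").content)
```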
Feature comparison
| Feature | OpenAI | CheapestInference |
|---|---|---|
| Streaming | ✅ | ✅ |
| Function calling | ✅ | ✅ |
| Structured outputs | ✅ | ✅ |
| Embeddings | ✅ | ✅ |
| Batch API | ✅ | ✅ |
| JSON mode | ✅ | ✅ |
| Reproducible outputs | ✅ | ✅ |
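These features use the same request parameters as OpenAI. For instance, JSON mode is enabled with the familiar `response_format` parameter (sketch, same `client` as above):

```python
import json

# JSON mode uses the same response_format parameter as with OpenAI.
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
```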
Differences to be aware of
While we strive for full compatibility, there are a few differences:
- Model names: Use our model naming format (e.g., meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo)
- Rate limits: Different rate limits apply (generally more generous)
- Model capabilities: Some models may have different context windows or capabilities
- Response times: Generally faster due to optimized infrastructure