
Why switch to CheapestInference?

  • Lower costs: Save up to 90% on inference costs
  • More models: Access to many open-source models
  • No vendor lock-in: Open-source models with transparent pricing
  • Better privacy: Your data stays private and secure

Migration guide

Using the official OpenAI SDK

You can use the official OpenAI Python or TypeScript SDKs with CheapestInference; only the API key and base URL need to change:
from openai import OpenAI

# Just change these two lines!
client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# Everything else works exactly the same
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Environment variables

For even easier migration, use environment variables:
export OPENAI_API_KEY=your_cheapestinference_api_key
export OPENAI_BASE_URL=https://api.cheapestinference.ai/v1
Then your code doesn’t need to change at all:
from openai import OpenAI

client = OpenAI()  # Automatically uses env vars

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

Supported endpoints

All major OpenAI API endpoints are supported:
| Endpoint | Support | Notes |
|---|---|---|
| /v1/chat/completions | ✅ Full | Including streaming, function calling, and structured outputs |
| /v1/embeddings | ✅ Full | Multiple embedding models available |
| /v1/models | ✅ Full | List all available models |
| /v1/files | ✅ Full | File upload for fine-tuning |
| /v1/batches | ✅ Full | Batch inference API |
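
For example, streaming uses the standard OpenAI SDK pattern. The sketch below assumes the chat completions endpoint honors stream=True exactly as the table indicates:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# stream=True returns an iterator of incremental chunks
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)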

Model mapping

CheapestInference offers many models; here's how common OpenAI models map to our alternatives:
| OpenAI Model | CheapestInference Alternative | Notes |
|---|---|---|
| gpt-4 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Similar quality, 80% cheaper |
| gpt-4-turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Higher quality, 70% cheaper |
| gpt-3.5-turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Faster and cheaper |
| text-embedding-3-large | BAAI/bge-large-en-v1.5 | Best quality embeddings |
| text-embedding-3-small | BAAI/bge-small-en-v1.5 | Fast and efficient |
| whisper-1 | openai/whisper-large-v3 | Same model, lower cost |
| dall-e-3 | black-forest-labs/FLUX.1-schnell | Better quality, faster |
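
Swapping an embedding model is the same kind of one-line change. A minimal sketch, assuming /v1/embeddings accepts the model IDs from the table above:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# text-embedding-3-large -> BAAI/bge-large-en-v1.5, per the mapping above
response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=["The quick brown fox jumps over the lazy dog"]
)

print(len(response.data[0].embedding))  # embedding dimension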

Framework integrations

CheapestInference works seamlessly with popular frameworks:

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)
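
The resulting llm behaves like any other LangChain chat model:

response = llm.invoke("Hello!")
print(response.content)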

LlamaIndex

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    api_key="your_cheapestinference_api_key",
    api_base="https://api.cheapestinference.ai/v1"
)
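
Usage then follows LlamaIndex's standard LLM interface:

print(llm.complete("Hello!"))

Note: some LlamaIndex versions validate model names against OpenAI's own catalog; if yours rejects the ID, the OpenAILike class from the llama-index-llms-openai-like package accepts arbitrary model names with the same api_key and api_base parameters.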

Vercel AI SDK

import { createOpenAI } from '@ai-sdk/openai';

// createOpenAI builds a provider instance with a custom key and base URL;
// the bare `openai` export is preconfigured for OpenAI and takes a model ID, not settings.
const provider = createOpenAI({
  apiKey: 'your_cheapestinference_api_key',
  baseURL: 'https://api.cheapestinference.ai/v1'
});

const model = provider('meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo');

Feature comparison

| Feature | OpenAI | CheapestInference |
|---|---|---|
| Streaming | ✅ | ✅ |
| Function calling | ✅ | ✅ |
| Structured outputs | ✅ | ✅ |
| Embeddings | ✅ | ✅ |
| Batch API | ✅ | ✅ |
| JSON mode | ✅ | ✅ |
| Reproducible outputs | ✅ | ✅ |
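
As a concrete check of the function-calling row, the standard tools syntax should work unchanged. A sketch, where get_weather is a hypothetical tool defined purely for illustration:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# Hypothetical tool definition, for illustration only
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

print(response.choices[0].message.tool_calls)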

Differences to be aware of

While we strive for full compatibility, there are a few differences:
  1. Model names: Use our model naming format (e.g., meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo); see the snippet after this list for listing available IDs
  2. Rate limits: Different rate limits apply (generally more generous)
  3. Model capabilities: Some models may have different context windows or capabilities
  4. Response times: Generally faster due to optimized infrastructure
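
To discover the exact model IDs available to your account, you can query the models endpoint; a minimal sketch reusing the client configuration from the migration example above:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# /v1/models returns every model ID the service exposes
for model in client.models.list():
    print(model.id)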