
Why switch to CheapestInference?

  • Lower costs: Save up to 90% on inference costs
  • More models: Access to many open-source models
  • No vendor lock-in: Open-source models with transparent pricing
  • Better privacy: Your data stays private and secure

Migration guide

Using the official OpenAI SDK

You can use the official OpenAI Python or TypeScript SDKs with CheapestInference; only the API key and base URL need to change:
from openai import OpenAI

# Just change these two lines!
client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# Everything else works exactly the same
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Environment variables

For even easier migration, use environment variables:
export OPENAI_API_KEY=your_cheapestinference_api_key
export OPENAI_BASE_URL=https://api.cheapestinference.ai/v1
Then your code doesn’t need to change at all:
from openai import OpenAI

client = OpenAI()  # Automatically uses env vars

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

Supported endpoints

All major OpenAI API endpoints are supported:
| Endpoint | Support | Notes |
|---|---|---|
| /v1/chat/completions | ✅ Full | Including streaming, function calling, and structured outputs |
| /v1/embeddings | ✅ Full | Multiple embedding models available |
| /v1/models | ✅ Full | List all available models |
| /v1/files | ✅ Full | File upload for fine-tuning |
| /v1/batches | ✅ Full | Batch inference API |
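
For example, streaming uses the standard OpenAI SDK pattern. The sketch below assumes the chat completions endpoint honors stream=True exactly as the table indicates:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# stream=True returns an iterator of incremental chunks
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)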

Model mapping

CheapestInference offers many models; here's how common OpenAI models map to our alternatives:
| OpenAI Model | CheapestInference Alternative | Notes |
|---|---|---|
| gpt-4 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Similar quality, 80% cheaper |
| gpt-4-turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Higher quality, 70% cheaper |
| gpt-3.5-turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | Faster and cheaper |
| text-embedding-3-large | BAAI/bge-large-en-v1.5 | Best quality embeddings |
| text-embedding-3-small | BAAI/bge-small-en-v1.5 | Fast and efficient |
| whisper-1 | openai/whisper-large-v3 | Same model, lower cost |
| dall-e-3 | black-forest-labs/FLUX.1-schnell | Better quality, faster |
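
Swapping an embedding model is the same kind of one-line change. A minimal sketch, assuming /v1/embeddings accepts the model IDs from the table above:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# text-embedding-3-large -> BAAI/bge-large-en-v1.5, per the mapping above
response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=["The quick brown fox jumps over the lazy dog"]
)

print(len(response.data[0].embedding))  # embedding dimension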

Framework integrations

CheapestInference works seamlessly with popular frameworks:

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)
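
The resulting llm behaves like any other LangChain chat model:

response = llm.invoke("Hello!")
print(response.content)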

LlamaIndex

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    api_key="your_cheapestinference_api_key",
    api_base="https://api.cheapestinference.ai/v1"
)
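
Usage then follows LlamaIndex's standard LLM interface:

print(llm.complete("Hello!"))

Note: some LlamaIndex versions validate model names against OpenAI's own catalog; if yours rejects the ID, the OpenAILike class from the llama-index-llms-openai-like package accepts arbitrary model names with the same api_key and api_base parameters.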

Vercel AI SDK

import { createOpenAI } from '@ai-sdk/openai';

// createOpenAI builds a provider instance with a custom key and base URL;
// the bare `openai` export is preconfigured for OpenAI and takes a model ID, not settings.
const provider = createOpenAI({
  apiKey: 'your_cheapestinference_api_key',
  baseURL: 'https://api.cheapestinference.ai/v1'
});

const model = provider('meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo');

Feature comparison

| Feature | OpenAI | CheapestInference |
|---|---|---|
| Streaming | ✅ | ✅ |
| Function calling | ✅ | ✅ |
| Structured outputs | ✅ | ✅ |
| Embeddings | ✅ | ✅ |
| Batch API | ✅ | ✅ |
| JSON mode | ✅ | ✅ |
| Reproducible outputs | ✅ | ✅ |
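
As a concrete check of the function-calling row, the standard tools syntax should work unchanged. A sketch, where get_weather is a hypothetical tool defined purely for illustration:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# Hypothetical tool definition, for illustration only
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

print(response.choices[0].message.tool_calls)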

Differences to be aware of

While we strive for full compatibility, there are a few differences:
  1. Model names: Use our model naming format (e.g., meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo); see the snippet after this list for listing available IDs
  2. Rate limits: Different rate limits apply (generally more generous)
  3. Model capabilities: Some models may have different context windows or capabilities
  4. Response times: Generally faster due to optimized infrastructure
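
To discover the exact model IDs available to your account, you can query the models endpoint; a minimal sketch reusing the client configuration from the migration example above:

from openai import OpenAI

client = OpenAI(
    api_key="your_cheapestinference_api_key",
    base_url="https://api.cheapestinference.ai/v1"
)

# /v1/models returns every model ID the service exposes
for model in client.models.list():
    print(model.id)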