What You Can Do
Our Models
CheapestInference hosts many popular open-source models. You're charged based on the tokens you use and the size of the model. Models are not quantized unless specified.
Chat models
DeepSeek R1
DeepSeek V3.1
GPT-OSS-120B
Llama 4 Maverick
Qwen 3 Next 80B
Kimi K2 0905
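To show how these chat models are typically called, here is a minimal sketch assuming CheapestInference exposes an OpenAI-compatible chat completions API; the base URL, API-key environment variable, and model ID below are placeholders rather than documented values.

```python
# Minimal sketch, assuming an OpenAI-compatible chat completions endpoint.
# Base URL, env var, and model ID are placeholders -- check the model catalog.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.example/v1",  # hypothetical endpoint
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],      # hypothetical env var
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",  # placeholder ID for DeepSeek V3.1
    messages=[{"role": "user", "content": "Explain token-based pricing in one sentence."}],
)
print(response.choices[0].message.content)
```

Because billing is per token and scales with model size, shorter prompts, tighter output limits, and smaller models all translate directly into lower cost.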
Embedding models
Powerful embedding models for semantic search and retrieval.
View all models →
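As a quick sketch of how an embedding model is used for retrieval, assuming the same OpenAI-compatible client; the embedding model ID is a placeholder, not a confirmed catalog entry.

```python
# Minimal sketch, assuming an OpenAI-compatible /embeddings endpoint.
# Base URL, env var, and embedding model ID are placeholders.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.example/v1",  # hypothetical endpoint
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
)

result = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",  # placeholder embedding model ID
    input=["how do refunds work?", "Refunds require a receipt."],
)
vectors = [item.embedding for item in result.data]
print(len(vectors), len(vectors[0]))  # number of inputs, embedding dimension
```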
Build AI apps and agents with CheapestInference
Build an agent
Build agent workflows to solve real use cases with CheapestInference.
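A minimal agent-loop sketch, assuming OpenAI-style tool calling is supported; the endpoint, model ID, and the get_weather tool are illustrative assumptions, not part of the platform's documented API.

```python
# Minimal agent sketch, assuming OpenAI-style tool calling is supported.
# Endpoint, env var, model ID, and the get_weather tool are illustrative only.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.example/v1",  # hypothetical endpoint
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny, 22 C in {city}"  # stub tool so the sketch is self-contained

messages = [{"role": "user", "content": "What's the weather in Lisbon right now?"}]
first = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct-0905",  # placeholder ID for Kimi K2 0905
    messages=messages,
    tools=tools,
)
msg = first.choices[0].message

if msg.tool_calls:
    # Run each requested tool and feed the results back to the model.
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
    final = client.chat.completions.create(
        model="moonshotai/Kimi-K2-Instruct-0905",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)
else:
    print(msg.content)
```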
Build a Next.js chatbot
Spin up a production-ready chatbot using CheapestInference + Next.js.
Build RAG apps
Combine retrieval and generation to build grounded RAG apps.
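For orientation, a minimal RAG sketch under the same OpenAI-compatible assumptions: embed a tiny in-memory corpus, retrieve the closest document by cosine similarity, and ground the answer on it. The endpoint and model IDs are placeholders.

```python
# Minimal RAG sketch: embed documents, retrieve by cosine similarity, then answer
# grounded on the retrieved context. Endpoint and model IDs are placeholders.
import math
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.example/v1",  # hypothetical endpoint
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
)

corpus = [
    "Invoices are processed within 5 business days.",
    "Refunds require a receipt and the original payment card.",
    "Support is available Monday through Friday, 9am-6pm CET.",
]

def embed(texts):
    result = client.embeddings.create(model="BAAI/bge-large-en-v1.5", input=texts)  # placeholder model
    return [item.embedding for item in result.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

question = "What do I need to get a refund?"
doc_vectors = embed(corpus)
query_vector = embed([question])[0]

# Retrieve the single most similar document and use it as grounding context.
best_doc = max(zip(corpus, doc_vectors), key=lambda pair: cosine(query_vector, pair[1]))[0]

answer = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",  # placeholder model ID
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{best_doc}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```

In a real app the in-memory list would be replaced by a vector database, but the retrieve-then-generate flow stays the same.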
Build a real-time image app
Stream real-time image generations with Flux Schnell on CheapestInference.
Build a text → app workflow
Turn natural language into interactive apps with CheapestInference + CodeSandbox.
Build an AI search engine
Ship a simplified Perplexity-style search using CheapestInference models.
Use structured outputs with LLMs
Get reliable JSON by defining schemas and using structured outputs.
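A minimal sketch of requesting schema-constrained JSON, assuming the provider supports OpenAI-style structured outputs via response_format (the exact parameter shape may differ); the model ID and schema are illustrative.

```python
# Minimal sketch, assuming OpenAI-style JSON-schema structured outputs are
# supported via response_format. Endpoint, model ID, and schema are illustrative.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.example/v1",  # hypothetical endpoint
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "tags"],
}

response = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",  # placeholder ID for Qwen 3 Next 80B
    messages=[{"role": "user", "content": "Extract a title and tags from: 'Fast open-source inference for RAG apps'."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "article_metadata", "schema": schema},
    },
)
print(json.loads(response.choices[0].message.content))
```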
Work with reasoning models
Use open reasoning models (e.g., DeepSeek-R1) for logic-heavy, multi-step tasks.
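A minimal sketch of calling an open reasoning model such as DeepSeek-R1 through the assumed OpenAI-compatible endpoint; whether intermediate reasoning is returned, and under which field, varies by provider, so the reasoning_content attribute below is an assumption.

```python
# Minimal sketch for a reasoning model, assuming an OpenAI-compatible endpoint.
# The reasoning_content field is an assumption; some providers return only the
# final answer, others expose the chain of thought separately.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cheapestinference.example/v1",  # hypothetical endpoint
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # placeholder ID for DeepSeek R1
    messages=[{"role": "user", "content": "A train leaves at 9:40 and arrives at 11:05. How long is the trip?"}],
)

message = response.choices[0].message
print(getattr(message, "reasoning_content", None))  # intermediate reasoning, if exposed
print(message.content)                              # final answer
```

Reasoning models trade extra output tokens (and therefore cost and latency) for better performance on logic-heavy, multi-step tasks, so reserve them for problems that need that depth.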