Get your API key

  1. Sign up for a CheapestInference account
  2. Navigate to your API keys page
  3. Create a new API key
Keep your API key secret! Never share it or commit it to version control.
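
One common way to follow this advice is to read the key from an environment variable at runtime instead of hard-coding it. The sketch below assumes the key is stored in `CHEAPESTINFERENCE_API_KEY`, the variable name used throughout this guide. The helper name `load_api_key` is hypothetical and not part of any SDK.

```python
import os


def load_api_key(var_name: str = "CHEAPESTINFERENCE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # Treat an unset variable and an empty or whitespace-only value the same way.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running."
        )
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the API later on.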

Set your API key

Export the key as an environment variable so the examples below can read it:

export CHEAPESTINFERENCE_API_KEY=your_api_key_here

Make your first request

curl https://api.cheapestinference.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CHEAPESTINFERENCE_API_KEY" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Tell me a fun fact about space!" }
    ]
  }'

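Use the OpenAI SDK (compatible)

The API is OpenAI-compatible: point the official OpenAI Python SDK at the CheapestInference base URL and use it as usual.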

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
    base_url="https://api.cheapestinference.ai/v1",
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about space!"},
    ],
)

print(resp.choices[0].message.content)

Stream responses

For a better user experience, stream responses as they’re generated using plain HTTP streaming:
import json
import os

import requests

headers = {
    "Authorization": f"Bearer {os.environ['CHEAPESTINFERENCE_API_KEY']}",
    "Content-Type": "application/json",
}
json_data = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Write a short story about a robot."}],
    "stream": True,
}

with requests.post(
    "https://api.cheapestinference.ai/v1/chat/completions",
    headers=headers,
    json=json_data,
    stream=True,
) as r:
    r.raise_for_status()
    # Decode as UTF-8; without an explicit charset, requests falls back to
    # ISO-8859-1 for text/* responses and garbles non-ASCII characters.
    r.encoding = "utf-8"
    for line in r.iter_lines(decode_unicode=True):
        # Skip blank keep-alive lines and anything that is not a data line.
        if not line or not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        # Each payload is a JSON object whose first choice carries a content delta.
        chunk = json.loads(payload)
        choices = chunk.get("choices") or [{}]  # tolerate chunks with no choices
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            print(delta, end="", flush=True)
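
The per-line parsing in the loop above can be pulled into a small function, which makes it possible to test without a network connection. This is a sketch, assuming the stream uses the OpenAI-style format shown above (`data: ` lines carrying JSON chunks, terminated by `data: [DONE]`). The name `parse_stream_line` is hypothetical, and the sample lines are made up for illustration.

```python
import json
from typing import Optional


def parse_stream_line(line: str) -> Optional[str]:
    """Return the text delta carried by one stream line, or None if there is none."""
    prefix = "data: "
    if not line.startswith(prefix):
        return None  # blank keep-alive line or a non-data field
    payload = line[len(prefix):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel carries no text
    chunk = json.loads(payload)
    choices = chunk.get("choices") or [{}]  # tolerate chunks with no choices
    return choices[0].get("delta", {}).get("content")


# Illustrative lines in the assumed format, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(piece for piece in map(parse_stream_line, sample) if piece)
print(text)  # Hello, world
```

A caller reading a live stream would still stop iterating when it sees the `[DONE]` line; this function only extracts text and does not signal the end of the stream.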