
What is RAG?

Retrieval-Augmented Generation (RAG) grounds a model's answers in your own documents instead of relying only on its training data. It combines three steps:
  1. Retrieval: search for documents relevant to the query
  2. Augmentation: add the retrieved documents to the prompt as context
  3. Generation: generate a response grounded in that context

Basic RAG Example

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
    base_url="https://api.cheapestinference.ai/v1",
)

# 1. Create embeddings for your documents
documents = [
    "Python is a programming language",
    "JavaScript is used for web development",
    "Machine learning is a subset of AI"
]

embeddings = client.embeddings.create(model="BAAI/bge-large-en-v1.5", input=documents)

# 2. Store in vector database (e.g., Pinecone, Weaviate)
# ... store embeddings ...

# 3. Query
query = "What is Python?"
query_embedding = client.embeddings.create(model="BAAI/bge-large-en-v1.5", input=[query])

# 4. Retrieve relevant documents
# ... search vector database ...
relevant_docs = ["Python is a programming language"]

# 5. Generate a response grounded in the retrieved context
context = "\n".join(relevant_docs)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {
            "role": "system",
            "content": f"Answer based only on the following context:\n{context}"
        },
        {"role": "user", "content": query}
    ]
)

print(response.choices[0].message.content)
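Steps 2 and 4 above are left as placeholders because they depend on your vector database. For small document sets you can skip the database entirely and rank documents in memory by cosine similarity between the query embedding and each document embedding. A minimal sketch of that retrieval step, in plain Python (the 3-dimensional vectors below are illustrative stand-ins; real embeddings from the endpoint above have far more dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, docs, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    scored = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]

docs = [
    "Python is a programming language",
    "JavaScript is used for web development",
    "Machine learning is a subset of AI",
]
# Toy embeddings for illustration only; in the example above these would
# come from client.embeddings.create(...).data[i].embedding
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
query_vec = [0.8, 0.2, 0.1]  # stands in for the embedding of "What is Python?"

relevant_docs = top_k(query_vec, doc_vecs, docs, k=1)
print(relevant_docs)  # ['Python is a programming language']
```

The result feeds directly into step 5 as `relevant_docs`. Once your corpus grows beyond a few thousand documents, swap this linear scan for an approximate nearest-neighbor index or a hosted vector database.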