What is RAG?
RAG (Retrieval-Augmented Generation) combines three steps:
- Retrieval: search for documents relevant to the user's query
- Augmentation: add the retrieved context to the prompt
- Generation: generate a response informed by that context
Basic RAG Example
import os

from openai import OpenAI

# OpenAI-compatible client pointed at the CheapestInference endpoint
client = OpenAI(
    api_key=os.environ["CHEAPESTINFERENCE_API_KEY"],
    base_url="https://api.cheapestinference.ai/v1",
)
# 1. Create embeddings for your documents
documents = [
    "Python is a programming language",
    "JavaScript is used for web development",
    "Machine learning is a subset of AI",
]
embeddings = client.embeddings.create(model="BAAI/bge-large-en-v1.5", input=documents)
# 2. Store the embeddings in a vector database (e.g., Pinecone, Weaviate)
# ... store embeddings ... (a minimal in-memory sketch follows this example)
# 3. Query
query = "What is Python?"
query_embedding = client.embeddings.create(model="BAAI/bge-large-en-v1.5", input=[query])
# 4. Retrieve the most relevant documents (see the similarity-search sketch after this example)
# ... search vector database ...
relevant_docs = ["Python is a programming language"]
# 5. Generate a response grounded in the retrieved context
context = "\n".join(relevant_docs)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {
            "role": "system",
            "content": f"Answer based on this context:\n{context}",
        },
        {"role": "user", "content": query},
    ],
)
print(response.choices[0].message.content)
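Steps 2 and 4 above are left as placeholders because the storage and search details depend on your vector database. As a rough sketch of what they involve, the snippet below continues the example and ranks the documents by cosine similarity in memory with NumPy instead of calling a hosted store such as Pinecone or Weaviate; the names doc_vectors, query_vector, and top_k are illustrative, not part of any SDK.

import numpy as np

# "Store" (step 2): keep each document's embedding vector in a NumPy matrix
doc_vectors = np.array([item.embedding for item in embeddings.data])

# "Search" (step 4): rank documents by cosine similarity to the query embedding
query_vector = np.array(query_embedding.data[0].embedding)
similarities = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
top_k = 1
relevant_docs = [documents[i] for i in similarities.argsort()[::-1][:top_k]]

In production you would persist the embeddings once and query them through the vector database's own similarity-search API rather than recomputing scores in memory on every request.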