Build a RAG System with Assisters API

Build a production-ready Retrieval-Augmented Generation system with Assisters API

TL;DR

Use assisters-embed-v1 for embeddings (1024 dimensions, 100+ languages), assisters-rerank-v1 to improve retrieval quality, and assisters-chat-v1 for generation. Complete RAG pipeline with 3 API calls. Cost: ~$0.05 per 1000 queries.

Retrieval-Augmented Generation (RAG) combines your knowledge base with AI to answer questions accurately. This tutorial shows you how to build a production RAG system using Assisters' proprietary models.

Why Assisters for RAG?

Multilingual Embeddings

assisters-embed-v1 supports 100+ languages with 1024-dimensional vectors optimized for semantic search

Smart Reranking

assisters-rerank-v1 dramatically improves retrieval quality by reordering results by relevance

Advanced Chat

assisters-chat-v1 with 128K context window handles large retrieved contexts

Cost Effective

Embeddings at $0.01/M tokens and reranking at $0.02/M tokens, a fraction of what comparable providers charge

Architecture Overview

┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Query     │────▶│ assisters-embed  │────▶│  Vector Search  │
└─────────────┘     └──────────────────┘     └─────────────────┘


┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Answer    │◀────│ assisters-chat   │◀────│ assisters-rerank│
└─────────────┘     └──────────────────┘     └─────────────────┘

Prerequisites

An Assisters API key (get one free)
Python 3.9+ with pip (the examples use built-in generic type hints)
A vector database (we'll use ChromaDB for simplicity)

Step 1: Set Up Your Environment

Install the dependencies:

pip install openai chromadb

Then point the OpenAI-compatible client at the Assisters endpoint:

from openai import OpenAI

# Initialize Assisters client
client = OpenAI(
    api_key="ask_your_api_key",  # Get from assisters.dev/dashboard
    base_url="https://api.assisters.dev/v1"
)
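
To confirm the client is wired up before indexing anything, you can list the available models. This assumes the Assisters endpoint implements the standard OpenAI-compatible /v1/models route:

# Sanity check: list the models exposed by the endpoint
for model in client.models.list():
    print(model.id)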

Step 2: Create Embeddings for Your Documents

Use assisters-embed-v1 to convert your documents into vectors:

import chromadb

# Initialize ChromaDB
chroma = chromadb.Client()
collection = chroma.create_collection("knowledge_base")

# Your documents
documents = [
    "Assisters API provides 9 proprietary AI models for developers.",
    "The assisters-chat-v1 model supports 128K context with streaming.",
    "Embeddings are priced at $0.01 per million tokens.",
    "Rate limits start at 10 RPM for free tier, 500 RPM for Startup.",
    # Add your documents here...
]

# Generate embeddings with Assisters
def get_embeddings(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(
        model="assisters-embed-v1",  # Assisters multilingual embeddings
        input=texts
    )
    return [item.embedding for item in response.data]

# Index documents
embeddings = get_embeddings(documents)
collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

print(f"Indexed {len(documents)} documents")
The same indexing flow in JavaScript:

import OpenAI from 'openai';
import { ChromaClient } from 'chromadb';

const client = new OpenAI({
  apiKey: 'ask_your_api_key',
  baseURL: 'https://api.assisters.dev/v1'
});

const chroma = new ChromaClient();

async function getEmbeddings(texts) {
  const response = await client.embeddings.create({
    model: 'assisters-embed-v1',
    input: texts
  });
  return response.data.map(item => item.embedding);
}

// Index your documents
const documents = [
  "Assisters API provides 9 proprietary AI models for developers.",
  // ... more documents
];

const embeddings = await getEmbeddings(documents);
const collection = await chroma.createCollection({ name: 'knowledge_base' });
await collection.add({
  documents,
  embeddings,
  ids: documents.map((_, i) => `doc_${i}`)
});
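
The documents above are short, pre-cut strings. Real knowledge bases usually need chunking first, so each embedding covers one focused passage. Back in Python, a minimal sketch; the 500-character window and 50-character overlap are illustrative defaults, not Assisters recommendations:

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows before embedding."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# Example: index a long document as chunks
# embeddings = get_embeddings(chunk_text(long_document))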

Step 3: Retrieve Relevant Documents

Search your vector database using query embeddings:

def retrieve(query: str, top_k: int = 10) -> list[str]:
    # Embed the query with Assisters
    query_embedding = get_embeddings([query])[0]

    # Search vector database
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k
    )

    return results["documents"][0]
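
For example:

# Top 3 candidates for a pricing question
candidates = retrieve("How much do embeddings cost?", top_k=3)
print(candidates[0])  # closest match first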

Step 4: Rerank for Better Quality

Pro Tip: Reranking with assisters-rerank-v1 typically improves answer quality by 15-30% compared to vector search alone. It's the secret weapon for production RAG systems.

import requests

def rerank(query: str, documents: list[str], top_k: int = 5) -> list[str]:
    """Rerank documents using Assisters Rerank model."""
    response = requests.post(
        "https://api.assisters.dev/v1/rerank",
        headers={
            "Authorization": "Bearer ask_your_api_key",
            "Content-Type": "application/json"
        },
        json={
            "model": "assisters-rerank-v1",
            "query": query,
            "documents": documents,
            "top_n": top_k
        }
    )
    response.raise_for_status()  # surface HTTP errors (rate limits, invalid key)

    results = response.json()["results"]
    # Results are ordered by relevance score; map indices back to documents
    return [documents[r["index"]] for r in results]
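
Chaining retrieval and reranking:

# Over-fetch with vector search, then let the reranker pick the best 3
question = "What are the rate limits?"
candidates = retrieve(question, top_k=10)
top_docs = rerank(question, candidates, top_k=3)
print(top_docs[0])  # most relevant passage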

Step 5: Generate the Answer

Use assisters-chat-v1 to generate answers from retrieved context:

def generate_answer(query: str, context: list[str]) -> str:
    """Generate answer using Assisters Chat model."""

    # Build context string
    context_str = "\n\n".join([f"[{i+1}] {doc}" for i, doc in enumerate(context)])

    response = client.chat.completions.create(
        model="assisters-chat-v1",  # Assisters flagship chat model
        messages=[
            {
                "role": "system",
                "content": """You are a helpful assistant that answers questions based on the provided context.

Rules:
- Only use information from the context
- If the answer isn't in the context, say "I don't have that information"
- Cite sources using [1], [2], etc."""
            },
            {
                "role": "user",
                "content": f"""Context:
{context_str}

Question: {query}

Answer:"""
            }
        ],
        temperature=0.3,  # Lower temperature for factual answers
        max_tokens=500
    )

    return response.choices[0].message.content
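
The 128K context window of assisters-chat-v1 leaves plenty of headroom, but retrieved context still costs input tokens. A simple guard you can apply before calling generate_answer; the character budget is an illustrative figure, not an Assisters limit:

def trim_context(docs: list[str], max_chars: int = 24_000) -> list[str]:
    """Keep documents in rank order until a rough character budget is exhausted."""
    kept, used = [], 0
    for doc in docs:
        if used + len(doc) > max_chars:
            break
        kept.append(doc)
        used += len(doc)
    return kept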

Step 6: Complete RAG Pipeline

Put it all together:

def rag_query(query: str) -> str:
    """Complete RAG pipeline using Assisters API."""

    # Step 1: Retrieve candidates
    candidates = retrieve(query, top_k=10)

    # Step 2: Rerank for quality
    relevant_docs = rerank(query, candidates, top_k=5)

    # Step 3: Generate answer
    answer = generate_answer(query, relevant_docs)

    return answer

# Example usage
question = "What models does Assisters offer?"
answer = rag_query(question)
print(answer)

Streaming for Better UX

For real-time responses, enable streaming:

def rag_query_streaming(query: str):
    """RAG with streaming responses."""

    candidates = retrieve(query, top_k=10)
    relevant_docs = rerank(query, candidates, top_k=5)
    context_str = "\n\n".join([f"[{i+1}] {doc}" for i, doc in enumerate(relevant_docs)])

    # Stream the response
    stream = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=[
            {"role": "system", "content": "Answer based on context. Cite sources."},
            {"role": "user", "content": f"Context:\n{context_str}\n\nQuestion: {query}"}
        ],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

# Real-time output
rag_query_streaming("How much do embeddings cost?")
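
In a web application you would typically yield the chunks rather than print them, so your framework can forward tokens to the client (for example over server-sent events). The same loop as a generator:

def rag_stream(query: str):
    """Yield answer tokens as they arrive."""
    candidates = retrieve(query, top_k=10)
    relevant_docs = rerank(query, candidates, top_k=5)
    context_str = "\n\n".join(f"[{i+1}] {doc}" for i, doc in enumerate(relevant_docs))

    stream = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=[
            {"role": "system", "content": "Answer based on context. Cite sources."},
            {"role": "user", "content": f"Context:\n{context_str}\n\nQuestion: {query}"}
        ],
        stream=True
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content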

Cost Estimation

Component          Model                  Cost per 1,000 Queries
Query Embedding    assisters-embed-v1     $0.0001
Reranking          assisters-rerank-v1    $0.001
Generation         assisters-chat-v1      $0.05
Total                                     ~$0.05

Assisters pricing is 5-10x cheaper than comparable solutions while maintaining quality. See pricing for details.

Production Best Practices
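
The single biggest operational concern is rate limiting: the free tier allows 10 RPM, so production code should retry failed calls with exponential backoff. A minimal sketch (real code should catch the SDK's specific rate-limit exception rather than bare Exception):

import time

def with_retries(fn, max_attempts: int = 4):
    """Retry a flaky API call with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)

# Example: answer = with_retries(lambda: rag_query(question))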

Next Steps