Reranking Models

Models for improving search result relevance

Improve search quality by reranking documents based on relevance to a query. Use Assisters Rerank as a second-stage ranker after initial retrieval to dramatically improve result quality.

Assisters Rerank v1

Model ID: assisters-rerank-v1

Our high-performance cross-encoder reranker that precisely scores query-document relevance for superior search results.

| Specification | Value |
| --- | --- |
| Model ID | assisters-rerank-v1 |
| Type | Cross-encoder |
| Max Tokens | 8,192 |
| Input Price | $0.02 / million tokens |
| Latency | ~50ms per 10 docs |

Capabilities

  • High Precision: Cross-encoder architecture for accurate relevance scoring
  • Long Context: Support for documents up to 8,192 tokens
  • Low Latency: Optimized for real-time search applications
  • Score Normalization: Relevance scores from 0 to 1
  • Batch Processing: Rerank up to 100 documents per request

Example Usage

import requests

response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "model": "assisters-rerank-v1",
        "query": "machine learning basics",
        "documents": [
            "ML is a subset of AI that enables computers to learn",
            "Weather forecast shows rain tomorrow",
            "Deep learning uses neural networks for pattern recognition",
            "Recipe for chocolate cake",
            "Introduction to supervised and unsupervised learning"
        ],
        "top_n": 3
    }
)

results = response.json()["results"]
for r in results:
    print(f"Score: {r['relevance_score']:.3f} - {r['document'][:50]}...")

With Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key"
)

# Note: the rerank endpoint is only available via the REST API;
# call it with requests as shown in the example above.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | The search query |
| documents | array | required | Documents to rerank |
| model | string | required | Model ID (assisters-rerank-v1) |
| top_n | int | all | Return only the top N results |
| return_documents | bool | false | Include document text in the response |

How Reranking Works

graph LR
    A[Query] --> B[Initial Retrieval]
    B --> C[100 Candidates]
    C --> D[Reranker]
    D --> E[Top 10 Results]

    style D fill:#114F56,color:#fff
  1. Initial Retrieval: Fast search (vector/keyword) returns ~100 candidates
  2. Reranking: Cross-encoder scores each query-document pair precisely
  3. Final Results: Return the top N most relevant documents

Best Practices

Optimal Candidate Size

Rerank 50-100 candidates for best speed/quality tradeoff

Use for RAG

Reranking retrieved documents before they enter the LLM context significantly improves answer quality

Score Thresholds

Filter by relevance_score > 0.5 to remove noise
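
As a sketch of the threshold filter (the result shape here, an `index` plus a `relevance_score`, is assumed from the examples above):

```python
# Results as returned by the rerank endpoint (shape assumed: each entry
# carries the document's index and its normalized relevance score).
results = [
    {"index": 0, "relevance_score": 0.91},
    {"index": 4, "relevance_score": 0.67},
    {"index": 2, "relevance_score": 0.31},
    {"index": 1, "relevance_score": 0.04},
]

THRESHOLD = 0.5

# Keep only documents that clear the noise threshold.
relevant = [r for r in results if r["relevance_score"] > THRESHOLD]
print([r["index"] for r in relevant])  # -> [0, 4]
```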

Combine with Embeddings

Use Assisters Embed for retrieval, then rerank for precision

Performance Tips

| Candidates | Latency |
| --- | --- |
| 10 | ~50ms |
| 50 | ~150ms |
| 100 | ~300ms |

Reranking latency is O(n) in the number of candidates. For real-time applications, limit candidates to 50-100.
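
Since a single request accepts at most 100 documents, larger candidate pools have to be split into batches and the per-batch results merged. A minimal sketch, with the scoring call injected as `rerank_fn` so any client (such as the requests call above) can be plugged in; `batched` and `rerank_large` are hypothetical helpers. Merging raw scores across batches assumes scores are comparable between requests, which holds for a pointwise cross-encoder:

```python
def batched(seq: list, size: int):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def rerank_large(query: str, documents: list[str], rerank_fn,
                 batch_size: int = 100, top_n: int = 10) -> list[dict]:
    """Score documents in batches and merge results on relevance_score.

    `rerank_fn(query, batch)` is any callable returning results in the
    endpoint's format: dicts with "index" (position within the batch)
    and "relevance_score".
    """
    merged = []
    for offset, batch in zip(range(0, len(documents), batch_size),
                             batched(documents, batch_size)):
        for r in rerank_fn(query, batch):
            # Re-map the batch-local index to the global document index.
            merged.append({"index": offset + r["index"],
                           "relevance_score": r["relevance_score"]})
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged[:top_n]
```

For real-time paths, the batches can also be scored concurrently, since each request is independent.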

Complete Example

Two-stage retrieval system with Assisters models:

from openai import OpenAI
import requests

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.assisters.dev/v1"
)

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def search(query: str, documents: list[str], top_k: int = 5):
    # Stage 1: Embed and retrieve
    query_embedding = client.embeddings.create(
        model="assisters-embed-v1",
        input=query
    ).data[0].embedding

    doc_embeddings = client.embeddings.create(
        model="assisters-embed-v1",
        input=documents
    ).data

    # Simple similarity search
    scores = [cosine_sim(query_embedding, d.embedding) for d in doc_embeddings]
    candidates = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)[:20]

    # Stage 2: Rerank
    response = requests.post(
        "https://api.assisters.dev/v1/rerank",
        headers={"Authorization": "Bearer your-api-key"},
        json={
            "model": "assisters-rerank-v1",
            "query": query,
            "documents": [c[0] for c in candidates],
            "top_n": top_k
        }
    )

    return response.json()["results"]