Reranking Models
Models for improving search result relevance
Improve search quality by reranking documents based on relevance to a query. Use Assisters Rerank as a second-stage ranker after initial retrieval to dramatically improve result quality.
Assisters Rerank v1
Our high-performance cross-encoder reranker that precisely scores query-document relevance for superior search results.
| Specification | Value |
|---|---|
| Model ID | assisters-rerank-v1 |
| Type | Cross-encoder |
| Max Tokens | 8,192 |
| Input Price | $0.02 / million tokens |
| Latency | ~50ms per 10 docs |
Capabilities
- High Precision: Cross-encoder architecture for accurate relevance scoring
- Long Context: Support for documents up to 8,192 tokens
- Low Latency: Optimized for real-time search applications
- Score Normalization: Relevance scores from 0 to 1
- Batch Processing: Rerank up to 100 documents per request
Example Usage
```python
import requests

response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "model": "assisters-rerank-v1",
        "query": "machine learning basics",
        "documents": [
            "ML is a subset of AI that enables computers to learn",
            "Weather forecast shows rain tomorrow",
            "Deep learning uses neural networks for pattern recognition",
            "Recipe for chocolate cake",
            "Introduction to supervised and unsupervised learning",
        ],
        "top_n": 3,
        "return_documents": True,  # required so each result includes the document text
    },
)

results = response.json()["results"]
for r in results:
    print(f"Score: {r['relevance_score']:.3f} - {r['document'][:50]}...")
```

With Python SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key",
)

# Note: rerank is available via the REST API only;
# use the requests library as shown above.
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | The search query |
| documents | array | required | Documents to rerank |
| model | string | required | Model ID (`assisters-rerank-v1`) |
| top_n | int | all | Return only top N results |
| return_documents | bool | false | Include document text in response |
How Reranking Works
```mermaid
graph LR
    A[Query] --> B[Initial Retrieval]
    B --> C[100 Candidates]
    C --> D[Reranker]
    D --> E[Top 10 Results]
    style D fill:#114F56,color:#fff
```

- Initial Retrieval: Fast search (vector/keyword) returns ~100 candidates
- Reranking: Cross-encoder scores each query-document pair precisely
- Final Results: Return the top N most relevant documents
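The three steps above can be sketched as a generic two-stage pipeline. The `retrieve` and `rerank` arguments here are stand-in callables, not real API clients, so the shape of the flow is visible without any network calls:

```python
from typing import Callable

def two_stage_search(
    query: str,
    corpus: list[str],
    retrieve: Callable[[str, list[str], int], list[str]],
    rerank: Callable[[str, list[str]], list[tuple[str, float]]],
    n_candidates: int = 100,
    top_n: int = 10,
) -> list[tuple[str, float]]:
    # Stage 1: fast retrieval narrows the corpus to ~n_candidates documents
    candidates = retrieve(query, corpus, n_candidates)
    # Stage 2: the reranker scores each (query, document) pair
    scored = rerank(query, candidates)
    # Final results: top N by relevance score
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]
```

In production, `retrieve` would be a vector or keyword search and `rerank` a call to the `/v1/rerank` endpoint, as in the complete example below.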
Best Practices
- Optimal Candidate Size: Rerank 50-100 candidates for the best speed/quality tradeoff
- Use for RAG: Reranking before LLM context significantly improves answer quality
- Score Thresholds: Filter by `relevance_score > 0.5` to remove noise
- Combine with Embeddings: Use Assisters Embed for retrieval, then rerank for precision
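Applying a score threshold is a one-liner over the response; the 0.5 cutoff below follows the guideline above, but the right value depends on your corpus:

```python
def filter_by_score(results: list[dict], threshold: float = 0.5) -> list[dict]:
    """Drop reranked results whose relevance_score falls below the threshold."""
    return [r for r in results if r["relevance_score"] > threshold]
```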
Performance Tips
| Candidates | Latency |
|---|---|
| 10 | ~50ms |
| 50 | ~150ms |
| 100 | ~300ms |
Reranking is O(n) with document count. For real-time applications, limit candidates to 50-100.
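Since each request accepts at most 100 documents, larger candidate sets must be split across requests. The batching helper below is a sketch; merging results afterwards assumes the normalized 0-1 scores are comparable across requests:

```python
def batch_documents(documents: list[str], batch_size: int = 100) -> list[list[str]]:
    """Split a candidate list into request-sized batches of at most batch_size."""
    return [documents[i : i + batch_size] for i in range(0, len(documents), batch_size)]
```

Each batch would then be sent as a separate `/v1/rerank` request, with the scored results concatenated and re-sorted client-side.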
Complete Example
Two-stage retrieval system with Assisters models:
```python
import math

import requests
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.assisters.dev/v1",
)

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query: str, documents: list[str], top_k: int = 5):
    # Stage 1: embed the query and documents, then retrieve by similarity
    query_embedding = client.embeddings.create(
        model="assisters-embed-v1",
        input=query,
    ).data[0].embedding
    doc_embeddings = client.embeddings.create(
        model="assisters-embed-v1",
        input=documents,
    ).data

    # Simple similarity search: keep the 20 closest candidates
    scores = [cosine_sim(query_embedding, d.embedding) for d in doc_embeddings]
    candidates = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)[:20]

    # Stage 2: rerank the candidates with the cross-encoder
    response = requests.post(
        "https://api.assisters.dev/v1/rerank",
        headers={"Authorization": "Bearer your-api-key"},
        json={
            "model": "assisters-rerank-v1",
            "query": query,
            "documents": [c[0] for c in candidates],
            "top_n": top_k,
        },
    )
    return response.json()["results"]
```