# Best Practices

Production tips for building with the Assisters API.

## TL;DR

- Never expose API keys client-side.
- Use streaming for chat UX.
- Implement exponential backoff for retries.
- Cache embeddings.
- Set appropriate timeouts.
- Monitor usage via response headers.
- Run user-facing content through the moderation endpoint.

Follow these guidelines to build reliable, efficient, and cost-effective applications with the Assisters API.
## API Key Security

- **Use environment variables.** Never hardcode API keys in source code.
- **Rotate regularly.** Create new keys and revoke old ones periodically.
- **Separate environments.** Use different keys for dev, staging, and production.
- **Restrict domains.** Set allowed domains for client-side usage.
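Keeping per-environment keys out of code can be as simple as a fail-fast loader; the function name below is illustrative, and the assumption is that each deployment (dev, staging, production) exports its own key under the same variable name:

```python
import os

def load_api_key(env_var: str = "ASSISTERS_API_KEY") -> str:
    """Read the key for the current environment, failing fast if absent.

    Each deployment exports its own key under the same variable name,
    so application code stays environment-agnostic.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; refusing to start")
    return key
```

Failing at startup is preferable to discovering a missing key on the first authenticated request.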
```python
# Good: read the key from an environment variable
import os
api_key = os.environ["ASSISTERS_API_KEY"]

# Bad: hardcoded
api_key = "ask_abc123..."  # Never do this!
```

## Error Handling
Always handle errors gracefully:
```python
import time

from openai import OpenAI, APIError, RateLimitError, AuthenticationError

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

def safe_completion(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="assisters-chat-v1",
                messages=messages,
            )
        except AuthenticationError:
            # Don't retry auth errors
            raise
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Honor the server's Retry-After header, defaulting to 5 seconds
            wait = int(e.response.headers.get("Retry-After", 5))
            time.sleep(wait)
        except APIError as e:
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
```

## Prompt Engineering
### Use System Messages
Set consistent behavior with system messages:
```python
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
- Be friendly and professional
- Only answer questions about our products
- If unsure, say you'll escalate to a human""",
    },
    {"role": "user", "content": user_question},
]
```

### Be Specific
```python
# Vague (unpredictable results)
"Summarize this"

# Specific (better results)
"Summarize this article in 3 bullet points, each under 20 words"
```

### Provide Examples
```python
messages = [
    {
        "role": "system",
        "content": """Extract entities from text. Format as JSON.
Example:
Input: "John Smith called from New York about order #12345"
Output: {"person": "John Smith", "location": "New York", "order_id": "12345"}""",
    },
    {"role": "user", "content": user_input},
]
```

## Performance Optimization
### Enable Streaming
For better UX in chat applications:
```python
stream = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=messages,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

### Batch Requests
For embeddings and moderation, batch multiple inputs:
```python
# Efficient: a single request with multiple inputs
response = client.embeddings.create(
    model="assisters-embed-v1",
    input=["text1", "text2", "text3", ...]  # Up to 100 inputs per request
)

# Inefficient: one request per input
for text in texts:
    response = client.embeddings.create(model="assisters-embed-v1", input=text)
```

### Cache Results
Don't re-request the same data:
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text):
    # Strings are hashable, so lru_cache can key on the text directly;
    # there's no need to hash it first.
    response = client.embeddings.create(model="assisters-embed-v1", input=text)
    return response.data[0].embedding
```

## Cost Management
### Set Token Limits
```python
response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=messages,
    max_tokens=500,  # Prevent runaway responses
)
```

### Choose the Right Model
| Task | Recommended Model | Why |
|---|---|---|
| General chat | assisters-chat-v1 | Best value |
| Vision tasks | assisters-vision-v1 | Image understanding |
| Code generation | assisters-code-v1 | Optimized for code |
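One way to centralize this choice is a small lookup keyed by task type; the task labels below are our own convention, not part of the API:

```python
# Map task categories to the recommended models from the table above.
MODEL_FOR_TASK = {
    "chat": "assisters-chat-v1",
    "vision": "assisters-vision-v1",
    "code": "assisters-code-v1",
}

def pick_model(task: str) -> str:
    # Fall back to the general chat model for unrecognized task types.
    return MODEL_FOR_TASK.get(task, "assisters-chat-v1")
```

Centralizing the mapping means a model upgrade is a one-line change rather than a search across the codebase.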
### Monitor Usage
```python
# Track usage after each request
response = client.chat.completions.create(...)
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost: ${response.usage.total_tokens * 0.10 / 1_000_000:.6f}")
```

## Reliability
### Implement Timeouts
```python
from openai import OpenAI

client = OpenAI(
    api_key="ask_...",
    base_url="https://api.assisters.dev/v1",
    timeout=30.0,  # 30-second timeout
)
```

### Use Idempotency Keys
For critical operations:
```python
import uuid

import requests

response = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={
        "Authorization": "Bearer ask_...",
        "Idempotency-Key": str(uuid.uuid4()),
    },
    json={...},
)
```

### Implement Circuit Breakers
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = None
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            if time.time() - self.last_failure > self.reset_timeout:
                self.state = "half-open"
            else:
                raise Exception("Circuit breaker is open")
        try:
            result = func()
            self.failures = 0
            self.state = "closed"
            return result
        except Exception:
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise
```

## Content Safety
### Moderate Inputs
```python
def safe_chat(user_message):
    # Check the user input before generating a response
    moderation = client.moderations.create(
        model="assisters-moderation-v1",
        input=user_message,
    )
    if moderation.results[0].flagged:
        return "I can't respond to that message."

    # Generate the response
    response = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=[{"role": "user", "content": user_message}],
    )
    return response.choices[0].message.content
```

### Validate Outputs
```python
def validated_chat(user_message):
    response = client.chat.completions.create(...)
    content = response.choices[0].message.content

    # Check the model's output as well as the user's input
    output_mod = client.moderations.create(
        model="assisters-moderation-v1",
        input=content,
    )
    if output_mod.results[0].flagged:
        return "I need to rephrase my response."
    return content
```

## Logging & Monitoring
### Log Important Data
```python
import logging
import time

logger = logging.getLogger(__name__)

def logged_completion(messages):
    start_time = time.time()
    try:
        response = client.chat.completions.create(
            model="assisters-chat-v1",
            messages=messages,
        )
        logger.info(
            "API call successful",
            extra={
                "model": "assisters-chat-v1",
                "tokens": response.usage.total_tokens,
                "latency_ms": (time.time() - start_time) * 1000,
            },
        )
        return response
    except Exception as e:
        logger.error(f"API call failed: {e}")
        raise
```

### Track Metrics
Key metrics to monitor:
- Request latency
- Token usage
- Error rates
- Cost per request
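A minimal in-process aggregator for these four metrics might look like this; the class and field names are illustrative, and the per-token rate reuses the illustrative $0.10 per million tokens from the Monitor Usage example:

```python
from dataclasses import dataclass, field

@dataclass
class Metrics:
    """Aggregates per-request observations: latency, tokens, errors, cost."""
    latencies_ms: list = field(default_factory=list)
    total_tokens: int = 0
    errors: int = 0
    requests: int = 0

    def record(self, latency_ms, tokens=0, error=False):
        self.requests += 1
        self.latencies_ms.append(latency_ms)
        self.total_tokens += tokens
        if error:
            self.errors += 1

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0

    def cost_per_request(self, usd_per_million_tokens=0.10):
        if not self.requests:
            return 0.0
        return self.total_tokens * usd_per_million_tokens / 1_000_000 / self.requests
```

In production you would typically export these to a metrics backend rather than hold them in memory, but the same four dimensions apply.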
## Checklist

- **Security:** API keys in environment variables, rotated regularly, separated per environment
- **Reliability:** timeouts, retries with exponential backoff, idempotency keys, circuit breakers
- **Performance:** streaming enabled, requests batched, embeddings cached
- **Cost:** `max_tokens` set, right model per task, usage monitored
- **Safety:** inputs and outputs moderated