# Best Practices

Production tips for building with the Assisters API.

## TL;DR

- Never expose API keys client-side.
- Use streaming for chat UX.
- Implement exponential backoff for retries.
- Cache embeddings.
- Set appropriate timeouts.
- Monitor usage via response headers.
- Run user-facing content through the moderation endpoint.

Follow these guidelines to build reliable, efficient, and cost-effective applications with the Assisters API.
## API Key Security

- **Use environment variables.** Never hardcode API keys in source code.
- **Rotate regularly.** Create new keys and revoke old ones periodically.
- **Separate environments.** Use different keys for dev, staging, and production.
- **Restrict domains.** Set allowed domains for client-side usage.
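Keeping per-environment keys out of code can be as simple as a fail-fast loader; the function name below is illustrative, and the assumption is that each deployment (dev, staging, production) exports its own key under the same variable name:

```python
import os

def load_api_key(env_var: str = "ASSISTERS_API_KEY") -> str:
    """Read the key for the current environment, failing fast if absent.

    Each deployment exports its own key under the same variable name,
    so application code stays environment-agnostic.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; refusing to start")
    return key
```

Failing at startup is preferable to discovering a missing key on the first authenticated request.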
```python
# Good: read the key from an environment variable
import os
api_key = os.environ["ASSISTERS_API_KEY"]

# Bad: hardcoded
api_key = "ask_abc123..."  # Never do this!
```

## Error Handling
Always handle errors gracefully:
```python
import time

from openai import OpenAI, APIError, RateLimitError, AuthenticationError

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

def safe_completion(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="assisters-chat-v1",
                messages=messages,
            )
        except AuthenticationError:
            # Don't retry auth errors
            raise
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Honor the server's Retry-After header, defaulting to 5 seconds
            wait = int(e.response.headers.get("Retry-After", 5))
            time.sleep(wait)
        except APIError as e:
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
```

## Prompt Engineering
### Use System Messages
Set consistent behavior with system messages:
```python
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
- Be friendly and professional
- Only answer questions about our products
- If unsure, say you'll escalate to a human""",
    },
    {"role": "user", "content": user_question},
]
```

### Be Specific
```python
# Vague (unpredictable results)
"Summarize this"

# Specific (better results)
"Summarize this article in 3 bullet points, each under 20 words"
```

### Provide Examples
```python
messages = [
    {
        "role": "system",
        "content": """Extract entities from text. Format as JSON.
Example:
Input: "John Smith called from New York about order #12345"
Output: {"person": "John Smith", "location": "New York", "order_id": "12345"}""",
    },
    {"role": "user", "content": user_input},
]
```

## Performance Optimization
### Enable Streaming
For better UX in chat applications:
```python
stream = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=messages,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

### Batch Requests
For embeddings and moderation, batch multiple inputs:
```python
# Efficient: a single request with multiple inputs
response = client.embeddings.create(
    model="assisters-embed-v1",
    input=["text1", "text2", "text3", ...]  # Up to 100 inputs per request
)

# Inefficient: one request per input
for text in texts:
    response = client.embeddings.create(model="assisters-embed-v1", input=text)
```

### Cache Results
Don't re-request the same data:
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text):
    # Strings are hashable, so lru_cache can key on the text directly;
    # there's no need to hash it first.
    response = client.embeddings.create(model="assisters-embed-v1", input=text)
    return response.data[0].embedding
```

## Cost Management
### Set Token Limits
```python
response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=messages,
    max_tokens=500,  # Prevent runaway responses
)
```

### Choose the Right Model
| Task | Recommended Model | Why |
|---|---|---|
| General chat | assisters-chat-v1 | Best value |
| Vision tasks | assisters-vision-v1 | Image understanding |
| Code generation | assisters-code-v1 | Optimized for code |
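One way to centralize this choice is a small lookup keyed by task type; the task labels below are our own convention, not part of the API:

```python
# Map task categories to the recommended models from the table above.
MODEL_FOR_TASK = {
    "chat": "assisters-chat-v1",
    "vision": "assisters-vision-v1",
    "code": "assisters-code-v1",
}

def pick_model(task: str) -> str:
    # Fall back to the general chat model for unrecognized task types.
    return MODEL_FOR_TASK.get(task, "assisters-chat-v1")
```

Centralizing the mapping means a model upgrade is a one-line change rather than a search across the codebase.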
### Monitor Usage
```python
# Track usage after each request
response = client.chat.completions.create(...)
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost: ${response.usage.total_tokens * 0.10 / 1_000_000:.6f}")
```

## Reliability
### Implement Timeouts
```python
from openai import OpenAI

client = OpenAI(
    api_key="ask_...",
    base_url="https://api.assisters.dev/v1",
    timeout=30.0,  # 30-second timeout
)
```

### Use Idempotency Keys
For critical operations:
```python
import uuid

import requests

response = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={
        "Authorization": "Bearer ask_...",
        "Idempotency-Key": str(uuid.uuid4()),
    },
    json={...},
)
```

### Implement Circuit Breakers
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = None
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            if time.time() - self.last_failure > self.reset_timeout:
                self.state = "half-open"
            else:
                raise Exception("Circuit breaker is open")
        try:
            result = func()
            self.failures = 0
            self.state = "closed"
            return result
        except Exception:
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise
```

## Content Safety
### Moderate Inputs
```python
def safe_chat(user_message):
    # Check the user input before generating a response
    moderation = client.moderations.create(
        model="assisters-moderation-v1",
        input=user_message,
    )
    if moderation.results[0].flagged:
        return "I can't respond to that message."

    # Generate the response
    response = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=[{"role": "user", "content": user_message}],
    )
    return response.choices[0].message.content
```

### Validate Outputs
```python
def validated_chat(user_message):
    response = client.chat.completions.create(...)
    content = response.choices[0].message.content

    # Check the model's output as well as the user's input
    output_mod = client.moderations.create(
        model="assisters-moderation-v1",
        input=content,
    )
    if output_mod.results[0].flagged:
        return "I need to rephrase my response."
    return content
```

## Logging & Monitoring
### Log Important Data
```python
import logging
import time

logger = logging.getLogger(__name__)

def logged_completion(messages):
    start_time = time.time()
    try:
        response = client.chat.completions.create(
            model="assisters-chat-v1",
            messages=messages,
        )
        logger.info(
            "API call successful",
            extra={
                "model": "assisters-chat-v1",
                "tokens": response.usage.total_tokens,
                "latency_ms": (time.time() - start_time) * 1000,
            },
        )
        return response
    except Exception as e:
        logger.error(f"API call failed: {e}")
        raise
```

### Track Metrics
Key metrics to monitor:
- Request latency
- Token usage
- Error rates
- Cost per request
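A minimal in-process aggregator for these four metrics might look like this; the class and field names are illustrative, and the per-token rate reuses the illustrative $0.10 per million tokens from the Monitor Usage example:

```python
from dataclasses import dataclass, field

@dataclass
class Metrics:
    """Aggregates per-request observations: latency, tokens, errors, cost."""
    latencies_ms: list = field(default_factory=list)
    total_tokens: int = 0
    errors: int = 0
    requests: int = 0

    def record(self, latency_ms, tokens=0, error=False):
        self.requests += 1
        self.latencies_ms.append(latency_ms)
        self.total_tokens += tokens
        if error:
            self.errors += 1

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0

    def cost_per_request(self, usd_per_million_tokens=0.10):
        if not self.requests:
            return 0.0
        return self.total_tokens * usd_per_million_tokens / 1_000_000 / self.requests
```

In production you would typically export these to a metrics backend rather than hold them in memory, but the same four dimensions apply.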
## Checklist

- **Security:** API keys in environment variables, rotated regularly, separated per environment
- **Reliability:** timeouts, retries with exponential backoff, idempotency keys, circuit breakers
- **Performance:** streaming enabled, requests batched, embeddings cached
- **Cost:** `max_tokens` set, right model per task, usage monitored
- **Safety:** inputs and outputs moderated