Usage Tracking


Monitor your token consumption, track costs, and optimize your API usage.

Dashboard Overview

View your usage at assisters.dev/dashboard/usage:

  • Usage by Model: Breakdown by model type
  • Daily Trends: Usage patterns over time
  • Top Endpoints: Most used API endpoints
  • Cost Breakdown: Detailed billing information

Usage in API Responses

Every API response includes usage information:

{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}
  • prompt_tokens: Tokens in your input
  • completion_tokens: Tokens in the response
  • total_tokens: Total tokens (you are billed on this)

Tracking Usage in Code

Python Example

from openai import OpenAI

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

class UsageTracker:
    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0
        self.requests = 0

    def record(self, usage, price_per_million=0.10):
        self.total_tokens += usage.total_tokens
        self.total_cost += usage.total_tokens * price_per_million / 1_000_000
        self.requests += 1

    def report(self):
        return {
            "total_requests": self.requests,
            "total_tokens": self.total_tokens,
            "total_cost": f"${self.total_cost:.4f}",
            "avg_tokens_per_request": self.total_tokens / max(self.requests, 1)
        }

tracker = UsageTracker()

# Track each request
response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[{"role": "user", "content": "Hello!"}]
)
tracker.record(response.usage)

print(tracker.report())

JavaScript Example

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_...',
  baseURL: 'https://api.assisters.dev/v1',
});

class UsageTracker {
  constructor() {
    this.totalTokens = 0;
    this.totalCost = 0;
    this.requests = 0;
  }

  record(usage, pricePerMillion = 0.10) {
    this.totalTokens += usage.total_tokens;
    this.totalCost += usage.total_tokens * pricePerMillion / 1_000_000;
    this.requests += 1;
  }

  report() {
    return {
      totalRequests: this.requests,
      totalTokens: this.totalTokens,
      totalCost: `$${this.totalCost.toFixed(4)}`,
      avgTokensPerRequest: this.totalTokens / Math.max(this.requests, 1)
    };
  }
}

const tracker = new UsageTracker();

const response = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [{ role: 'user', content: 'Hello!' }]
});

tracker.record(response.usage);
console.log(tracker.report());

Rate Limit Headers

Monitor your rate limits in response headers. All API keys have a default limit of 60 RPM.

X-RateLimit-Limit-RPM: 60
X-RateLimit-Remaining-RPM: 55
X-RateLimit-Reset-RPM: 1706745660
  • X-RateLimit-Limit-RPM: Your current RPM limit
  • X-RateLimit-Remaining-RPM: Remaining requests in the current minute
  • X-RateLimit-Reset-RPM: Unix timestamp when the window resets
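
A client can watch these headers and back off before hitting the limit. A minimal sketch of the header math (pure parsing, no HTTP; the header names are the ones shown above):

```python
import time

def seconds_until_reset(headers, now=None):
    """How long to wait (in seconds) before the RPM window resets.

    Returns 0.0 while requests remain in the current window.
    """
    remaining = int(headers.get("X-RateLimit-Remaining-RPM", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(headers["X-RateLimit-Reset-RPM"])
    now = time.time() if now is None else now
    return max(0.0, float(reset_at - now))

# With the headers above, 10 seconds before the window resets:
headers = {
    "X-RateLimit-Limit-RPM": "60",
    "X-RateLimit-Remaining-RPM": "0",
    "X-RateLimit-Reset-RPM": "1706745660",
}
print(seconds_until_reset(headers, now=1706745650))  # 10.0
```

Sleeping for the returned duration (e.g. `time.sleep(...)`) before retrying avoids 429 errors without guesswork.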

Usage Alerts

Set up alerts in your dashboard to get notified when:

  • You exceed rate limits frequently
  • Unusual usage patterns are detected
  • Your wallet balance drops below a threshold
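
Dashboard alerts can also be mirrored client-side against locally tracked spend. A minimal sketch (the budget and warning threshold are illustrative values, not API settings):

```python
def check_budget(total_cost_usd, budget_usd=5.00, warn_ratio=0.8):
    """Classify current spend against a local budget.

    Returns "over_budget", "warning" (past warn_ratio of the
    budget), or "ok".
    """
    if total_cost_usd >= budget_usd:
        return "over_budget"
    if total_cost_usd >= warn_ratio * budget_usd:
        return "warning"
    return "ok"

print(check_budget(4.10))  # "warning": past 80% of the $5.00 budget
```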

Estimating Future Usage

Before Sending Requests

import tiktoken

def estimate_tokens(text):
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def estimate_request_cost(messages, max_output_tokens=500, price_per_million=0.10):
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    input_tokens += len(messages) * 4  # Message overhead

    # Estimate total (input + expected output)
    estimated_total = input_tokens + max_output_tokens

    cost = estimated_total * price_per_million / 1_000_000
    return {
        "estimated_input_tokens": input_tokens,
        "max_output_tokens": max_output_tokens,
        "estimated_cost": f"${cost:.6f}"
    }

# Example
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Explain quantum computing."}
]

print(estimate_request_cost(messages))

Usage by Model

Track usage separately by model for optimization:

from collections import defaultdict

usage_by_model = defaultdict(lambda: {"tokens": 0, "cost": 0, "requests": 0})

MODEL_PRICES = {
    "assisters-chat-v1": 0.10,
    "assisters-vision-v1": 0.05,
    "assisters-embed-v1": 0.01,
}

def track_by_model(model, usage):
    price = MODEL_PRICES.get(model, 0.10)
    usage_by_model[model]["tokens"] += usage.total_tokens
    usage_by_model[model]["cost"] += usage.total_tokens * price / 1_000_000
    usage_by_model[model]["requests"] += 1

# After each request
track_by_model("assisters-chat-v1", response.usage)

# Report
for model, data in usage_by_model.items():
    print(f"{model}: {data['tokens']} tokens, ${data['cost']:.4f}")

Optimizing Usage

  • Choose Efficient Models: Use the right model for the task
  • Limit Output Tokens: Set max_tokens to prevent runaway responses
  • Trim Context: Remove old messages from conversation history
  • Cache Responses: Cache repeated queries to avoid re-computation
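
The caching tip can be sketched in a few lines: key the cache on the model plus the exact message list, and only call the API on a miss. This is an illustrative in-memory cache, not an Assisters feature, and it only makes sense for deterministic requests (e.g. temperature=0):

```python
import hashlib
import json

_cache = {}

def cache_key(model, messages):
    # Canonical JSON so the same request always maps to the same key.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(client, model, messages):
    key = cache_key(model, messages)
    if key not in _cache:  # cache miss: pay for one real request
        _cache[key] = client.chat.completions.create(
            model=model, messages=messages, temperature=0
        )
    return _cache[key]
```

Repeated calls with identical arguments return the stored response object and consume no tokens.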

API for Usage Data

Check your usage programmatically:

curl https://api.assisters.dev/v1/usage \
  -H "Authorization: Bearer ask_your_api_key"

Response:

{
  "usage": [
    {
      "period_start": "2026-03-01T00:00:00Z",
      "input_tokens": 1500000,
      "output_tokens": 800000,
      "total_tokens": 2300000,
      "request_count": 4200,
      "cost_usd": 0.31
    }
  ]
}
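
The same endpoint can be called from code; a sketch using only the standard library (the response shape is the one shown above):

```python
import json
import urllib.request

def fetch_usage(api_key):
    """GET /v1/usage and return the list of billing periods."""
    req = urllib.request.Request(
        "https://api.assisters.dev/v1/usage",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["usage"]

def total_cost(periods):
    # Sum cost_usd across the returned billing periods.
    return sum(p["cost_usd"] for p in periods)
```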

Dashboard Features

  • Usage Charts: Visual breakdown of token consumption over time
  • Export Data: Export usage data as CSV for accounting
  • Budget Alerts: Get notified before hitting limits
