# Usage Tracking

Monitor your token consumption, track costs, and optimize your API usage in real time.
## Dashboard Overview

View your usage at assisters.dev/dashboard/usage:

- **Usage by Model**: Breakdown by model type
- **Daily Trends**: Usage patterns over time
- **Top Endpoints**: Most used API endpoints
- **Cost Breakdown**: Detailed billing information
## Usage in API Responses

Every API response includes usage information:

```json
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}
```

| Field | Description |
|---|---|
| `prompt_tokens` | Tokens in your input |
| `completion_tokens` | Tokens in the response |
| `total_tokens` | Total tokens (billed) |
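Since billing is based on `total_tokens`, the cost of a single request is just tokens times the per-token price. A quick sanity check (the $0.10-per-million price here is illustrative; check your model's actual rate):

```python
def request_cost(usage, price_per_million=0.10):
    """Cost in USD for one response's usage block."""
    return usage["total_tokens"] * price_per_million / 1_000_000

usage = {"prompt_tokens": 25, "completion_tokens": 100, "total_tokens": 125}
print(f"${request_cost(usage):.6f}")  # 125 tokens at $0.10/M ≈ $0.0000125
```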
## Tracking Usage in Code

### Python Example

```python
from openai import OpenAI

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

class UsageTracker:
    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0
        self.requests = 0

    def record(self, usage, price_per_million=0.10):
        self.total_tokens += usage.total_tokens
        self.total_cost += usage.total_tokens * price_per_million / 1_000_000
        self.requests += 1

    def report(self):
        return {
            "total_requests": self.requests,
            "total_tokens": self.total_tokens,
            "total_cost": f"${self.total_cost:.4f}",
            "avg_tokens_per_request": self.total_tokens / max(self.requests, 1),
        }

tracker = UsageTracker()

# Track each request
response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[{"role": "user", "content": "Hello!"}]
)
tracker.record(response.usage)
print(tracker.report())
```

### JavaScript Example
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_...',
  baseURL: 'https://api.assisters.dev/v1'
});

class UsageTracker {
  constructor() {
    this.totalTokens = 0;
    this.totalCost = 0;
    this.requests = 0;
  }

  record(usage, pricePerMillion = 0.10) {
    this.totalTokens += usage.total_tokens;
    this.totalCost += usage.total_tokens * pricePerMillion / 1_000_000;
    this.requests += 1;
  }

  report() {
    return {
      totalRequests: this.requests,
      totalTokens: this.totalTokens,
      totalCost: `$${this.totalCost.toFixed(4)}`,
      avgTokensPerRequest: this.totalTokens / Math.max(this.requests, 1)
    };
  }
}

const tracker = new UsageTracker();
const response = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [{ role: 'user', content: 'Hello!' }]
});
tracker.record(response.usage);
console.log(tracker.report());
```

## Rate Limit Headers
Monitor your rate limits in response headers. All API keys have a default limit of 60 RPM.
```
X-RateLimit-Limit-RPM: 60
X-RateLimit-Remaining-RPM: 55
X-RateLimit-Reset-RPM: 1706745660
```

| Header | Description |
|---|---|
| `X-RateLimit-Limit-RPM` | Your current RPM limit |
| `X-RateLimit-Remaining-RPM` | Remaining requests this minute |
| `X-RateLimit-Reset-RPM` | Unix timestamp when the window resets |
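These headers can drive simple client-side throttling. As a minimal sketch (the helper below is hypothetical, not part of any SDK), you can check the remaining budget and compute how long to wait before the next request:

```python
import time

def seconds_until_allowed(headers, now=None):
    """Return 0.0 if requests remain in the window, else seconds until reset."""
    now = time.time() if now is None else float(now)
    remaining = int(headers.get("X-RateLimit-Remaining-RPM", 1))
    if remaining > 0:
        return 0.0
    reset = int(headers.get("X-RateLimit-Reset-RPM", now))
    return max(0.0, reset - now)

# With 55 requests left, no waiting is needed:
print(seconds_until_allowed(
    {"X-RateLimit-Remaining-RPM": "55", "X-RateLimit-Reset-RPM": "1706745660"},
    now=1706745650,
))  # → 0.0

# Once the window is exhausted, wait until it resets:
print(seconds_until_allowed(
    {"X-RateLimit-Remaining-RPM": "0", "X-RateLimit-Reset-RPM": "1706745660"},
    now=1706745650,
))  # → 10.0
```

Calling `time.sleep()` on the returned value before retrying avoids most 429 responses without a full backoff library.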
## Usage Alerts
Set up alerts in your dashboard to get notified when:
- You exceed rate limits frequently
- Unusual usage patterns are detected
- Your wallet balance drops below a threshold
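The same conditions can also be checked client-side. A minimal sketch (the helper and threshold values are illustrative, not an official API):

```python
def check_alerts(balance_usd, recent_429s, *, min_balance=5.00, max_429s=3):
    """Return alert messages for simple threshold breaches."""
    alerts = []
    if recent_429s > max_429s:
        alerts.append(f"Rate limit exceeded {recent_429s} times recently")
    if balance_usd < min_balance:
        alerts.append(f"Wallet balance ${balance_usd:.2f} below ${min_balance:.2f}")
    return alerts

print(check_alerts(balance_usd=2.50, recent_429s=5))
# → ['Rate limit exceeded 5 times recently', 'Wallet balance $2.50 below $5.00']
```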
## Estimating Future Usage

### Before Sending Requests

```python
import tiktoken

def estimate_tokens(text):
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def estimate_request_cost(messages, max_output_tokens=500, price_per_million=0.10):
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    input_tokens += len(messages) * 4  # Per-message formatting overhead
    # Estimate total (input + expected output)
    estimated_total = input_tokens + max_output_tokens
    cost = estimated_total * price_per_million / 1_000_000
    return {
        "estimated_input_tokens": input_tokens,
        "max_output_tokens": max_output_tokens,
        "estimated_cost": f"${cost:.6f}",
    }

# Example
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Explain quantum computing."},
]
print(estimate_request_cost(messages))
```

## Usage by Model
Track usage separately by model for optimization:

```python
from collections import defaultdict

usage_by_model = defaultdict(lambda: {"tokens": 0, "cost": 0.0, "requests": 0})

MODEL_PRICES = {
    "assisters-chat-v1": 0.10,
    "assisters-vision-v1": 0.05,
    "assisters-embed-v1": 0.01,
}

def track_by_model(model, usage):
    price = MODEL_PRICES.get(model, 0.10)
    usage_by_model[model]["tokens"] += usage.total_tokens
    usage_by_model[model]["cost"] += usage.total_tokens * price / 1_000_000
    usage_by_model[model]["requests"] += 1

# After each request
track_by_model("assisters-chat-v1", response.usage)

# Report
for model, data in usage_by_model.items():
    print(f"{model}: {data['tokens']} tokens, ${data['cost']:.4f}")
```

## Optimizing Usage
- **Choose Efficient Models**: Use the right model for the task
- **Limit Output Tokens**: Set `max_tokens` to prevent runaway responses
- **Trim Context**: Remove old messages from conversation history
- **Cache Responses**: Cache repeated queries to avoid re-computation
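The last two points can be sketched in a few lines. These helpers are illustrative (the names `trim_history` and `cached_completion` are not part of any SDK), and the cache assumes that serving a stale response for an identical prompt is acceptable:

```python
import hashlib
import json

def trim_history(messages, max_messages=10):
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

_cache = {}

def cached_completion(client, model, messages, **kwargs):
    """Serve repeated identical requests from a local in-memory cache."""
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
    return _cache[key]
```

Hashing the sorted JSON of the request makes the cache key stable across dict ordering; for production use, a bounded cache with expiry (e.g. an LRU) is safer than an unbounded module-level dict.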
## API for Usage Data

Check your usage programmatically:

```bash
curl https://api.assisters.dev/v1/usage \
  -H "Authorization: Bearer ask_your_api_key"
```

Response:

```json
{
  "usage": [
    {
      "period_start": "2026-03-01T00:00:00Z",
      "input_tokens": 1500000,
      "output_tokens": 800000,
      "total_tokens": 2300000,
      "request_count": 4200,
      "cost_usd": 0.31
    }
  ]
}
```

## Dashboard Features
- **Usage Charts**: Visual breakdown of token consumption over time
- **Export Data**: Export usage data as CSV for accounting
- **Budget Alerts**: Get notified before hitting limits
- **View Your Usage**: Check your current usage and costs in real time