Token Counting
Understanding tokens and how billing works
Tokens are the basic unit of text processing and billing in the Assisters API. Understanding tokens helps you optimize costs and manage rate limits.
What Are Tokens?
Tokens are the chunks of text (whole words, word fragments, or punctuation) that a model reads and generates. Rough rules of thumb:
- 1 token ≈ 4 characters in English
- 1 token ≈ 0.75 words in English
- 100 tokens ≈ 75 words
The exact tokenization depends on the model. Different models may tokenize the same text differently.
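When no tokenizer is at hand, the rules of thumb above can be turned into a quick estimate. The sketch below is a hypothetical helper (`rough_token_estimate` is a name introduced here, not part of any Assisters SDK) that averages the character-based and word-based rules. It assumes English prose and will be noticeably off for code or non-Latin scripts.

```python
def rough_token_estimate(text: str) -> int:
    """Estimate the token count of English text from the rules of thumb above."""
    if not text:
        return 0
    by_chars = len(text) / 4             # 1 token ≈ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ≈ 0.75 words
    # Average the two heuristics and never report fewer than one token
    return max(1, round((by_chars + by_words) / 2))


print(rough_token_estimate("The quick brown fox jumps over the lazy dog."))
```

Treat the result as a planning figure only; the usage block returned by the API remains the authoritative count.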
Examples
| Text | Tokens |
|---|---|
| "Hello" | 1 |
| "Hello, world!" | 4 |
| "The quick brown fox" | 4 |
| "Artificial Intelligence" | 2-3 |
| "こんにちは" (Japanese) | 3-5 |
Counting Tokens
In API Responses
Every response includes token usage:
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}

| Field | Description |
|---|---|
| prompt_tokens | Tokens in your input (messages) |
| completion_tokens | Tokens in the model's response |
| total_tokens | Total tokens billed |
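When logging usage, the three fields can be read from the raw JSON body and checked against each other. This is a minimal sketch, assuming the response body has the shape shown above; the `read_usage` helper and the sample payload are illustrative and not part of the API.

```python
import json


def read_usage(response_body: str) -> dict:
    """Extract the usage counters from a raw JSON response body."""
    usage = json.loads(response_body)["usage"]
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    total = usage["total_tokens"]
    # total_tokens is expected to equal prompt_tokens + completion_tokens
    if total != prompt + completion:
        raise ValueError(f"inconsistent usage block: {usage}")
    return {"prompt": prompt, "completion": completion, "total": total}


sample = '{"usage": {"prompt_tokens": 25, "completion_tokens": 100, "total_tokens": 125}}'
print(read_usage(sample))  # {'prompt': 25, 'completion': 100, 'total': 125}
```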
Before Sending (Estimation)
Use tiktoken to estimate tokens before making requests:
import tiktoken

def count_tokens(text, model="assisters-chat-v1"):
    # Use cl100k_base encoding for Assisters models
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

# Example
text = "What is the meaning of life?"
tokens = count_tokens(text)
print(f"Estimated tokens: {tokens}")  # ~7 tokens

Counting Message Tokens
Messages have overhead beyond just the content:
import tiktoken

def count_message_tokens(messages):
    encoding = tiktoken.get_encoding("cl100k_base")
    total = 0
    for message in messages:
        # Message overhead (role, formatting)
        total += 4
        # Content tokens (a missing or null content field counts as empty)
        total += len(encoding.encode(message.get("content") or ""))
        # Name (if present)
        if "name" in message:
            total += len(encoding.encode(message["name"]))
            total += 1  # Name field overhead
    # Conversation overhead
    total += 2
    return total

# Example
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
]
print(f"Total: {count_message_tokens(messages)} tokens")

Billing Calculation
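You're billed for both input and output tokens. The chat, vision, and code models price the two separately (see Token Pricing Reference):
Cost = (prompt_tokens × input_price_per_million + completion_tokens × output_price_per_million) / 1,000,000

Example Calculation
# assisters-chat-v1 pricing: $0.10 (input) / $0.20 (output) per million tokens
prompt_tokens = 500
completion_tokens = 1000
input_price_per_million = 0.10
output_price_per_million = 0.20
cost = (
    prompt_tokens * input_price_per_million
    + completion_tokens * output_price_per_million
) / 1_000_000
print(f"Cost: ${cost:.6f}")  # $0.000250

Monthly Usage Estimation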
def estimate_monthly_cost(requests_per_day, avg_tokens_per_request, price_per_million):
    # price_per_million is a single blended rate across input and output tokens,
    # so treat the result as a rough planning figure
    daily_tokens = requests_per_day * avg_tokens_per_request
    monthly_tokens = daily_tokens * 30  # assumes a 30-day month
    monthly_cost = monthly_tokens * price_per_million / 1_000_000
    return {
        "daily_tokens": daily_tokens,
        "monthly_tokens": monthly_tokens,
        "monthly_cost": monthly_cost
    }

# Example: 1000 requests/day, 500 tokens each, $0.10/M
estimate = estimate_monthly_cost(1000, 500, 0.10)
print(f"Monthly cost: ${estimate['monthly_cost']:.2f}")  # $1.50

Token Limits
Context Window
Each model has a maximum context window:
| Model | Context Window |
|---|---|
| assisters-chat-v1 | 128,000 tokens |
| assisters-vision-v1 | 128,000 tokens |
| assisters-code-v1 | 128,000 tokens |
If your input exceeds the context window, the request will fail with a context_length_exceeded error.
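A request can be checked against the window before it is sent. The sketch below is a hypothetical pre-flight helper: `fits_context` and `CONTEXT_WINDOW` are names introduced here, and the prompt count is whatever estimate you already have (for example from tiktoken). It also assumes that the output budget set with max_tokens counts toward the same window, which this page does not state explicitly.

```python
# Context windows copied from the table above
CONTEXT_WINDOW = {
    "assisters-chat-v1": 128_000,
    "assisters-vision-v1": 128_000,
    "assisters-code-v1": 128_000,
}


def fits_context(model: str, prompt_tokens: int, max_output_tokens: int = 0) -> bool:
    """Return True if the prompt plus the reserved output fits the model's window."""
    # Assumption: output tokens share the window with the prompt
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW[model]


print(fits_context("assisters-chat-v1", 120_000, max_output_tokens=4_000))  # True
print(fits_context("assisters-chat-v1", 127_000, max_output_tokens=4_000))  # False
```

Because client-side counts are estimates, leaving some headroom below the limit is safer than filling the window exactly.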
Output Limits
You can limit output tokens with max_tokens:
response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[{"role": "user", "content": "Write an essay"}],
    max_tokens=500  # Limit response length
)

Optimizing Token Usage
1. Trim Conversation History
Keep only recent messages:
def trim_messages(messages, max_tokens=4000):
    encoding = tiktoken.get_encoding("cl100k_base")
    # Always keep the system message
    result = []
    if messages and messages[0]["role"] == "system":
        result.append(messages[0])
        messages = messages[1:]
    current_tokens = count_message_tokens(result)
    # Walk backwards from the most recent message; stop when the budget is reached
    recent = []
    for msg in reversed(messages):
        msg_tokens = len(encoding.encode(msg["content"])) + 4
        if current_tokens + msg_tokens > max_tokens:
            break
        recent.append(msg)
        current_tokens += msg_tokens
    # recent is newest-first, so reverse it to restore chronological order
    return result + recent[::-1]

2. Summarize Long Contexts
def summarize_if_long(text, max_tokens=2000):
    tokens = count_tokens(text)
    if tokens <= max_tokens:
        return text
    # Summarize with AI
    response = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=[
            {"role": "system", "content": "Summarize concisely:"},
            {"role": "user", "content": text}
        ],
        max_tokens=max_tokens // 2
    )
    return response.choices[0].message.content

3. Use Efficient Prompts
# Verbose (more tokens)
messages = [{
    "role": "user",
    "content": "I would like you to please provide me with a comprehensive and detailed explanation of what machine learning is and how it works in general terms."
}]

# Concise (fewer tokens)
messages = [{
    "role": "user",
    "content": "Explain machine learning briefly."
}]

4. Cache Responses
Don't re-request the same information:
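from functools import lru_cache
import json

@lru_cache(maxsize=1000)
def cached_completion(messages_json):
    # lru_cache needs hashable arguments, so the messages arrive as a JSON string
    messages = json.loads(messages_json)
    response = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=messages
    )
    return response.choices[0].message.content

def get_completion(messages):
    # Serialize deterministically so identical conversations share one cache entry
    return cached_completion(json.dumps(messages, sort_keys=True))

Tracking Usage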
Per-Request Tracking
class UsageTracker:
    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0
        self.request_count = 0

    def record(self, usage, input_price_per_million, output_price_per_million):
        # Input and output tokens are priced separately (see Token Pricing Reference)
        cost = (
            usage.prompt_tokens * input_price_per_million
            + usage.completion_tokens * output_price_per_million
        ) / 1_000_000
        self.total_tokens += usage.total_tokens
        self.total_cost += cost
        self.request_count += 1

    def report(self):
        return {
            "requests": self.request_count,
            "tokens": self.total_tokens,
            "cost": f"${self.total_cost:.4f}"
        }

# Usage
tracker = UsageTracker()
response = client.chat.completions.create(...)
tracker.record(response.usage, input_price_per_million=0.10, output_price_per_million=0.20)
print(tracker.report())

Dashboard Monitoring
Check your usage in real time at assisters.dev/dashboard/usage.
Token Pricing Reference
| Model | Price per Million Tokens |
|---|---|
| assisters-chat-v1 | $0.10 (input) / $0.20 (output) |
| assisters-vision-v1 | $0.05 (input) / $0.10 (output) |
| assisters-code-v1 | $0.10 (input) / $0.20 (output) |
| assisters-embed-v1 | $0.01 |
| assisters-moderation-v1 | $0.05 |
| assisters-rerank-v1 | $0.05 |
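A per-request cost follows directly from the table. This is a minimal sketch, assuming the rates above are current and that the single-rate models apply their one rate to every token (an assumption; this page does not spell it out). `PRICES` and `request_cost` are illustrative names, not part of any SDK.

```python
# (input, output) USD per million tokens, copied from the table above.
# Assumption: single-rate models use the same rate for both sides.
PRICES = {
    "assisters-chat-v1": (0.10, 0.20),
    "assisters-vision-v1": (0.05, 0.10),
    "assisters-code-v1": (0.10, 0.20),
    "assisters-embed-v1": (0.01, 0.01),
    "assisters-moderation-v1": (0.05, 0.05),
    "assisters-rerank-v1": (0.05, 0.05),
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
    """Return the cost in USD of one request at the listed rates."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


print(f"${request_cost('assisters-chat-v1', 2000, 500):.6f}")  # $0.000300
```

Keeping the rates in one table means a price change is a one-line edit rather than a search through call sites.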
Full Pricing Details
See complete pricing for all models and tiers