Chat Completions
Generate conversational responses with the chat completions endpoint
TL;DR
POST to /v1/chat/completions with a model and a messages array. Supports streaming (stream: true), temperature, max_tokens, and tools/function calling. Models: assisters-chat-v1 (general), assisters-code-v1 (code), assisters-vision-v1 (images).
Generate AI responses for conversational applications. This endpoint is fully compatible with the OpenAI Chat Completions API.
Endpoint
POST https://api.assisters.dev/v1/chat/completions

Request Body
model (string, required)
The model to use for completion. See available models.
Examples: assisters-chat-v1, assisters-vision-v1, assisters-code-v1

messages (array, required)
An array of messages comprising the conversation so far. Each message object has:
- role (string): system, user, or assistant
- content (string): The content of the message

stream (boolean, default: false)
If true, returns a stream of Server-Sent Events (SSE) for real-time responses.

max_tokens (integer)
Maximum number of tokens to generate. Defaults to the model's maximum.

temperature (number, default: 1.0)
Sampling temperature between 0 and 2. Higher values make output more random.

top_p (number, default: 1.0)
Nucleus sampling parameter. Use this or temperature, not both.

stop (string | array)
Up to 4 sequences where the API will stop generating tokens.

presence_penalty (number, default: 0)
Penalty for new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0.

frequency_penalty (number, default: 0)
Penalty for new tokens based on their frequency in the text. Range: -2.0 to 2.0.

user (string)
A unique identifier for the end-user, useful for monitoring and abuse detection.
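The sampling and penalty parameters above can be combined in a single request body. A minimal sketch of such a payload (the parameter values are illustrative, not recommendations):

```python
import json

# Illustrative request body combining the parameters documented above.
payload = {
    "model": "assisters-chat-v1",
    "messages": [
        {"role": "user", "content": "List three uses for a paperclip."}
    ],
    "max_tokens": 150,
    "temperature": 0.7,        # slightly less random than the default 1.0
    "presence_penalty": 0.5,   # discourage revisiting topics already mentioned
    "stop": ["\n\n"],          # stop at the first blank line
    "user": "user-1234",       # stable end-user id for abuse monitoring
}

print(json.dumps(payload, indent=2))
```

This is the same body you would POST to /v1/chat/completions with the Authorization header set, as in the examples below.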
Request Examples
Basic Request

Python:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Japan?"}
    ]
)

print(response.choices[0].message.content)
```

JavaScript:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_your_api_key',
  baseURL: 'https://api.assisters.dev/v1'
});

const response = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of Japan?' }
  ]
});

console.log(response.choices[0].message.content);
```

cURL:

```bash
curl https://api.assisters.dev/v1/chat/completions \
  -H "Authorization: Bearer ask_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "assisters-chat-v1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of Japan?"}
    ]
  }'
```

Streaming Request
Python:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

stream = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

JavaScript:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_your_api_key',
  baseURL: 'https://api.assisters.dev/v1'
});

const stream = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [
    { role: 'user', content: 'Write a short poem about coding' }
  ],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```

Multi-turn Conversation
```python
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
    {"role": "user", "content": "And what is that multiplied by 3?"}
]

response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=messages
)
```

Response
Non-Streaming Response
```json
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Japan is Tokyo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```

Streaming Response
Each chunk in the stream:
```json
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "The"
      },
      "finish_reason": null
    }
  ]
}
```

Final chunk:
```json
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
```

Response Fields
id (string)
Unique identifier for the completion.

object (string)
Always chat.completion, or chat.completion.chunk for streaming.

created (integer)
Unix timestamp of when the completion was created.

model (string)
The model used for completion.

choices (array)
Array of completion choices. Each choice contains:
- index: The index of this choice
- message: The generated message (non-streaming)
- delta: The incremental content (streaming)
- finish_reason: Why generation stopped (stop, length, content_filter)

usage (object)
Token usage statistics (not included in streaming):
- prompt_tokens: Tokens in the input
- completion_tokens: Tokens in the output
- total_tokens: Total tokens used
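As a quick illustration of these fields, the sample non-streaming response can be parsed and its content and usage read directly (a sketch; field names follow the schema documented above):

```python
import json

# The sample non-streaming response shown earlier.
raw = '''{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "The capital of Japan is Tokyo."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}
}'''

completion = json.loads(raw)
answer = completion["choices"][0]["message"]["content"]
usage = completion["usage"]

print(answer)                # The capital of Japan is Tokyo.
print(usage["total_tokens"]) # 33

# total_tokens is the sum of prompt and completion tokens.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```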
Finish Reasons
| Reason | Description |
|---|---|
| `stop` | Natural completion or stop sequence reached |
| `length` | `max_tokens` limit reached |
| `content_filter` | Content was filtered by moderation |
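When consuming a response, it is worth branching on finish_reason, since a length stop usually means the answer was truncated. A minimal sketch over a parsed choice object (the dict shape mirrors the response schema above):

```python
def is_truncated(choice: dict) -> bool:
    """Return True when generation stopped because the max_tokens limit was hit."""
    return choice.get("finish_reason") == "length"

# Two illustrative choices mirroring the documented schema.
complete = {"index": 0, "message": {"content": "Tokyo."}, "finish_reason": "stop"}
cut_off = {"index": 0, "message": {"content": "The capital"}, "finish_reason": "length"}

print(is_truncated(complete))  # False
print(is_truncated(cut_off))   # True
```

A common follow-up on a length stop is to retry with a higher max_tokens, or to append the partial assistant message to the conversation and ask the model to continue.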
Error Responses
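Since the endpoint is OpenAI-compatible, errors are returned in the standard error envelope. A representative body for an authentication failure (illustrative values, assuming that format):

```json
{
  "error": {
    "message": "Invalid API key provided.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
```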
Best Practices
Use System Messages
Set context and behavior with system messages for consistent responses
Stream Long Responses
Enable streaming for better UX with longer completions
Manage Conversation Length
Trim old messages to stay within token limits
Handle Errors Gracefully
Implement retry logic with exponential backoff
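The retry advice above can be sketched as a small wrapper. This is an illustrative pattern, not part of any SDK; the retried function and the retryable error types are placeholders you would replace with your API call and the errors you consider transient (e.g. rate limits, 5xx):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying on retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x, ... base_delay, plus jitter proportional to base_delay
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Example: a flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```

In production you would pass your chat-completion call as `fn` and restrict `retryable` to transient failures, so that permanent errors (such as an invalid API key) fail fast instead of being retried.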