Assisters API
API Reference

Chat Completions

Generate conversational responses with the chat completions endpoint

TL;DR

POST to /v1/chat/completions with a model and a messages array. Supports streaming (stream: true), temperature, max_tokens, and tools/function calling. Models: assisters-chat-v1 (general), assisters-code-v1 (code), assisters-vision-v1 (images).

Generate AI responses for conversational applications. This endpoint is fully compatible with the OpenAI Chat Completions API.

Endpoint

POST https://api.assisters.dev/v1/chat/completions

Request Body

model (string, required)

The model to use for the completion. See available models.

Examples: assisters-chat-v1, assisters-vision-v1, assisters-code-v1

messages (array, required)

An array of messages comprising the conversation so far.

Each message object has:

  • role (string): system, user, or assistant
  • content (string): The content of the message

stream (boolean, default: false)

If true, returns a stream of Server-Sent Events (SSE) for real-time responses.

max_tokens (integer)

Maximum number of tokens to generate. Defaults to the model's maximum.

temperature (number, default: 1.0)

Sampling temperature between 0 and 2. Higher values make output more random.

top_p (number, default: 1.0)

Nucleus sampling parameter. Use this or temperature, not both.

stop (string | array)

Up to 4 sequences where the API will stop generating tokens.

presence_penalty (number, default: 0)

Penalizes new tokens based on whether they already appear in the text so far. Range: -2.0 to 2.0.

frequency_penalty (number, default: 0)

Penalizes new tokens based on their frequency in the text so far. Range: -2.0 to 2.0.

user (string)

A unique identifier for the end user, useful for monitoring and abuse detection.
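Putting the optional fields together, a request body might look like the sketch below. Field names follow the OpenAI-compatible schema this endpoint advertises; the specific stop sequences and penalty values are illustrative, not recommendations.

```python
# Example request payload combining the optional sampling and penalty fields.
payload = {
    "model": "assisters-chat-v1",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "List three uses for a paperclip."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,        # use temperature OR top_p, not both
    "stop": ["\n\n"],          # up to 4 stop sequences
    "presence_penalty": 0.5,   # range: -2.0 to 2.0
    "frequency_penalty": 0.3,  # range: -2.0 to 2.0
    "user": "user-1234",       # stable end-user identifier
}
```

This dict can be passed as keyword arguments to the SDK's chat.completions.create, or serialized as JSON and POSTed directly to the endpoint.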

Request Examples

Basic Request

Python

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Japan?"}
    ]
)

print(response.choices[0].message.content)

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_your_api_key',
  baseURL: 'https://api.assisters.dev/v1'
});

const response = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of Japan?' }
  ]
});

console.log(response.choices[0].message.content);

cURL

curl https://api.assisters.dev/v1/chat/completions \
  -H "Authorization: Bearer ask_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "assisters-chat-v1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of Japan?"}
    ]
  }'

Streaming Request

Python

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

stream = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=[
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_your_api_key',
  baseURL: 'https://api.assisters.dev/v1'
});

const stream = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [
    { role: 'user', content: 'Write a short poem about coding' }
  ],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

Multi-turn Conversation

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
    {"role": "user", "content": "And what is that multiplied by 3?"}
]

response = client.chat.completions.create(
    model="assisters-chat-v1",
    messages=messages
)
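To continue a multi-turn conversation, append the assistant's reply to the history before adding the next user turn. A minimal helper for this might look as follows (extend_conversation is a hypothetical name, not part of the SDK):

```python
def extend_conversation(messages, assistant_reply, next_user_message):
    """Append the assistant's last reply and the next user turn.

    Returns a new list so the original history is left untouched.
    """
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]
```

After each API call, feed the returned content back in, e.g. messages = extend_conversation(messages, response.choices[0].message.content, "And what is that multiplied by 3?").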

Response

Non-Streaming Response

{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Japan is Tokyo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming Response

Each chunk in the stream:

{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "The"
      },
      "finish_reason": null
    }
  ]
}

Final chunk:

{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "assisters-chat-v1",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
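When consuming the raw SSE stream without the SDK, the full message is reassembled by concatenating the delta.content of each chunk until a non-null finish_reason arrives. A sketch operating on parsed chunk payloads shaped like the examples above:

```python
def accumulate_stream(chunks):
    """Reassemble the full message from parsed chat.completion.chunk payloads.

    Returns (content, finish_reason); finish_reason is None only if the
    stream ended without a terminating chunk.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        # The final chunk carries an empty delta and a finish_reason.
        parts.append(choice["delta"].get("content", ""))
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```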

Response Fields

id (string)

Unique identifier for the completion.

object (string)

chat.completion, or chat.completion.chunk for streaming responses.

created (integer)

Unix timestamp of when the completion was created.

model (string)

The model used for the completion.

choices (array)

Array of completion choices. Each choice contains:

  • index: The index of this choice
  • message: The generated message (non-streaming)
  • delta: The incremental content (streaming)
  • finish_reason: Why generation stopped (stop, length, content_filter)

usage (object)

Token usage statistics (not included in streaming responses):

  • prompt_tokens: Tokens in the input
  • completion_tokens: Tokens in the output
  • total_tokens: Total tokens used

Finish Reasons

Reason          Description
stop            Natural completion or a stop sequence was reached
length          The max_tokens limit was reached
content_filter  Content was filtered by moderation
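Callers should branch on finish_reason before trusting a response, since a length or content_filter stop means the text is incomplete or redacted. A minimal sketch (check_finish_reason is an illustrative helper, not part of the SDK):

```python
def check_finish_reason(choice):
    """Map a choice's finish_reason to a simple outcome flag."""
    reason = choice["finish_reason"]
    if reason == "stop":
        return "complete"   # natural end or stop sequence
    if reason == "length":
        return "truncated"  # raise max_tokens or ask the model to continue
    if reason == "content_filter":
        return "filtered"   # content was removed by moderation
    raise ValueError(f"unexpected finish_reason: {reason!r}")
```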

Error Responses

Best Practices

Use System Messages

Set context and behavior with system messages for consistent responses

Stream Long Responses

Enable streaming for better UX with longer completions

Manage Conversation Length

Trim old messages to stay within token limits
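One simple trimming strategy is to keep the system message and only the most recent turns. The helper below caps by message count for brevity; a production version would count tokens against the model's context window instead (trim_history is an illustrative name, not part of the SDK):

```python
def trim_history(messages, max_messages=20):
    """Keep any system messages plus the most recent conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```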

Handle Errors Gracefully

Implement retry logic with exponential backoff
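A retry wrapper with exponential backoff and jitter might be sketched as follows. It retries on any exception for brevity; in practice you would retry only rate-limit (429) and transient server errors (with_retries is an illustrative helper, not part of the SDK):

```python
import random
import time


def with_retries(call, max_attempts=5, base_delay=1.0):
    """Invoke call(), retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.1)
            time.sleep(delay)
```

Usage: response = with_retries(lambda: client.chat.completions.create(model="assisters-chat-v1", messages=messages)).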