Moderation

Detect harmful, inappropriate, or policy-violating content

Content Moderation

Automatically detect and filter harmful, inappropriate, or policy-violating content. Use this endpoint to protect your users and maintain community standards.

Endpoint

POST https://api.assisters.dev/v1/moderate
POST https://api.assisters.dev/v1/moderations
Both paths are equivalent. /v1/moderations is the OpenAI-compatible alias.

Request Body

model (string, default: assisters-moderation-v1)

The moderation model to use. See available models.

Example: assisters-moderation-v1

input (string | array, required)

The text to moderate. Can be a single string or an array of up to 100 strings.

Request Examples

Single Text

Python

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.moderations.create(
    model="assisters-moderation-v1",
    input="Hello, how are you today?"
)

result = response.results[0]
print(f"Flagged: {result.flagged}")
print(f"Categories: {result.categories}")

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'ask_your_api_key',
  baseURL: 'https://api.assisters.dev/v1'
});

const response = await client.moderations.create({
  model: 'assisters-moderation-v1',
  input: 'Hello, how are you today?'
});

const result = response.results[0];
console.log(`Flagged: ${result.flagged}`);
console.log(`Categories:`, result.categories);

cURL

curl https://api.assisters.dev/v1/moderate \
  -H "Authorization: Bearer ask_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "assisters-moderation-v1",
    "input": "Hello, how are you today?"
  }'

Batch Moderation

response = client.moderations.create(
    model="assisters-moderation-v1",
    input=[
        "First message to check",
        "Second message to check",
        "Third message to check"
    ]
)

for i, result in enumerate(response.results):
    print(f"Message {i}: Flagged={result.flagged}")

Pre-moderation Pattern

def moderate_before_response(user_message):
    """Check user input before processing"""
    moderation = client.moderations.create(
        model="assisters-moderation-v1",
        input=user_message
    )

    if moderation.results[0].flagged:
        return {
            "error": "Your message violates our content policy",
            "categories": moderation.results[0].categories
        }

    # Process the message normally
    response = client.chat.completions.create(
        model="assisters-chat-v1",
        messages=[{"role": "user", "content": user_message}]
    )

    return {"response": response.choices[0].message.content}

Response

{
  "id": "modr-abc123xyz",
  "model": "assisters-moderation-v1",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "harassment": false,
        "harassment/threatening": false,
        "self-harm": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.00012,
        "hate/threatening": 0.00001,
        "harassment": 0.00034,
        "harassment/threatening": 0.00002,
        "self-harm": 0.00001,
        "self-harm/intent": 0.00001,
        "self-harm/instructions": 0.00001,
        "sexual": 0.00015,
        "sexual/minors": 0.00001,
        "violence": 0.00023,
        "violence/graphic": 0.00002
      }
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  },
  "cost": {
    "total_usd": 0.0000016,
    "price_per_million_tokens": 0.2
  }
}

Response Fields

id (string)

Unique identifier for the moderation request

model (string)

The model used for moderation

results (array)

Array of moderation results, one per input:

  • flagged: Boolean indicating if content violates policy
  • categories: Object with boolean for each category
  • category_scores: Object with confidence scores (0-1) for each category

usage (object)

Token usage for billing

cost (object)

Cost breakdown: total_usd and price_per_million_tokens
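
The values in the sample response are consistent with total_usd being total_tokens multiplied by price_per_million_tokens and divided by one million. That relationship is inferred from the sample numbers only and is not a documented billing formula. The helper name estimate_cost_usd is hypothetical.

```python
import math


def estimate_cost_usd(total_tokens: int, price_per_million_tokens: float) -> float:
    """Estimate request cost, assuming simple linear per-token pricing."""
    return total_tokens * price_per_million_tokens / 1_000_000


# Values taken from the sample response: 8 tokens at 0.2 USD per million tokens.
estimated = estimate_cost_usd(8, 0.2)
print(f"Estimated cost: {estimated:.7f} USD")

# The estimate matches the sample's total_usd within floating-point tolerance.
assert math.isclose(estimated, 0.0000016)
```

Prefer the cost object returned by the API for accounting; an estimate like this is mainly useful for budgeting before a request is sent.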

Categories

Category                  Description
hate                      Content expressing hatred toward a group
hate/threatening          Hateful content with threats of violence
harassment                Content meant to harass or bully
harassment/threatening    Harassment with threats
self-harm                 Content promoting self-harm
self-harm/intent          Expression of self-harm intent
self-harm/instructions    Instructions for self-harm
sexual                    Sexually explicit content
sexual/minors             Sexual content involving minors
violence                  Content depicting violence
violence/graphic          Graphic depictions of violence
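
To report which of these categories triggered a flag, one option is to collect the names whose boolean is true. The sketch below works on a single result as a plain dictionary shaped like the JSON in the Response section. The function name flagged_categories and the sample data are illustrative, not part of the API.

```python
from typing import Any, Dict, List


def flagged_categories(result: Dict[str, Any]) -> List[str]:
    """Return the sorted names of categories marked true in one moderation result."""
    return sorted(name for name, hit in result["categories"].items() if hit)


# Made-up result for illustration; a real one comes from the "results" array.
sample_result = {
    "flagged": True,
    "categories": {
        "hate": False,
        "harassment": True,
        "harassment/threatening": True,
        "violence": False,
    },
}

print(flagged_categories(sample_result))
# ['harassment', 'harassment/threatening']
```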

Available Models

Model                      Description                                 Price
assisters-moderation-v1    Advanced safety model with 14 categories    $0.05/M tokens

Use Cases

Best Practices

Moderate Both Inputs and Outputs

Check user messages AND AI responses for safety
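
A minimal sketch of checking both directions, assuming two hypothetical callables that you supply: is_flagged(text), which wraps a moderation request and returns a boolean, and generate(text), which wraps your chat completion call. The fallback message is also only an example.

```python
from typing import Callable

FALLBACK_MESSAGE = "Sorry, I can't help with that."  # example wording only


def safe_reply(
    user_message: str,
    is_flagged: Callable[[str], bool],
    generate: Callable[[str], str],
    fallback: str = FALLBACK_MESSAGE,
) -> str:
    """Moderate the user input first, then moderate the generated reply."""
    # Reject flagged input before spending tokens on generation.
    if is_flagged(user_message):
        return fallback

    reply = generate(user_message)

    # Check the model output too, since a harmless prompt can still yield an unsafe answer.
    if is_flagged(reply):
        return fallback

    return reply
```

In practice is_flagged could return client.moderations.create(...).results[0].flagged, using the client from the request examples.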

Use Custom Thresholds

Adjust category_scores thresholds based on your use case
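
One way to apply custom thresholds is to compare each entry of category_scores against a per-category limit instead of relying only on flagged. The threshold values below are placeholders chosen for illustration, not recommendations, and the function expects category_scores as a plain dictionary like the one in the JSON response.

```python
from typing import Dict, List

# Hypothetical limits; tune them against your own data and risk tolerance.
DEFAULT_THRESHOLD = 0.5
CUSTOM_THRESHOLDS: Dict[str, float] = {
    "sexual/minors": 0.01,     # very strict
    "self-harm/intent": 0.2,   # stricter than the default
    "violence": 0.7,           # more lenient, e.g. for a news or gaming forum
}


def categories_over_threshold(
    category_scores: Dict[str, float],
    thresholds: Dict[str, float] = CUSTOM_THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the sorted category names whose score meets or exceeds its threshold."""
    return sorted(
        name
        for name, score in category_scores.items()
        if score >= thresholds.get(name, default)
    )


scores = {"violence": 0.65, "sexual/minors": 0.02, "hate": 0.1}
print(categories_over_threshold(scores))
# ['sexual/minors']
```

Categories without an explicit entry fall back to DEFAULT_THRESHOLD, so the mapping only needs to list the categories you want to treat differently.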

Batch for Efficiency

Send multiple texts in one request when possible
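
Because input accepts at most 100 strings per request, larger workloads need to be split into batches. The sketch below assumes a hypothetical moderate_batch callable that sends one request and returns one result per input, in order.

```python
from typing import Any, Callable, Iterator, List, Sequence

MAX_BATCH_SIZE = 100  # documented limit for the input array


def chunked(texts: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of texts, each with at most size items."""
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def moderate_all(
    texts: Sequence[str],
    moderate_batch: Callable[[List[str]], List[Any]],
) -> List[Any]:
    """Moderate any number of texts, one request per batch, preserving order."""
    results: List[Any] = []
    for batch in chunked(texts):
        results.extend(moderate_batch(batch))
    return results
```

With the client from the request examples, moderate_batch could be a small function that returns client.moderations.create(model="assisters-moderation-v1", input=batch).results.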

Cache Results

Cache moderation results for repeated content
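
A simple in-memory cache keyed by a hash of the text is one way to avoid re-moderating identical content. This is a sketch: ModerationCache and the moderate callable are hypothetical names, and it assumes that identical text always yields the same result for a given model.

```python
import hashlib
from typing import Any, Callable, Dict


class ModerationCache:
    """Cache moderation results by the SHA-256 hash of the input text."""

    def __init__(self, moderate: Callable[[str], Any]) -> None:
        self._moderate = moderate          # function that performs the real API call
        self._store: Dict[str, Any] = {}   # hash -> cached result

    @staticmethod
    def _key(text: str) -> str:
        # Hashing keeps keys a fixed size and avoids storing raw user text as keys.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def check(self, text: str) -> Any:
        """Return the cached result for text, calling moderate only on a miss."""
        key = self._key(text)
        if key not in self._store:
            self._store[key] = self._moderate(text)
        return self._store[key]
```

This cache grows without bound and never expires entries. A long-running service would likely add a size limit or time-to-live, and clear the cache when the moderation model or thresholds change.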

Error Responses