Moderation Models

Content safety models for detecting harmful content.

Protect your users and platform with AI-powered content moderation. Assisters Moderation detects harmful, inappropriate, or policy-violating content with high accuracy.

Assisters Moderation v1

Our advanced content moderation model is trained to detect 14 categories of harmful content with industry-leading accuracy.

| Specification | Value |
|---|---|
| Model ID | assisters-moderation-v1 |
| Categories | 14 safety categories |
| Max Tokens | 8,192 |
| Input Price | $0.05 / million tokens |
| Latency | ~100ms |

Capabilities

  • Comprehensive Detection: 14 harm categories covering all major safety concerns
  • High Accuracy: State-of-the-art precision and recall
  • Low Latency: Fast enough for real-time content filtering
  • Detailed Scores: Get probability scores for fine-grained control
  • Batch Support: Moderate multiple items in a single request (see the batch sketch under Example Usage)

Example Usage

```python
from openai import OpenAI

# Point the OpenAI SDK at the Assisters-compatible endpoint
client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key",
)

response = client.moderations.create(
    model="assisters-moderation-v1",
    input="Content to moderate",
)

# flagged is true when any category crosses the model's default threshold
if response.results[0].flagged:
    print("Content violates policy")
else:
    print("Content is safe")
```

With Category Scores

```python
response = client.moderations.create(
    model="assisters-moderation-v1",
    input="Content to analyze",
)

result = response.results[0]

# Check specific category scores
if result.category_scores.violence > 0.5:
    print("Violence detected")
if result.category_scores.hate > 0.5:
    print("Hate speech detected")
```

Safety Categories

Assisters Moderation v1 detects these content categories:

| Category | Description |
|---|---|
| hate | Content expressing hatred toward protected groups |
| hate/threatening | Hateful content with threats of violence |
| harassment | Content meant to harass, bully, or intimidate |
| harassment/threatening | Harassment with explicit threats |
| self-harm | Content promoting or glorifying self-harm |
| self-harm/intent | Expression of intent to self-harm |
| self-harm/instructions | Instructions for self-harm |
| sexual | Sexually explicit content |
| sexual/minors | Sexual content involving minors |
| violence | Content depicting violence |
| violence/graphic | Graphic depictions of violence |
| illegal | Content promoting illegal activities |
| pii | Personally identifiable information (PII) exposure |
| prompt-injection | Attempts to manipulate AI systems |
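
Rather than checking attributes one by one, you can iterate over all 14 categories. A sketch assuming the SDK returns pydantic models (as the OpenAI Python SDK does), so model_dump() yields a plain dict; Assisters-specific categories such as pii and prompt-injection may only be visible through this dict view if the stock SDK's typed model does not declare them:

```python
result = response.results[0]

# Dump the typed model to a plain dict, e.g. {"hate": 0.0001, "violence": 0.0008, ...}
scores = result.category_scores.model_dump()

# Collect every category above an illustrative 0.5 cutoff, highest first
flagged = {name: score for name, score in scores.items() if score > 0.5}
for name, score in sorted(flagged.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```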

Response Format

An example response, abridged to five of the 14 categories:

```json
{
  "id": "modr-abc123",
  "model": "assisters-moderation-v1",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual": false,
        "violence": false
      },
      "category_scores": {
        "hate": 0.0001,
        "harassment": 0.0023,
        "self-harm": 0.0001,
        "sexual": 0.0012,
        "violence": 0.0008
      }
    }
  ]
}
```
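
If you are not using the SDK, the same response can be fetched over plain HTTP. A sketch assuming the endpoint mirrors the OpenAI moderations REST shape (POST /v1/moderations with a Bearer token); verify the exact path against the API reference:

```python
import requests

resp = requests.post(
    "https://api.assisters.dev/v1/moderations",  # assumed OpenAI-compatible path
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "model": "assisters-moderation-v1",
        "input": "Content to moderate",
    },
    timeout=10,
)
resp.raise_for_status()

result = resp.json()["results"][0]
print("flagged:", result["flagged"])
```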

Best Practices

Moderate Both Directions

Check both user inputs AND AI outputs for comprehensive safety

Use Custom Thresholds

Apply your own thresholds to category_scores based on your platform's risk tolerance
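
For example, a minimal sketch of per-category cutoffs; the values below are illustrative, not recommended settings:

```python
# Illustrative cutoffs: stricter for severe harms, looser for milder ones
THRESHOLDS = {
    "sexual_minors": 0.01,
    "self_harm_intent": 0.05,
    "violence": 0.50,
    "harassment": 0.70,
}

def violates_policy(result) -> bool:
    # Assumes pydantic-style models, so model_dump() yields a plain dict
    scores = result.category_scores.model_dump()
    return any(
        scores.get(category, 0.0) >= cutoff
        for category, cutoff in THRESHOLDS.items()
    )
```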

Log for Review

Keep logs of flagged content for human review and model improvement
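
For example, a minimal sketch that appends moderation decisions to a JSON-lines file (the content_id parameter and log path are placeholders for your own storage):

```python
import json
import time

def log_decision(result, content_id: str) -> None:
    # One JSON object per line; keeping scores helps with later threshold tuning
    record = {
        "ts": time.time(),
        "content_id": content_id,
        "flagged": result.flagged,
        "scores": result.category_scores.model_dump(),
    }
    with open("moderation.log", "a") as f:
        f.write(json.dumps(record) + "\n")
```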

Graceful Degradation

Have fallback behavior when moderation service is unavailable
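
For example, a sketch of a fail-closed fallback (block, or queue for review, when the service is unreachable); whether to fail open or fail closed is a product decision this model does not dictate:

```python
from openai import APIError

def should_block(text: str) -> bool:
    """Return True if the content should be withheld."""
    try:
        response = client.moderations.create(
            model="assisters-moderation-v1",
            input=text,
        )
        return response.results[0].flagged
    except APIError:
        # Fail closed: withhold content (or queue it) when moderation is down
        return True
```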

Performance Considerations

| Scenario | Recommendation |
|---|---|
| Real-time chat | Use with streaming for immediate feedback |
| User-generated content | Moderate before publishing |
| High-volume batches | Use batch requests (up to 32 items) |
| Regulatory compliance | Log all moderation decisions |