Moderation Models
Content safety models for detecting harmful content
Protect your users and platform with AI-powered content moderation. Assisters Moderation detects harmful, inappropriate, or policy-violating content with high accuracy.
Assisters Moderation v1
Our advanced content moderation model trained to detect 14 categories of harmful content with industry-leading accuracy.
| Specification | Value |
|---|---|
| Model ID | assisters-moderation-v1 |
| Categories | 14 safety categories |
| Max Tokens | 8,192 |
| Input Price | $0.05 / million tokens |
| Latency | ~100ms |
Capabilities
- Comprehensive Detection: 14 harm categories covering all major safety concerns
- High Accuracy: State-of-the-art precision and recall
- Low Latency: Fast enough for real-time content filtering
- Detailed Scores: Get probability scores for fine-grained control
- Batch Support: Moderate multiple items in a single request
Example Usage
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key"
)

response = client.moderations.create(
    model="assisters-moderation-v1",
    input="Content to moderate"
)

if response.results[0].flagged:
    print("Content violates policy")
else:
    print("Content is safe")
```
With Category Scores
```python
response = client.moderations.create(
    model="assisters-moderation-v1",
    input="Content to analyze"
)

result = response.results[0]

# Check specific category scores
if result.category_scores.violence > 0.5:
    print("Violence detected")
if result.category_scores.hate > 0.5:
    print("Hate speech detected")
```
Safety Categories
Assisters Moderation v1 detects these content categories:
| Category | Description |
|---|---|
| hate | Content expressing hatred toward protected groups |
| hate/threatening | Hateful content with threats of violence |
| harassment | Content meant to harass, bully, or intimidate |
| harassment/threatening | Harassment with explicit threats |
| self-harm | Content promoting or glorifying self-harm |
| self-harm/intent | Expression of intent to self-harm |
| self-harm/instructions | Instructions for self-harm |
| sexual | Sexually explicit content |
| sexual/minors | Sexual content involving minors |
| violence | Content depicting violence |
| violence/graphic | Graphic depictions of violence |
| illegal | Content promoting illegal activities |
| pii | Personal identifiable information exposure |
| prompt-injection | Attempts to manipulate AI systems |
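Because a score is returned for every category, a single global threshold rarely fits all of them: zero-tolerance categories warrant far lower cutoffs than, say, depictions of violence. A minimal sketch of per-category thresholding over a score dictionary (the threshold values and helper name here are illustrative, not part of the API):

```python
# Illustrative per-category thresholds; tune these for your platform.
DEFAULT_THRESHOLDS = {
    "sexual/minors": 0.01,      # zero-tolerance categories get very low cutoffs
    "hate/threatening": 0.10,
    "violence": 0.50,
    "hate": 0.50,
}

def flagged_categories(category_scores, thresholds=DEFAULT_THRESHOLDS, default=0.5):
    """Return the sorted list of categories whose score exceeds their threshold."""
    return sorted(
        cat for cat, score in category_scores.items()
        if score > thresholds.get(cat, default)
    )

scores = {"hate": 0.02, "violence": 0.61, "sexual/minors": 0.003}
print(flagged_categories(scores))  # ['violence']
```

Only violence crosses its cutoff here; the low sexual/minors score still stays under its much stricter 0.01 threshold, which is the point of tuning per category.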
Response Format
```json
{
  "id": "modr-abc123",
  "model": "assisters-moderation-v1",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual": false,
        "violence": false
      },
      "category_scores": {
        "hate": 0.0001,
        "harassment": 0.0023,
        "self-harm": 0.0001,
        "sexual": 0.0012,
        "violence": 0.0008
      }
    }
  ]
}
```
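If you are not using the SDK, the same payload can be consumed with the standard library alone. A minimal sketch, where the payload literal mirrors (an abbreviated form of) the example above:

```python
import json

payload = """
{
  "id": "modr-abc123",
  "model": "assisters-moderation-v1",
  "results": [
    {
      "flagged": false,
      "categories": {"hate": false, "violence": false},
      "category_scores": {"hate": 0.0001, "violence": 0.0008}
    }
  ]
}
"""

data = json.loads(payload)
result = data["results"][0]

if result["flagged"]:
    # Collect the categories that triggered the flag.
    blocked = [cat for cat, hit in result["categories"].items() if hit]
    print("Blocked for:", blocked)
else:
    print("Content is safe")  # prints: Content is safe
```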
Best Practices
- Moderate Both Directions: Check both user inputs and AI outputs for comprehensive safety.
- Use Custom Thresholds: Tune your flagging thresholds against category_scores to match your platform's needs.
- Log for Review: Keep logs of flagged content for human review and model improvement.
- Graceful Degradation: Have fallback behavior for when the moderation service is unavailable.
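The logging and degradation practices can be sketched together as a small wrapper (a hypothetical helper, not part of the SDK) that records every decision and falls back to a configurable default verdict when the service errors out:

```python
import logging

logger = logging.getLogger("moderation")

def moderate_safely(check, text, fail_open=False):
    """Run a moderation callable; fall back to a default verdict on error.

    fail_open=True lets content through when the service is down
    (availability first); fail_open=False blocks it (safety first).
    Returns True when the content should be treated as flagged.
    """
    try:
        flagged = check(text)
    except Exception:
        logger.exception("moderation unavailable; fail_open=%s", fail_open)
        return not fail_open  # treat as flagged unless failing open
    logger.info("moderation decision flagged=%s", flagged)
    return flagged

# In production, `check` would call client.moderations.create(...) and
# return response.results[0].flagged. Stubs shown for illustration:
print(moderate_safely(lambda t: False, "hello"))  # False (safe)
print(moderate_safely(lambda t: 1 / 0, "hello"))  # True  (service error, fail closed)
```

Whether to fail open or closed is a product decision: a support chatbot might fail open to stay responsive, while a platform for minors should fail closed.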
Performance Considerations
| Scenario | Recommendation |
|---|---|
| Real-time chat | ~100ms latency allows inline moderation before messages are displayed |
| User-generated content | Moderate before publishing |
| High-volume batches | Use batch requests (up to 32 items) |
| Regulatory compliance | Log all moderation decisions |
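For high-volume workloads, input lists longer than the 32-item batch limit have to be split across requests. A small sketch (the limit constant mirrors the table above; the helper is illustrative):

```python
BATCH_LIMIT = 32  # maximum items per batch request, per the table above

def chunk(items, size=BATCH_LIMIT):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

texts = [f"comment {i}" for i in range(70)]
batches = chunk(texts)
print(len(batches), [len(b) for b in batches])  # 3 [32, 32, 6]

# Each batch can then be sent as a single request, e.g.:
# client.moderations.create(model="assisters-moderation-v1", input=batch)
```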