Speech-to-Text Models
Transcribe audio in 100+ languages
Transcribe audio files with high accuracy using Assisters Whisper, our advanced speech recognition model.
Assisters Whisper v1
Our state-of-the-art speech recognition model that transcribes audio in 100+ languages with exceptional accuracy.
| Specification | Value |
|---|---|
| Model ID | assisters-whisper-v1 |
| Languages | 100+ |
| Max Audio Length | 25 minutes |
| Price | $0.01 / minute |
| Latency | ~1x real-time |
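The table's per-minute price and length cap imply a simple linear cost model. A minimal sketch for budgeting (the helper name and the assumption that billing is strictly proportional to duration are illustrative, not part of the API):

```python
# Estimate transcription cost at the listed $0.01/min rate.
# Assumes cost is strictly proportional to duration; actual
# rounding/billing behavior may differ.
PRICE_PER_MINUTE = 0.01  # USD, from the pricing table
MAX_MINUTES = 25         # per-file limit, from the table

def estimate_cost(duration_seconds: float) -> float:
    """Illustrative cost estimate for one audio file."""
    minutes = duration_seconds / 60
    if minutes > MAX_MINUTES:
        raise ValueError("File exceeds the 25-minute limit; split it first.")
    return round(minutes * PRICE_PER_MINUTE, 4)

print(estimate_cost(600))  # 10 minutes -> 0.1
```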
Capabilities
- Multilingual: Transcribe 100+ languages automatically
- High Accuracy: State-of-the-art word error rate
- Speaker Diarization: Identify different speakers (coming soon)
- Timestamps: Word and segment-level timestamps
- Translation: Translate audio to English
Supported Formats
MP3, MP4, M4A, WAV, WEBM, FLAC, OGG, and more.
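If you want to validate files before uploading, a quick extension check is a simple first pass. A minimal sketch (the helper is illustrative, and the set below omits whatever "and more" covers):

```python
# Client-side sanity check before uploading. The extension list
# mirrors the formats named above; it is not exhaustive.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "m4a", "wav", "webm", "flac", "ogg"}

def is_supported(filename: str) -> bool:
    ext = filename.rsplit(".", 1)[-1].lower()
    return ext in SUPPORTED_EXTENSIONS

print(is_supported("meeting.MP3"))  # True
print(is_supported("notes.txt"))    # False
```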
Example Usage
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.assisters.dev/v1",
    api_key="your-api-key",
)

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file,
    )

print(response.text)
```
With Timestamps
```python
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

# Access segment-level timestamps
for segment in response.segments:
    print(f"[{segment.start:.2f} - {segment.end:.2f}] {segment.text}")
```
Translation to English
```python
# Translate non-English audio to English
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.translations.create(
        model="assisters-whisper-v1",
        file=audio_file,
    )

print(response.text)  # English translation
```
With Language Hint
```python
# Specify the language for better accuracy
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="assisters-whisper-v1",
        file=audio_file,
        language="es",  # Spanish
    )
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | file | required | Audio file to transcribe |
| model | string | required | Model ID (`assisters-whisper-v1`) |
| language | string | auto | ISO-639-1 language code (e.g. `es`) |
| prompt | string | null | Guide the model's style |
| response_format | string | `json` | Output format |
| temperature | float | 0 | Sampling temperature |
| timestamp_granularities | array | null | Timestamp detail level |
Response Formats
| Format | Description |
|---|---|
| `json` | Simple JSON with text |
| `text` | Plain text only |
| `srt` | SubRip subtitle format |
| `verbose_json` | Detailed JSON with timestamps |
| `vtt` | WebVTT subtitle format |
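The `verbose_json` format returns numeric `start`/`end` seconds per segment; if you ever need to build `srt`-style cues from those yourself, the seconds have to be rendered as `HH:MM:SS,mmm`. A minimal sketch (the helper name is illustrative):

```python
# Convert a segment's start/end seconds (as returned in verbose_json)
# into an SRT-style HH:MM:SS,mmm timestamp.
def srt_timestamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3.5))      # 00:00:03,500
print(srt_timestamp(3725.25))  # 01:02:05,250
```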
Supported Languages
Assisters Whisper v1 supports 100+ languages including:
Major Languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (Simplified & Traditional), Japanese, Korean, Arabic, Hindi, and more.
Regional Languages: Catalan, Welsh, Icelandic, Latvian, Lithuanian, Slovenian, and many others.
Best Practices
- Use Language Hints: Specify the language when known for better accuracy.
- Clean Audio: Higher-quality audio produces better transcriptions.
- Chunk Long Audio: Split files longer than 25 minutes into chunks.
- Use Prompts: Guide the model with context-specific terminology.
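For the chunking advice above, the 25-minute limit translates into time boundaries you can feed to a splitting tool such as ffmpeg. A sketch that computes those boundaries, assuming a small overlap to avoid cutting words at chunk edges (the overlap value is a heuristic, not an API requirement):

```python
# Compute (start, end) boundaries in seconds for audio longer than
# the 25-minute limit. The 5-second overlap is a heuristic to avoid
# splitting mid-word; adjust to taste.
def chunk_bounds(total_seconds: float, chunk_seconds: int = 25 * 60,
                 overlap_seconds: int = 5):
    bounds, start = [], 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds
    return bounds

print(chunk_bounds(3200))  # a ~53-minute file -> 3 chunks
```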