Features
What ElevenLabs Speech-to-Text API offers
Use cases
Built for
Meeting notes from recorded calls, with speaker labels showing who spoke when if diarization is enabled
Podcast transcripts with word-level timing for scroll-synced highlighting
Compliance review, where entity_detection flags sensitive spans such as PHI
Legal and medical workflows, where entity_detection should be enabled and configured with care
Caption exports for short-form video built from word timestamps
Research corpora built from field recordings, using domain-specific keyterms
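Turning word timestamps into captions mostly comes down to grouping consecutive words into short cues. The sketch below is a minimal illustration, assuming each word arrives with text, start, and end fields in seconds; that shape, the `Word` class, and the `words_to_srt` helper are hypothetical and not taken from the Unifically or ElevenLabs documentation. It writes SubRip (SRT) cues, one common caption format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: text plus start/end times in seconds.
    text: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7, max_duration: float = 3.0) -> str:
    """Group consecutive words into numbered SRT cues."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        # Start a new cue when the current one is full or would run too long.
        if current and (len(current) >= max_words or word.end - current[0].start > max_duration):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


# Example usage with three hypothetical timed words.
words = [Word("Welcome", 0.0, 0.4), Word("back", 0.45, 0.7), Word("everyone", 0.75, 1.3)]
print(words_to_srt(words))
```

The seven-word and three-second limits are arbitrary defaults; short-form platforms often favor even shorter cues, so they are worth tuning per target.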
FAQ
About ElevenLabs Speech-to-Text API
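You upload audio through Unifically, and the job returns a transcript. Depending on the options you enable, the result can also include speaker diarization, timestamps, audio event tags, and entity spans, and recognition can be biased toward boosted vocabulary.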
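With diarization on, a common post-processing step is collapsing per-word speaker labels into readable turns. This is a minimal sketch that assumes the result contains a list of word entries with text, type, and speaker_id keys, modeled loosely on ElevenLabs' word-level output; the exact field names returned through Unifically are an assumption, and `speaker_turns` is a hypothetical helper.

```python
from itertools import groupby


def speaker_turns(words: list[dict]) -> list[tuple[str, str]]:
    """Collapse diarized word entries into (speaker, text) turns.

    Assumes entries look like {"text": ..., "type": "word", "speaker_id": ...};
    these field names are an assumption, not a confirmed Unifically schema.
    """
    # Keep only spoken words; drop spacing or audio-event entries if present.
    spoken = [w for w in words if w.get("type", "word") == "word"]
    turns = []
    # groupby merges consecutive entries that share the same speaker_id.
    for speaker, group in groupby(spoken, key=lambda w: w.get("speaker_id", "unknown")):
        turns.append((speaker, " ".join(w["text"] for w in group)))
    return turns


# Example usage with a hypothetical diarized result.
sample = [
    {"text": "Hi", "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there", "type": "word", "speaker_id": "speaker_0"},
    {"text": "Hello", "type": "word", "speaker_id": "speaker_1"},
]
for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
# speaker_0: Hi there
# speaker_1: Hello
```

Because groupby only merges adjacent entries, a speaker who talks again later starts a new turn, which is usually what meeting notes want.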
When tag_audio_events is enabled, non-speech sounds may appear inline as cues such as (laughter) or (music), depending on the provider's behavior.
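If downstream tools need plain speech, the inline cues can be split out of the transcript text. A minimal sketch, assuming cues always appear as single-level parentheses like (laughter); `split_audio_events` is a hypothetical helper, not part of any documented API.

```python
import re

# Matches a parenthesized cue such as "(laughter)" or "(music)".
CUE_PATTERN = re.compile(r"\(([^()]+)\)")


def split_audio_events(text: str) -> tuple[str, list[str]]:
    """Return the transcript with cues removed, plus the cue labels in order."""
    events = CUE_PATTERN.findall(text)
    # Drop the cues, then collapse the extra whitespace they leave behind.
    cleaned = re.sub(r"\s+", " ", CUE_PATTERN.sub("", text)).strip()
    return cleaned, events


# Example usage on a hypothetical tagged transcript.
cleaned, events = split_audio_events("That was great (laughter) so let's begin (music)")
print(cleaned)  # That was great so let's begin
print(events)   # ['laughter', 'music']
```

Because the pattern matches any parenthesized text, it would also remove parentheses that belong to the spoken content, so it is only safe when the transcript uses parentheses exclusively for event tags.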
The UI accepts comma-separated terms. The request payload sends up to 100 trimmed strings, which bias recognition toward product names and technical vocabulary.
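The UI-to-payload step can be approximated as follows. This sketch assumes that empty entries are dropped and that terms beyond the hundredth are cut off rather than rejected; the source only states that up to one hundred trimmed strings are sent. `parse_keyterms` and the "keyterms" payload key are hypothetical names.

```python
MAX_KEYTERMS = 100  # Upper bound stated for the request payload.


def parse_keyterms(raw: str, limit: int = MAX_KEYTERMS) -> list[str]:
    """Turn a comma-separated UI string into a trimmed list of terms."""
    # Split on commas and trim surrounding whitespace from each piece.
    terms = [part.strip() for part in raw.split(",")]
    # Assumption: empty entries such as those from ",," are skipped.
    terms = [term for term in terms if term]
    # Assumption: keep only the first `limit` terms.
    return terms[:limit]


# Example usage; the "keyterms" key is a hypothetical field name.
payload = {"keyterms": parse_keyterms(" Unifically, Scribe ,, diarization ")}
print(payload)  # {'keyterms': ['Unifically', 'Scribe', 'diarization']}
```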
Unifically lists ElevenLabs Speech-to-Text at $0.001056 per second of audio processed, which works out to about $0.063 per minute or $3.80 per hour.
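Because billing is per second, estimating a job's cost is a single multiplication. The sketch uses `Decimal` to avoid floating-point error in currency math; `transcription_cost` is a hypothetical helper, and it does not model any rounding, minimum charge, or other billing rules, which the listing does not state.

```python
from decimal import Decimal

# Per-second rate listed by Unifically for ElevenLabs Speech-to-Text (USD).
RATE_PER_SECOND = Decimal("0.001056")


def transcription_cost(seconds: int) -> Decimal:
    """Estimate the charge in USD for a clip of the given length in seconds."""
    # Multiply exact decimals so the result carries no float error.
    return Decimal(seconds) * RATE_PER_SECOND


# Example usage: one minute and one hour of audio.
print(transcription_cost(60))    # 0.063360
print(transcription_cost(3600))  # 3.801600
```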