Audio quality is the single biggest factor affecting transcription accuracy and dubbing quality. Here’s how to get the best results.

Getting accurate transcriptions

Do:
  • Use clean audio with minimal background noise
  • Record with a good microphone (lapel or directional)
  • Speak clearly at a moderate pace
  • Select the correct source language (or use auto-detect for common languages)
Avoid:
  • Heavy background music during speech
  • Multiple speakers talking simultaneously
  • Echo-heavy rooms or outdoor environments with wind
  • Very fast speech or heavy accents, unless you plan to post-edit the transcript
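
If you want a rough check on a recording before uploading it, the guidance above can be approximated with a short script. This is a minimal sketch, not part of Neolli: the function names (`load_wav_mono16`, `frame_levels_db`, `audio_report`) are hypothetical, it assumes a 16-bit PCM WAV file, and it uses a simple percentile heuristic (quiet frames stand in for the noise floor, loud frames for speech) rather than a real speech detector.

```python
import array
import math
import sys
import wave


def load_wav_mono16(path):
    """Read a 16-bit PCM WAV file and return (first-channel samples, sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Samples are interleaved by channel; keep only the first channel.
    return list(samples[::channels]), rate


def frame_levels_db(samples, sample_rate, frame_ms=50, full_scale=32768):
    """Return the RMS level (dB relative to full scale) of each fixed-length frame."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Floor at -100 dB so digital silence does not hit log10(0).
        levels.append(20 * math.log10(max(rms / full_scale, 1e-5)))
    return levels


def audio_report(samples, sample_rate):
    """Summarize noise floor, speech level, and clipping for a mono recording."""
    levels = sorted(frame_levels_db(samples, sample_rate))
    if not levels:
        raise ValueError("audio is shorter than one frame")
    # Assumption: the quietest 10% of frames are background, the loudest 10% are speech.
    noise_db = levels[int(0.10 * (len(levels) - 1))]
    speech_db = levels[int(0.90 * (len(levels) - 1))]
    clipped = sum(1 for s in samples if abs(s) >= 32767) / len(samples)
    return {
        "noise_floor_db": noise_db,
        "speech_db": speech_db,
        "dynamic_range_db": speech_db - noise_db,
        "clipped_ratio": clipped,
    }
```

Calling `audio_report(*load_wav_mono16("take1.wav"))` on a recording (the file name is only an example) gives a quick comparison between takes. A small gap between speech level and noise floor suggests background noise or music under the speech, and a non-zero clipped ratio suggests the microphone gain was too high. Any cut-off values you apply to these numbers are your own rule of thumb, not a Neolli requirement.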

Speaker diarization

Neolli automatically identifies different speakers in your video. For best results:
  • Speakers should have distinct voices
  • Minimize crosstalk (speakers talking over each other)
  • Longer speaking turns produce more reliable speaker identification
If speaker labels are wrong, you can reassign them in the caption editor using the T shortcut.
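
To decide where to look first when reviewing speaker labels, you can flag the turns most likely to be mislabeled: very short turns and turns that overlap the previous speaker. The sketch below is illustrative only. The `Segment` structure is a hypothetical stand-in for whatever caption data you have on hand, not Neolli's export format, and the 2-second threshold is an arbitrary starting point.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def merge_turns(segments):
    """Merge consecutive segments from the same speaker into single turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            prev = turns[-1]
            turns[-1] = Segment(prev.speaker, prev.start, max(prev.end, seg.end))
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end))
    return turns


def turns_to_review(segments, min_turn_s=2.0):
    """Return turns that are short or that start before the previous turn ends."""
    turns = merge_turns(segments)
    flagged = []
    for i, turn in enumerate(turns):
        is_short = (turn.end - turn.start) < min_turn_s
        is_crosstalk = i > 0 and turn.start < turns[i - 1].end
        if is_short or is_crosstalk:
            flagged.append(turn)
    return flagged


# Example: one half-second interjection and one overlapping turn are flagged.
captions = [
    Segment("A", 0.0, 5.0),
    Segment("A", 5.0, 8.0),
    Segment("B", 8.0, 8.5),
    Segment("A", 8.5, 12.0),
    Segment("B", 11.5, 16.0),
]
for turn in turns_to_review(captions):
    print(f"check speaker {turn.speaker} at {turn.start:.1f}s")
```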

Videos that contain mostly music and little or no speech will fail dubbing with a music_detected error. This is expected: voice dubbing needs spoken content to clone and re-synthesize. Credits are refunded automatically.