Audio quality is the single biggest factor affecting transcription accuracy and dubbing quality. Here’s how to get the best results.
Getting accurate transcriptions
Do:
- Use clean audio with minimal background noise
- Record with a good microphone (lapel or directional)
- Speak clearly at a moderate pace
- Select the correct source language (or use auto-detect for common languages)
Avoid:
- Heavy background music during speech
- Multiple speakers talking simultaneously
- Echo-heavy rooms or outdoor environments with wind
- Very fast speech or heavy accents, unless you plan to post-edit the transcript
Speaker diarization
Neolli automatically identifies different speakers in your video. For best results:
- Speakers should have distinct voices
- Minimize crosstalk (speakers talking over each other)
- Longer speaking turns produce more reliable speaker identification
If speaker labels are wrong, you can reassign them in the caption editor using the T shortcut.
Getting natural-sounding dubs
Voice profile requirements:
- At least 30 seconds of clear speech from the target speaker
- Minimal background music or effects during speech
- A single-speaker video, which produces significantly better results than a multi-speaker one
What degrades voice cloning:
- Heavy background music blended with speech
- Echo or reverb in the recording
- Very short videos (under 30 seconds) — insufficient reference audio
- Multiple speakers in the same video — produces a blended voice
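If you want to check the 30-second guideline before uploading, a rough pre-flight check might look like the sketch below. It assumes your reference audio is a PCM WAV file and uses only Python's standard `wave` module. The function names and the `MIN_REFERENCE_SECONDS` constant are illustrative and not part of Neolli. The check measures total clip length only, so it cannot tell whether those seconds contain clear speech.

```python
import wave

# Threshold taken from the guidance above; the constant name is illustrative.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the total duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def has_enough_reference_audio(duration_seconds: float,
                               minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True when the clip is at least as long as the minimum reference length."""
    return duration_seconds >= minimum
```

A clip that passes this check can still clone poorly if most of it is music, silence, or overlapping speakers, so treat a pass as necessary rather than sufficient.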
Improving results:
- Edit translated captions to be shorter or simpler — shorter phrases synthesize more naturally
- The dubbing system calibrates audio duration automatically to match segment timing
- If a specific segment sounds unnatural, edit that caption text and regenerate
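One rough way to find captions worth shortening is to look for segments that pack a lot of text into little time. The sketch below is a heuristic, not how the dubbing system works internally. The `Segment` type, the function names, and the `MAX_CHARS_PER_SECOND` value are all assumptions made for illustration, and the threshold will need tuning for each target language.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str     # translated caption text


# Hypothetical threshold, chosen only for illustration; tune per language.
MAX_CHARS_PER_SECOND = 17.0


def chars_per_second(segment: Segment) -> float:
    """Text density of a segment: characters divided by duration."""
    duration = segment.end - segment.start
    if duration <= 0:
        raise ValueError("segment must have a positive duration")
    return len(segment.text.strip()) / duration


def segments_to_shorten(segments, limit=MAX_CHARS_PER_SECOND):
    """Return the segments whose text density exceeds the limit."""
    return [s for s in segments if chars_per_second(s) > limit]
```

Segments flagged this way are candidates for a shorter or simpler translation before you regenerate the dub.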
Fixing caption sync
If captions appear early or late relative to the audio:
- Open the caption editor
- Select the affected segment
- Use the snap shortcuts to set the segment's start time or end time to the playhead
- Use Alt + ← / → to slide the entire segment timing
For a systematic offset (all captions off by the same amount), select multiple segments and adjust them together.
Hold timing shortcuts to accelerate: the editor ramps up to 10× speed the longer you hold the key.
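When every caption is off by the same amount, the fix amounts to adding one constant to each segment's start and end time. The sketch below shows only that arithmetic. The `Caption` type and the `shift_captions` function are hypothetical and do not reflect the editor's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def shift_captions(captions, offset_seconds):
    """Return new captions with start and end moved by offset_seconds.

    A positive offset makes captions appear later, a negative one earlier.
    Times are clamped at zero so nothing lands before the start of the video.
    """
    return [
        replace(
            caption,
            start=max(0.0, caption.start + offset_seconds),
            end=max(0.0, caption.end + offset_seconds),
        )
        for caption in captions
    ]
```

For example, if captions consistently appear half a second early, `shift_captions(captions, 0.5)` moves them all half a second later while keeping each segment's duration unchanged.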
Videos with primarily music content (no speech) will fail dubbing with a music_detected error. This is expected — voice dubbing requires spoken content to clone and re-synthesize. Credits are automatically refunded.