Audio quality is the single biggest factor affecting transcription accuracy and dubbing quality. Here’s how to get the best results.
Getting accurate transcriptions
Do:
- Use clean audio with minimal background noise
- Record with a good microphone (lapel or directional)
- Speak clearly at a moderate pace
- Select the correct source language (or use auto-detect for common languages)
Avoid:
- Heavy background music during speech
- Multiple speakers talking simultaneously
- Echo-heavy rooms or outdoor environments with wind
- Very fast speech or heavy accents, unless you plan to post-edit the transcript
Speaker diarization
Neolli automatically identifies different speakers in your video. For best results:
- Speakers should have distinct voices
- Minimize crosstalk (speakers talking over each other)
- Longer speaking turns produce more reliable speaker identification
If speaker labels are wrong, you can reassign them in the caption editor using the T shortcut.
Getting natural-sounding dubs
Voice profile requirements:
- At least 30 seconds of clear speech from the target speaker
- Minimal background music or effects during speech
- A single-speaker video produces significantly better results than a multi-speaker one
What degrades voice cloning:
- Heavy background music blended with speech
- Echo or reverb in the recording
- Very short videos (under 30 seconds) — insufficient reference audio
- Multiple speakers in the same video — produces a blended voice
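The 30-second and single-speaker guidance above can be checked before uploading. The following is a minimal sketch, assuming you already have speaker-labeled speech segments from some transcript; the `(speaker, start, end)` tuple layout and both function names are hypothetical and are not a Neolli data format or API.

```python
from collections import defaultdict

# Threshold taken from the voice profile guidance above.
MIN_REFERENCE_SECONDS = 30.0


def speech_seconds_per_speaker(segments):
    """Sum speaking time per speaker label.

    `segments` is a hypothetical list of (speaker, start_sec, end_sec) tuples.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[speaker] += max(0.0, end - start)
    return dict(totals)


def cloning_warnings(segments):
    """Return human-readable warnings for likely voice cloning problems."""
    totals = speech_seconds_per_speaker(segments)
    warnings = []
    if len(totals) > 1:
        warnings.append("multiple speakers: expect a blended voice")
    for speaker, seconds in sorted(totals.items()):
        if seconds < MIN_REFERENCE_SECONDS:
            warnings.append(
                f"{speaker}: only {seconds:.1f}s of speech, need at least 30s"
            )
    return warnings
```

An empty list from `cloning_warnings` only means these two checks passed; background music and reverb still need to be judged by ear.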
Improving results:
- Edit translated captions to be shorter or simpler — shorter phrases synthesize more naturally
- The dubbing system calibrates audio duration automatically to match segment timing
- If a specific segment sounds unnatural, edit that caption text and regenerate
Fixing caption sync
If captions appear early or late relative to the audio:
- Open the caption editor
- Select the affected segment
- Snap the start time or the end time to the playhead using the corresponding shortcut
- Use Alt + ← / → to slide the entire segment timing
For a systemic offset (all captions off by the same amount), select multiple segments and adjust them together.
Hold timing shortcuts to accelerate: the editor ramps up to 10× speed the longer you hold the key.
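Conceptually, correcting a systemic offset means adding one constant to the start and end time of every segment. The sketch below illustrates that arithmetic on captions held outside the editor; the `Segment` class and `shift_segments` function are hypothetical names for illustration, not part of Neolli.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


def shift_segments(segments, offset_seconds):
    """Return new segments with start and end moved by the same offset.

    A positive offset delays captions; a negative offset makes them
    appear earlier. Times are clamped at zero so that no segment
    starts before the video does.
    """
    shifted = []
    for seg in segments:
        shifted.append(
            replace(
                seg,
                start=max(0.0, seg.start + offset_seconds),
                end=max(0.0, seg.end + offset_seconds),
            )
        )
    return shifted
```

For example, if every caption appears half a second late, `shift_segments(captions, -0.5)` moves all of them earlier by the same amount and leaves the original list unchanged.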
Videos with primarily music content (no speech) will fail dubbing with a music_detected error. This is expected — voice dubbing requires spoken content to clone and re-synthesize. Credits are automatically refunded.