AI Dubbing - Neolli

Overview

Dubbing generates a new audio track for your video in the target language, using a voice that sounds like the original speaker. This goes beyond subtitles — viewers hear the content in their language with the creator’s own voice characteristics.

Supported dubbing languages

Voice cloning is supported for 10 languages:

Flag	Language	Code
🇺🇸	English	`eng`
🇨🇳	Chinese	`zho`
🇰🇷	Korean	`kor`
🇮🇹	Italian	`ita`
🇪🇸	Spanish	`spa`
🇧🇷	Portuguese	`por`
🇩🇪	German	`deu`
🇫🇷	French	`fra`
🇯🇵	Japanese	`jpn`
🇷🇺	Russian	`rus`

For the full capabilities matrix across transcription, translation, and dubbing, see Supported Languages.

How voice cloning works

The voice cloning process works in three steps:

Profile — Creates a voice profile from the original speaker’s audio
Synthesize — Generates speech in the target language using the cloned voice
Calibrate — Adjusts audio duration to match the original segment timing

The calibrated audio for each segment is then merged into one complete audio track per language.
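The Calibrate step can be pictured as a time-stretch calculation. The sketch below is an illustration only: how Neolli actually calibrates audio is not described here, and the `Segment` type, the `tempo_factor` function, and the clamp limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment record; not a Neolli data structure."""
    start: float            # original segment start, in seconds
    end: float              # original segment end, in seconds
    dubbed_duration: float  # length of the synthesized speech, in seconds


def tempo_factor(seg: Segment, min_factor: float = 0.8, max_factor: float = 1.25) -> float:
    """Return the playback-speed factor that fits the dubbed audio into the
    original time slot. Values above 1.0 speed the audio up, values below 1.0
    slow it down. The clamp limits are assumed values chosen for illustration."""
    target = seg.end - seg.start
    if target <= 0 or seg.dubbed_duration <= 0:
        raise ValueError("durations must be positive")
    factor = seg.dubbed_duration / target
    # Clamp so the speech is never stretched or compressed beyond the limits.
    return max(min_factor, min(max_factor, factor))


seg = Segment(start=12.0, end=16.0, dubbed_duration=4.4)
print(round(tempo_factor(seg), 3))  # 1.1
```

A factor that hits a clamp limit suggests the translated text is too long or too short for its slot, which is one reason shortening a caption can improve the result.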

Starting a dubbing job

From Add Languages, select your target languages
Enable the Dubbing toggle alongside (or instead of) captions
Click Start

Dubbing jobs run after translation completes — the translated text is what gets synthesized into speech.

Reviewing dubbed audio

Once complete, open the video workspace and select the dubbed language track. The video player plays the dubbed audio synced with the video so you can review before publishing.

Downloading dubbed audio

Click the download icon on any completed language card to access:

Merged Audio (MP3) — Dubbed speech mixed with the instrumental background track
Dubbed Audio (MP3) — Dubbed speech only, no background audio
Instrumental (WAV) — Background audio only (extracted from original)
Audio Segments (ZIP) — Individual WAV files for each segment

See Exporting for details on all download formats.
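After downloading the Audio Segments ZIP, you may want to check how long each segment is. The following is a minimal sketch using only the Python standard library. It assumes the WAV files are uncompressed PCM (the only kind the `wave` module reads), and the archive name in the usage note is a placeholder, since the actual file naming is not documented here.

```python
import io
import wave
import zipfile


def segment_durations(zip_path: str) -> dict[str, float]:
    """Map each WAV file name in the archive to its duration in seconds."""
    durations: dict[str, float] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".wav"):
                continue  # ignore anything that is not a WAV segment
            # Read the member into memory so the wave module gets a seekable file.
            with wave.open(io.BytesIO(archive.read(name)), "rb") as wav:
                durations[name] = wav.getnframes() / wav.getframerate()
    return durations
```

For example, `segment_durations("segments.zip")` returns a dictionary such as `{"<segment file>.wav": <seconds>, ...}` that you can compare against the original segment timings.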

Tips for best results

Single clear speaker — Videos with one speaker produce the best cloning results
Minimal background noise — Heavy music or ambient noise degrades voice clone quality
Sufficient reference audio — Videos under 30 seconds may not provide enough audio for accurate cloning
Edit dubbed captions — If synthesis sounds unnatural, edit the translated caption text to make it shorter or simpler. The dubbed audio then regenerates.

Music content detection

Neolli automatically analyzes audio to detect whether a video contains primarily music rather than speech. This check runs during the first dubbing attempt for each video.

What happens when music is detected:

The dubbing job fails immediately with a music_detected error (credits are automatically refunded)
The video is permanently flagged as music content
All future dubbing requests for that video are blocked — both from the localization modal and per-language “Generate Dubbing” buttons
Transcription, translation, and caption generation remain available

Once a video is flagged as music content, dubbing cannot be enabled for it. If you believe the detection was incorrect, contact support.
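The rules above amount to a small state machine. The sketch below models them for illustration only: the `Video` class, its methods, and both exception types are hypothetical and do not correspond to a Neolli API. Only the `music_detected` error name comes from this page.

```python
class MusicDetectedError(Exception):
    """Hypothetical stand-in for a job failing with music_detected."""


class DubbingBlockedError(Exception):
    """Hypothetical error for requests made after the video is flagged."""


class Video:
    """Hypothetical model of a video's dubbing eligibility."""

    def __init__(self, is_mostly_music: bool):
        self._is_mostly_music = is_mostly_music  # what the audio analysis would find
        self.analyzed = False
        self.flagged_as_music = False

    def request_dubbing(self, language: str) -> str:
        if self.flagged_as_music:
            # The flag is permanent, so later requests never start a job.
            raise DubbingBlockedError("video is flagged as music content")
        if not self.analyzed:
            # Detection runs only during the first dubbing attempt.
            self.analyzed = True
            if self._is_mostly_music:
                self.flagged_as_music = True
                raise MusicDetectedError("music_detected (credits refunded)")
        return f"dubbing job started for {language}"

    def request_captions(self, language: str) -> str:
        # Transcription, translation, and captions ignore the music flag.
        return f"caption job started for {language}"
```

In this model, the first `request_dubbing` call on a music video raises `MusicDetectedError`, every later call raises `DubbingBlockedError`, and `request_captions` keeps working throughout.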

Credit cost

Dubbing is charged at 250 credits per 1,000 characters of source text, per language. See Credit Costs for a detailed breakdown.
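As a rough worked example of the rate above, the sketch below estimates the cost of a job. The function name is hypothetical, and because this page does not say how partial blocks of 1,000 characters are rounded, the estimate is left unrounded. Treat it as an approximation and rely on Credit Costs for exact figures.

```python
CREDITS_PER_1000_CHARS = 250  # rate stated on this page


def estimate_dubbing_credits(source_chars: int, num_languages: int) -> float:
    """Estimate dubbing credits without rounding (rounding rules are unknown)."""
    if source_chars < 0 or num_languages < 0:
        raise ValueError("inputs must be non-negative")
    return source_chars * CREDITS_PER_1000_CHARS * num_languages / 1000


# A transcript of 4,200 source characters dubbed into 3 languages:
print(estimate_dubbing_credits(4200, 3))  # 3150.0
```

Because the charge is per language, adding a fourth language to the same video would raise this estimate by a further 1,050 credits.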
