How to Add AI Dubbing on Kapwing

If you have a global audience, it might be hard to make a new video over and over in different languages. The workaround? Dubbing!

Dubbing means adding audio in a different language to a video that has already been shot. A good example is watching a foreign film that was originally spoken in Mandarin with an English audio track instead.

Table of Contents

Classic Dubbing Step by Step
AI Translation First Dubbing Step by Step
Can I dub using a translated SRT file?
How to Dub on Kapwing with a Custom Voice?
How do I fix my translated voice over if it sounds too slow or too fast?
How can I save brand words and translations so that they are always correct?
How do I correct pronunciation mistakes in the dub?
What dubbing features does Kapwing have?
What languages does Kapwing Dubbing support?
FAQs (cost of dubbing, voices we offer, etc)


Classic Dubbing Step by Step

You can dub videos yourself on Kapwing. Follow these instructions to dub your video automatically:

  1. Upload your video(s) or audio files to the Kapwing Studio
  2. Click "Translate" on the left side of the studio
  3. Click "Dub Video" to start the process of transcription, translation, and synthetic voice dubbing.
  4. Select the assets you want to have dubbed. If you uploaded multiple videos, they will all be "pre-selected" if there is audio detected.
  5. Select your preference(s) for the dubbing, including the original language of the video, the target language you want to dub into, and the voice. By default, Kapwing clones the original voice of the speaker, but you can choose a stock voice for the speaker instead. Under advanced options, you can select if the output must exactly match the duration of the original video or if there is some flexibility to adjust the timing of the output for better syncing.
  6. [Optional] Review transcript before dubbing. This option will let you proofread the transcription and translation side by side before processing the text to speech file. Take this step for the highest quality output.
  7. Click "Dub Video." Kapwing will take a few minutes to dub your video, depending on the video length.
  8. Make any other edits you want before exporting the project. If you make updates to the transcription, translation, speaker assignment, voice options, or other settings, you can click "Re-dub" to regenerate the text to speech layers.
  9. [Optional] Lip Sync: Kapwing's AI-powered lip sync feature will change the lips of the speaker to match the new synthetic text. Find this feature under the "Smart Tools" menu after you generate the dub.

AI Translation First Dubbing Step by Step

Many creators want to produce two versions of a translated video for different markets: one with closed captions and one with dubbed audio. To support this workflow, start with a captioning step to translate the subtitles of the video first:

Upload: Start by uploading the video and, in the Subtitles tab, auto-generating subtitles in the original language with no translation.

  1. Transcribe: Review the captions to ensure that the transcription is correct. Make any edits to the text and timings of the subtitles if they're off.
  2. Translate: Use the "Smart Tools" action menu to find "Translate Subtitles." Translate the subtitles into the target language. Review the translation to ensure that the new dialogue makes sense. Kapwing is fully collaborative, so you can send the URL to a global partner for quality control if needed.
  3. Dub: Click "Smart Tools" > "Dub Audio" to create the synthetic voiceover in the translated language. You'll choose the voice and speaker settings before processing the dub.
  4. [Optional] Translate Text: You can also translate the text within the contents of the video or image. This is helpful if your video has text embedded in it, like in a presentation, conference recording, lecture, training film, or Zoom meeting. Select the video and click "Translate Text" in the right side-bar. Kapwing will identify all text layers in the video, translate them, and add matching text overlays with the translation. You can review all added layers and move, edit, or delete them.
  5. Review: Kapwing is a fully-featured video editor, so users can change the speed of the audio, make cuts, add subtitles, and more. If you make changes to the dubbed voiceover, you can re-generate the synthetic voice to match the new script.
  6. Export: Click "Export Project" to download and share the processed MP4 video.

Can I dub using a translated SRT file?

Yes! You can upload an SRT file to Kapwing to use as the basis for the dubbed voice. When you've uploaded the video and clicked "Dub Audio," use the "Upload SRT/VTT" button in the target language to import an existing captions file. Kapwing will use this file as the basis for the dubbed video.

You can also download the translated captions as an SRT file. If you want to upload a new version of this SRT file before re-dubbing, you can re-generate a new dub based on the updated file.
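
Before uploading a translated SRT file, it can help to sanity-check it locally. The snippet below is a minimal sketch, not part of Kapwing: it assumes the common SRT layout (a counter line, a timing line such as "00:00:01,000 --> 00:00:03,500", then caption text), and the function names `parse_srt` and `find_problems` are hypothetical helpers introduced only for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption_text) cues."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        # Assumption: line 0 is the counter and line 1 is the timing line.
        match = TIMING.match(lines[1].strip())
        if not match:
            raise ValueError(f"Bad timing line: {lines[1]!r}")
        fields = match.groups()
        start = to_seconds(*fields[:4])
        end = to_seconds(*fields[4:])
        cues.append((start, end, " ".join(lines[2:])))
    return cues


def find_problems(cues):
    """Flag cues that are empty, end before they start, or overlap the previous cue."""
    problems = []
    for index, (start, end, text) in enumerate(cues):
        number = index + 1
        if end <= start:
            problems.append(f"cue {number}: end is not after start")
        if not text.strip():
            problems.append(f"cue {number}: empty text")
        if index > 0 and start < cues[index - 1][1]:
            problems.append(f"cue {number}: overlaps previous cue")
    return problems


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hola equipo.

2
00:00:03,000 --> 00:00:05,000
Bienvenidos al video.
"""

if __name__ == "__main__":
    for problem in find_problems(parse_srt(SAMPLE)):
        print(problem)
```

In the sample, the second cue starts before the first one ends, so the check reports an overlap. Fixing timing issues like this in the file before uploading may reduce the amount of re-dubbing needed afterwards.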

How to Dub on Kapwing with a Custom Voice?

Kapwing lets you save a clone of your voice, or upload a voice of your choosing, so you can create a text to speech layer with your own voice model. We've enabled Voice Cloning in partnership with ElevenLabs.

To add a voice clone, you must be a Business customer. Business plan customers can save up to 2 voice clones in their Brand Kit. Once you've upgraded to the Business Plan, click the "Add new Voice" button in the Text to Speech dropdown menu. You'll be prompted to upload an example of the speaker whose voice you want to clone*.

When dubbing, you can select the voice you have uploaded in Brand Kit under the "Voice" dropdown.

To delete a voice clone, go to your Brand Kit and scroll down to the saved voice clones. Hover over a voice model icon and click the delete icon that appears in the upper corner.

*Customers must have the rights to clone a speaker's voice, as noted in Kapwing's terms of service.

How do I fix my translated voice over if it sounds too slow or too fast?

Different languages have different cadences and lengths. “Hi team!” in English might become an 8-syllable phrase in Japanese. We solve this by refining translations to better match the original transcription. Sometimes, the translated text doesn’t match the original timing—it may be too long or too short. In those cases, we adjust the speed of the translated voiceover to make it fit.

In order to get the most natural sounding voiceover, we recommend allowing adjustments to video speed, which is a setting you can turn on within Advanced settings. This setting will enable adjustments to the audio speed and video speed so that the translated voiceover has a more natural pace. If you need your dubbed video's duration to match that of the original, we do not recommend turning this on.

If you listen to your dubbed video and a certain section sounds too slow or too fast, we recommend adjusting the translation to be shorter if the layer sounds too fast or longer if the section sounds too slow. You can adjust the translation within the Translate Tab. Once you have edited your translation, make sure to update the dubbed audio.

You can also adjust the speed of a dubbed section by splitting the voice over layer around that section and adjusting the speed in the right sidebar.
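
To see why a longer translation ends up sounding rushed, the arithmetic can be sketched as follows. This is an illustration only, not Kapwing's actual timing algorithm: the function names and the 15% tolerance are assumptions chosen for the example.

```python
def speed_factor(dubbed_seconds, slot_seconds):
    """Playback rate needed to fit the dubbed audio into the original time slot.

    A value above 1.0 means the dub must be sped up; below 1.0, slowed down.
    """
    if dubbed_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    return dubbed_seconds / slot_seconds


def describe(factor, tolerance=0.15):
    """Suggest an edit based on the factor. The tolerance is an assumed threshold."""
    if factor > 1 + tolerance:
        return "too fast: shorten the translation"
    if factor < 1 - tolerance:
        return "too slow: lengthen the translation"
    return "ok"


if __name__ == "__main__":
    # A translated line that takes 5.2 s to speak, in a slot that lasted 4.0 s.
    factor = speed_factor(5.2, 4.0)
    print(round(factor, 2), describe(factor))
```

Here a 5.2-second translated line has to fit a 4-second slot, so it would need to play about 1.3 times faster. Shortening the translation brings the factor back toward 1.0, which is why editing the text is usually the first thing to try.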

How can I save brand words and translations so that they are always correct in my dub?

Brand Glossary allows users to enhance their subtitling, translation, and dubbing process. This tool allows you to customize how certain words or phrases are translated in your subtitles to ensure accuracy and consistency in your multilingual content. The Brand Glossary is made up of "Custom Spelling", "Translation Rules", and "Pronunciation".

You can access the Brand Glossary through the Brand Kit, Dubbing, Translating, and/or Subtitle settings within Kapwing's Studio.

For "Custom Spelling" you can input a commonly-misspelled word and automatically replace it with the desired spelling so every video you subtitle will automatically use your spelling. These pairings are saved and automatically updated in every new transcription and SRT file. You can review and delete saved pairs in the Custom Spelling section of brand kit or in the subtitles panel under the Dictionary icon.

For "Translation Rules", similar to "Custom Spelling", you can select commonly-mistranslated words and automatically set them to the right or intended translation. You can also set "no-translate" words to keep brand names or specific terminology in their original language.
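
Conceptually, a Custom Spelling pair behaves like a find-and-replace applied to every new transcript. The snippet below is a rough sketch of that idea under stated assumptions (whole-word, case-insensitive matching); it is not how Kapwing implements the feature, and `apply_custom_spelling` is a hypothetical name.

```python
import re


def apply_custom_spelling(text, pairs):
    """Replace each misspelling with its preferred spelling.

    Assumption: matches are whole-word and case-insensitive.
    """
    for wrong, right in pairs.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so that 'right' is inserted literally.
        text = pattern.sub(lambda match, replacement=right: replacement, text)
    return text


if __name__ == "__main__":
    glossary = {"cap wing": "Kapwing"}
    print(apply_custom_spelling("Welcome to Cap Wing studio", glossary))
```

Whole-word matching is the design choice worth noting: it keeps a short glossary entry from rewriting the inside of a longer, unrelated word.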

How do I correct pronunciation mistakes in the dub?

If you dub a video and notice that there is an issue with the pronunciation of one or multiple words, you can use the Pronunciation tab within the Brand Glossary to fix this. Open the Brand Glossary in the top right corner of the Translate Tab.

Select "Pronunciation", and enter the term that you hear is mispronounced in the dub.


Spell out how the word should be pronounced in the "Free form pronunciation" section, or click "Generate suggestion". Once you are happy with the pronunciation rule, click "Save to glossary" so that it is applied to future dubs.

What dubbing features does Kapwing have?

Kapwing has roots in AI video editing, so its integrated AI technology and the ability to fine-tune the audio and video layers set us apart from other dubbing products. Here's a list of features that we support:

Supported Features

✅ Automatic speech recognition (ASR) and captioning
✅ Machine translation, with support from multiple translation vendors
✅ Translation rules that can be added manually or uploaded to leverage an existing translation glossary
✅ Bulk dubbing flow that allows dubbing from one language to multiple
✅ Background sound preservation
✅ Voice cloning
✅ Lip sync
✅ Support for uploaded SRT files and Slavic, Arabic, and RTL languages
✅ Realistic synthetic voices in over 40 languages
✅ Import from YouTube and Google Drive
✅ Speaker labels and dubbing for videos with multiple speakers
✅ Translate text embedded in the video
✅ Advanced timing and speed adjustment technology to match the original timing of the video
✅ Real-time collaboration and team comments
✅ Custom spelling glossary for commonly-used words
✅ Translation rules available across the brand workspace
✅ Regeneration of text to speech layers
✅ Download SRT of translated captions or embed captions into the video
✅ Download dubbed MP3 or MP4
✅ Video uploads up to 6GB and 2 hrs long
✅ Dialect-specific language selection
✅ Custom pronunciation guide

Here are some features that Kapwing's Dubbing platform does not support:

Not Supported Features

❎ Emotive controls or adjustments. Creators can use punctuation to add inflection and disfluencies to generated TTS, but the same voice clone is used throughout the video, so the generated speech does not express a high degree of emotional variance.
❎ Bulk import or export. Each video must be uploaded and exported individually, although it is possible to dub one video into multiple languages.
❎ Programmatic dubbing API

What languages does Kapwing Dubbing support?

Kapwing supports over 40 languages for dubbing, the same languages it supports for text to speech. When users start a new dubbed project, they choose the target language and dialect to translate into. See the full list of supported languages below.

Supported Language List

English (US)
English (UK)
English (AUS)
Arabic (Multi-Region)
Bulgarian
Chinese (Mandarin)
Croatian
Czech
Danish
Dutch
Finnish
Filipino (Tagalog)
French
German
Greek
Gujarati*
Hebrew*
Hindi
Hungarian
Indonesian
Italian
Japanese
Kannada*
Korean
Lithuanian*
Malay
Malayalam*
Norwegian
Polish
Portuguese (Brazil)
Portuguese (Portugal)
Romanian
Russian
Slovak
Spanish (Spain)
Spanish (Mexico)
Swedish
Tamil
Telugu*
Turkish
Ukrainian
Vietnamese

* we do not support voice cloning in this language

FAQ

Is video dubbing free? How much does video dubbing on Kapwing cost?

Video Dubbing on Kapwing is free to try. To try it, create an account on Kapwing and upload a short video. Free users can dub a video that is less than 8 minutes long. Choose a realistic synthetic voice to use as the dubbed audio.

To use Lip Sync, users can upgrade to Kapwing Pro. A Pro subscription includes up to 200 minutes of video dubbing each month, and it also removes the watermark from the exported video. See our pricing page for more info.

To dub a video in the same voice as the original speaker (voice cloning), you'll need to upgrade to Kapwing's Business or Enterprise plan. Both Business and Enterprise plans are billed per-seat, meaning each editor will need a license to access the platform.

How does Voice Dubbing on Kapwing work?

Originally a video and audio editing platform, Kapwing integrates multiple AI technologies to power our video dubbing product.

  • Clean audio and background sounds: Kapwing extracts the spoken words of the video from other sounds, like music, laughter, and effects. This enhances the transcription and makes the dubbed audio sound more natural.
  • Transcription: The dialogue of the video is extracted using speech-to-text technology and the team's glossary.
  • Translation: Captions are translated using machine translation vendors.
  • Synthetic voice generation: A new voiceover is created in the dubbed language. Kapwing leverages premium text-to-speech providers to make the voice sound highly realistic. For Business and Enterprise customers, the voice is cloned from the original speaker to make it as realistic as possible. Kapwing detects where the speaker changes so that it creates different voices for each speaker, improving the quality of the dubbed audio.
  • Timing: The new dubbed audio is combined with the original video and background audio track in the timeline. Kapwing uses generative AI to adjust the timing of the translated audio, making it match the original video as closely as possible.
  • Lip sync: Users can turn on Lip Sync to generate a new video layer where the lips of the speaker match the new dubbed audio.
  • Translate Text: Kapwing uses advanced technology to scan the video, identify embedded text layers, translate them, and overlay matching text layers to ensure that embedded text is also translated to the target language.

The result is a dubbed audio track that sounds closer to the original video than any other platform. Our base editing tools are designed for training and communications teams to collaborate on video content, so it's customizable at every step.

What voices do you offer?

Kapwing offers voices from Google AI, Cartesia, and ElevenLabs. We have more than 20 premium voices available from ElevenLabs and, in our Business and Enterprise plans, support creating a custom voice clone. Please inquire if you have a specific voice that you'd like Kapwing to add or a vendor you're interested in.

What companies use Kapwing?

Our dubbing product launched in 2024 and is used by communications teams at multinational companies like Chevrolet, SHEIN, OEC, Pilatesology, and Hollister, plus dozens of universities, a few churches, and multiple government agencies.


Additional Resources:

How to Add Subtitles or Captions with Kapwing
How to use Text-to-Speech in Kapwing
Video Localization 101
How to Use the Translation Rules Feature

Looking for more help?

Check our Release Notes for tutorials on how to use the latest Kapwing features!