Live voice talent or multilingual text-to-speech voiceovers: which do you need?

When your language partner translates your multimedia content, they should also provide voiceover services. Should you choose voice tracks created by human voice talents or multilingual text-to-speech (TTS) voiceovers? A computer-generated voiceover costs about a third as much as a human voiceover, but it’s better suited for some projects than for others.

Whether or not to use multilingual text-to-speech voiceovers depends on several factors, including:

  1. Which languages do you need?
  2. What is the nature of the content?
  3. What is your budget and timeline?
  4. Do you expect frequent updates?

Each of these variables will impact your choice.


Currently, TTS technology serves many widely spoken languages fairly well. It can also render some regional accents, for example American vs. British English and European vs. Latin American Spanish. However, if you need a less commonly used language or a specific regional accent that TTS does not cover, TTS won’t work.


Two content-related questions will guide your voiceover choice. First, does the content contain a lot of specialized terms with non-standard pronunciations? Second, what kind of impact should the content make on your audience?

Pronunciation guides

We always recommend that a client provide instructions for pronouncing important terms. A pronunciation guide ensures that the voiceover artist pronounces brands, names, and technical terms properly in a foreign language.  

We also use pronunciation guides to customize pronunciations for a TTS voiceover. Before generating a voiceover, a linguist annotates the text with Speech Synthesis Markup Language (SSML) to specify the correct phonemes. SSML can also insert pauses or inflections to mimic natural speech. However, if the text requires a lot of markup, the cost of TTS increases.
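As a rough illustration of what this annotation involves, the Python sketch below wraps terms from a pronunciation guide in SSML phoneme tags and inserts a pause between sentences. The guide entry, the function names, and the 400 ms pause are hypothetical examples, not part of any particular workflow. The speak, s, phoneme, and break elements come from the W3C SSML standard, but TTS engines differ in which elements and phonetic alphabets they accept, so check your engine's documentation before relying on this output.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation guide: term -> IPA transcription.
PRONUNCIATION_GUIDE = {"Acme": "ˈækmi"}


def annotate(sentence, guide):
    """Wrap each guide term found in the sentence in an SSML <phoneme> tag."""
    if not guide:
        return escape(sentence)
    # Try longer terms first so a short term never shadows a longer one.
    terms = sorted(guide, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    # With one capture group, split() alternates plain text and matched terms.
    pieces = pattern.split(sentence)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:  # odd positions hold the matched terms
            out.append(
                f'<phoneme alphabet="ipa" ph={quoteattr(guide[piece])}>'
                f"{escape(piece)}</phoneme>"
            )
        else:
            out.append(escape(piece))  # escape &, <, > in ordinary text
    return "".join(out)


def to_ssml(text, guide, pause_ms=400, lang="en-US"):
    """Build an SSML document with one <s> per sentence and a pause between."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(f"<s>{annotate(s, guide)}</s>" for s in sentences)
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">{body}</speak>'
    )


if __name__ == "__main__":
    print(to_ssml("Welcome to Acme. Open the valve & wait.", PRONUNCIATION_GUIDE))
```

Matching here is case-sensitive and limited to whole words. A real script would likely need a linguist's review for inflected forms, abbreviations, and terms that should be pronounced differently in different contexts, which is where the extra markup effort comes from.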

Emotional impact

Hearing is the most emotionally resonant of the five senses. This is why technical content is better suited for TTS solutions than non-technical content. If you need to persuade, reassure, or otherwise move the audience emotionally, a human voice track will better suit your needs. This applies to advertising and marketing, employee communications, and “soft skills” training. SSML markup can approximate normal speech rhythms, but the effort required to mark up a text to sound truly “natural” would exceed the effort required to record a live voice talent.


Multilingual text-to-speech voiceovers require less turnaround time than human voiceovers, especially when markup is minimal.

Regardless, your language partner should already maintain relationships with voice talents and studios around the world. Many freelance voice talents use home studios to create high-quality voiceovers. So even if you need the “human” touch, you may still be able to meet your deadline.


Some technical instructions may require relatively frequent updates. Using TTS, you can get the exact same voice whenever you need it. Voiceovers can be re-created and replaced seamlessly, without additional engineering to match the audio quality of the original.

However, professional voiceover artists understand the need for “pickup” recordings. For example, if a relatively small portion of a voiceover needs to be changed, such as a date or a name, it can be recorded separately and dropped in by an audio engineer.
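Whether an update is regenerated with TTS or recorded as a pickup, the first step is working out which parts of the script actually changed. The sketch below is one possible way to do that, assuming the script is stored as segments keyed by an ID. The segment IDs, sample text, and function names are hypothetical.

```python
def normalize(text):
    """Collapse whitespace so reflowed lines do not count as changes."""
    return " ".join(text.split())


def segments_to_redo(old_script, new_script):
    """Return IDs of segments that are new or whose wording has changed."""
    return [
        seg_id
        for seg_id, text in new_script.items()
        if seg_id not in old_script
        or normalize(old_script[seg_id]) != normalize(text)
    ]


if __name__ == "__main__":
    # Hypothetical script versions: segment ID -> voiceover text.
    old = {"s1": "Press the red button.", "s2": "Updated on 1 March."}
    new = {
        "s1": "Press the  red button.",  # whitespace only: no re-record
        "s2": "Updated on 1 June.",      # changed date: needs a pickup
        "s3": "Close the lid.",          # new segment
    }
    print(segments_to_redo(old, new))
```

Only the listed segments would then need to be regenerated or sent to the voice talent. Segments removed from the new script are not reported here and would have to be cut from the audio separately.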

In conclusion

If you need to meet a tight deadline and you don’t require an emotional reaction from your audience, multilingual text-to-speech voiceovers provide a cost-effective option for many foreign languages. But if you need to appeal to your audience’s emotions in a rare language or specific regional accent, choose a human voiceover. Either way, your language partner should offer the most appropriate and cost-effective multilingual voiceover solutions.