Google’s Translatotron Can Translate Speech in the Speaker’s Voice

Google is about to introduce Translatotron, a new translation model that translates speech from one language to another while keeping the speaker’s own voice and tempo.

The typical approach, which first converts speech into text and then back into speech, can take time and introduce errors. With an end-to-end technique, however, the speaker’s voice is translated into another language directly. Google expects this direct translation model to open the way for future developments.

Translatotron uses a sequence-to-sequence network that accepts voice input and processes it as a spectrogram, a visual representation of the frequencies in the audio. The network then generates a spectrogram of the speech in the target language. Translation is much faster this way, with little or no chance of errors creeping in along the way.
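To make the idea of a spectrogram concrete, here is a minimal sketch of turning raw audio samples into one with a short-time Fourier transform. It is not Google’s implementation: the function name `spectrogram`, the frame and hop sizes, and the synthetic 1 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a magnitude spectrogram of shape (num_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration only; not part of Translatotron.
    """
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # Each row holds the strength of every frequency bin in one time slice.
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test input: one second of a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
frame_len = 256
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone, frame_len=frame_len)

# The strongest frequency bin, averaged over time, should sit at the tone's pitch.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / frame_len
print(spec.shape, peak_hz)
```

Each row of the result is one short slice of time and each column is a frequency band, which is what lets a network treat speech as an image-like grid of numbers instead of as text.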

An additional speaker encoder component is sometimes used to preserve the speaker’s voice. The translated speech may sound a little robotic and synthesized, but the speaker’s voice is still maintained to some extent.


Sample results showing how Translatotron keeps the speaker’s voice can be heard on Google Research’s GitHub page. Some of them may not be up to the mark, but it’s just the beginning.

In the last few months, Google has been working to improve its translations. In 2018, it introduced region-based accents and pronunciations for several languages in Google Translate. Some new languages were also added to the real-time translation feature. Recently, an “interpreter mode” was introduced in Google Assistant as well, for smart displays and speakers.

Google is Introducing Translatotron - An End-to-End Speech-to-Speech Translation Model
Photo: S3studio via Getty Images
