|Google’s Translatotron converts one spoken language to another, no text involved|
Every day we creep a little closer to Douglas Adams’ famous and prescient Babel fish. A new research project from Google takes spoken sentences in one language and outputs spoken words in another — but unlike most translation techniques, it uses no intermediate text, working solely with the audio. This makes it quick, but more importantly lets it more easily reflect the cadence and tone of the speaker’s voice.
Translatotron, as the project is called, is the culmination of several years of related work, though it’s still very much an experiment. Google’s researchers, and others, have been looking into the possibility of direct speech-to-speech translation for years, but only recently have those efforts borne fruit worth harvesting.
Translating speech is usually done by breaking down the problem into smaller sequential ones: turning the source speech into text (speech-to-text, or STT), turning text in one language into text in another (machine translation), and then turning the resulting text back into speech (text-to-speech, or TTS). This works quite well, really, but it isn’t perfect; each step has types of errors it is prone to, and these can compound one another.
Read More – Tech Crunch