Microsoft has demonstrated software that can translate spoken Mandarin into English in real time, and which even preserves the speaker's own voice.
It works by recognizing the spoken words, translating and reordering them into coherent English sentences, and then using speech synthesis software that has been trained to mimic the speaker's voice.
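The three stages described above can be sketched as a simple pipeline. The function names, the toy lexicon, and the reordering rule below are all illustrative assumptions, not Microsoft's actual system; each stage stands in for a trained model.

```python
# A minimal sketch of the speech-to-speech pipeline described above.
# Each stage is a stub standing in for a trained model (assumption).

def recognize(audio: str) -> list[str]:
    """Stage 1: speech recognition -- audio to source-language words.
    Stub: treat the input string as an already-recognized transcript."""
    return audio.split()

def translate_and_reorder(words: list[str]) -> list[str]:
    """Stage 2: translate each word, then reorder into a coherent
    target-language sentence (word order differs between Mandarin
    and English, so reordering is essential)."""
    lexicon = {"ni": "you", "hao": "hello"}  # toy dictionary (assumption)
    translated = [lexicon.get(w, w) for w in words]
    # Toy reordering rule: put the greeting first.
    translated.sort(key=lambda w: 0 if w == "hello" else 1)
    return translated

def synthesize(words: list[str], voice_profile: dict) -> str:
    """Stage 3: text-to-speech, conditioned on properties of the
    speaker's own voice (here just a name tag)."""
    return f"[{voice_profile['name']}'s voice] " + " ".join(words)

def speech_to_speech(audio: str, voice_profile: dict) -> str:
    """Chain the three stages end to end."""
    return synthesize(translate_and_reorder(recognize(audio)),
                      voice_profile)

print(speech_to_speech("ni hao", {"name": "Rick"}))
# -> [Rick's voice] hello you
```

The point of the sketch is the chaining: each stage's output is the next stage's input, which is why errors in the recognition stage propagate into the final synthesized speech.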
“It required a text to speech system that Microsoft researchers built using a few hours speech of a native Chinese speaker and properties of my own voice taken from about one hour of pre-recorded (English) data, in this case recordings of previous speeches I’d made,” says Microsoft’s chief research officer Rick Rashid.
Rashid says the system has an error rate of about one word in seven or eight – not great, but a good 30 percent better than previous attempts.
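The quoted figures imply a rough word error rate, assuming "30 percent better" means a 30 percent relative reduction in errors (an interpretation, not stated in the source):

```python
# "One word in seven or eight" is a word error rate of about 1/8 to 1/7.
# If that is 30% lower than before, earlier systems erred on roughly
# one word in five (assumes "30 percent better" = 30% relative reduction).
new_low, new_high = 1 / 8, 1 / 7        # ~12.5% to ~14.3%
prev_low = new_low / (1 - 0.30)         # ~17.9%
prev_high = new_high / (1 - 0.30)       # ~20.4%
print(f"new WER: {new_low:.1%} to {new_high:.1%}")
print(f"implied previous WER: {prev_low:.1%} to {prev_high:.1%}")
```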
“While still far from perfect, this is the most dramatic change in accuracy since the introduction of hidden Markov modeling in 1979, and as we add more data to the training we believe that we will get even better results,” says Rashid.
Rashid was able to use the technology to make a presentation in China late last month.
“Though it was a limited test, the effect was dramatic, and the audience came alive in response,” he says. “When I spoke in English, the system automatically combined all the underlying technologies to deliver a robust speech to speech experience — my voice speaking Chinese.”
Google has been working on an automatic translator for years, as have AT&T and numerous universities. These systems, though, deliver their translation in a ‘machine voice’ rather than the user’s own, making for a rather awkward experience.
And this prototype is only the beginning, says Rashid: “we may not have to wait until the 22nd century for a usable equivalent of Star Trek’s universal translator, and we can also hope that as barriers to understanding language are removed, barriers to understanding each other might also be removed,” he says.
Watch it in action below.