
How Qualcomm is Finally Bringing Real-time Translation to the World

Judging by the collective gasp from the crowd in attendance, and the ensuing stares from single-serving neighbors that seemed to silently ask “Did I just see what I think I just saw?”, I wasn’t the only one there who realized the importance of what we had just witnessed: a true, real-time, and mostly seamless translation between two people speaking different languages during a phone conversation. The ramifications of such a capability were obvious, but the real question was: how in the world is this even possible? Was it a trick? Was it a demo of something Qualcomm was working on and hoping to achieve someday? Or was it a real demo of a real product that actually works? It turned out to be the latter. It was and is entirely real. This is not a tomorrow or next-year capability. It is a right-now capability, and while partnerships with companies like Youdao are critical to developing certain key aspects of the process, Qualcomm’s AI-powered Snapdragon 865 5G mobile platform is the tech that makes it possible.

Why Qualcomm’s Real-Time Translation Technology Breakthrough is Different — and So Groundbreaking

I already know what you’re thinking: Wasn’t Google already doing this? Well, not exactly. Not like this. While Google and a number of other tech companies with strong AI products have managed to build impressive voice-to-text and voice-to-voice translation engines, this is different for several reasons, the principal one being that Qualcomm’s real-time translation function happens on the device, not in the cloud.

On-device translation is the key to real-time translation. Relying on a cloud solution to translate speech typically involves too much lag for a real-time application. Your speech has to be captured, transmitted to a server somewhere, analyzed and translated, and then the translation has to be sent back to your device or to the call.
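To make the round trip concrete, here is a minimal sketch that adds up the stages listed above for a cloud path and for an on-device path. The stage names follow the paragraph; every millisecond figure is a made-up placeholder chosen only for illustration, not a measurement of any real service or of the Snapdragon 865, and the helper names (`StageLatency`, `total_latency_ms`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageLatency:
    """One step on the way from spoken words to translated output."""
    name: str
    millis: float


def total_latency_ms(stages):
    """Sum the per-stage latencies for a single translated utterance."""
    return sum(stage.millis for stage in stages)


# Placeholder numbers, assumed purely for illustration (not measured).
cloud_path = [
    StageLatency("capture speech", 20),
    StageLatency("transmit to server", 80),
    StageLatency("analyze and translate on server", 150),
    StageLatency("send translation back", 80),
]

# Same processing time assumed, but with no network hops at all.
on_device_path = [
    StageLatency("capture speech", 20),
    StageLatency("analyze and translate on device", 150),
]

saved_ms = total_latency_ms(cloud_path) - total_latency_ms(on_device_path)
print(f"cloud: {total_latency_ms(cloud_path)} ms")
print(f"on-device: {total_latency_ms(on_device_path)} ms")
print(f"network hops removed: {saved_ms} ms")
```

Whatever the real figures are, the structure is the point: the two network stages simply disappear from the on-device path, so the delay they add never enters the conversation.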

Don’t get me wrong, cloud-based translation apps are great and useful if all you need to do is ask a voice assistant to translate a phrase or a recorded video, but not so great if you expect that translation to happen in real time during a conversation. For that, you need as near to zero latency as you can achieve, and putting the translation function literally between you and the person you are communicating with is a very good way to achieve that. Better yet, if the translation function can happen on your phone, or on whatever device you are using as your communication interface (it could also be a laptop or a tablet), you can keep latency to a minimum. And that is where Qualcomm’s Snapdragon 865 comes in: the 865’s integrated 5th Generation AI Engine packs an impressive 15 TOPS (trillion operations per second). It is also backed up by the combined power of the Adreno 650 GPU, Hexagon 698 processor, and Kryo 585 CPU. The Hexagon digital signal processor specifically holds the key to optimizing end-to-end latency by keeping each step of the translation process as short as possible.

The process is fairly complex, obviously, but Qualcomm’s real-time translation can be broken down into three general steps, each enabled by a specific technology.

First, Automatic Speech Recognition (ASR) captures your speech and, using convolutional neural networks (CNNs), transcribes it as text in the language that you spoke. This happens on the Hexagon 698 processor.

Then, using Neural Machine Translation (NMT), the transcribed text is translated into your interlocutor’s language, also in text format. Note that this type of translation is contextual, so it is not a word-for-word translation, but rather an approximation of how your sentences would naturally translate into the other language. This is important because languages often have very different grammatical rules, to say nothing of selecting the right words to minimize ambiguity. Those factors are taken into account at this stage.

Finally, a text-to-speech (TTS) engine converts the translated text into speech. And because this all happens on the device, and the Snapdragon 865 platform is designed to perform this type of operation in real time, the translation feels seamless.
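For readers who think in code, the three steps above can be sketched as a simple chain of functions: speech in, text, translated text, speech out. This is only a structural illustration, not Qualcomm’s or Youdao’s implementation. Every name here (`make_translation_pipeline`, `toy_recognize`, `toy_translate`, `toy_synthesize`) is hypothetical, and the toy stages are trivial stand-ins for the real neural models.

```python
from typing import Callable

# Each stage is just a function, so real engines could be swapped in later.
Recognizer = Callable[[bytes], str]    # ASR: audio -> text in the speaker's language
Translator = Callable[[str], str]      # NMT: source text -> target-language text
Synthesizer = Callable[[str], bytes]   # TTS: target-language text -> audio


def make_translation_pipeline(
    recognize: Recognizer, translate: Translator, synthesize: Synthesizer
) -> Callable[[bytes], bytes]:
    """Chain the three stages into one audio-in, audio-out function."""

    def pipeline(audio: bytes) -> bytes:
        source_text = recognize(audio)         # step 1: speech recognition
        target_text = translate(source_text)   # step 2: machine translation
        return synthesize(target_text)         # step 3: text-to-speech

    return pipeline


# Toy stand-ins (assumptions for illustration): "audio" is just UTF-8 text.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


TOY_PHRASEBOOK = {"hello": "hola", "thank you": "gracias"}


def toy_translate(text: str) -> str:
    # A real NMT model translates in context; this lookup only marks the slot.
    return TOY_PHRASEBOOK.get(text.lower(), text)


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


translate_call = make_translation_pipeline(toy_recognize, toy_translate, toy_synthesize)
print(translate_call(b"Hello"))
```

Because all three stages run back to back in the same place, nothing in the chain has to wait on a network, which is the property the on-device approach relies on.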