Google is using new AI techniques to improve translation quality

Google recently announced new techniques it is incorporating into Google Translate for its 108 supported languages, a service that translates almost 150 billion words daily!

Google Translate already uses 'neural machine translation,' 'rewriting-based paradigms,' and 'on-device processing,' and these technologies have made its translations quite accurate. But none of them can beat human performance, and that is a fact.

The combination of all these technologies targets low- and high-resource languages, inference speed, and latency. That is why Translate has shown a great improvement of 5 or more points across all languages, and 7 or more for the 50 lowest-resource languages, within one year.

These improvements were measured by human evaluations and by a metric called BLEU, which is based on the similarity between a system's translation and a human reference translation.
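To make the metric concrete, here is a minimal, hedged sketch of sentence-level BLEU: modified n-gram precision up to 4-grams combined with a brevity penalty. Real evaluations use corpus-level BLEU with smoothing (e.g. via the sacreBLEU toolkit); this toy version only illustrates the idea.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # toy version: no smoothing for zero matches
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes translations shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0 for an exact match
```

A perfect match scores 1.0; a translation sharing no words with the reference scores 0.0, with real translations falling in between.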

Google Translate has also shown improvement in dealing with the machine translation 'hallucination' phenomenon, in which AI models produce bizarre translations when given nonsensical input.

The first technique that Google is now going to use is a hybrid model architecture: a Transformer encoder coupled with a recurrent neural network (RNN) decoder.

In terms of machine translation, an encoder encodes words and phrases as internal representations, and a decoder uses these representations to generate text in a language desired by the user.

Unlike an RNN, a Transformer does not need to process the beginning of a sentence before it processes the end; it can attend to the whole sentence at once. That is perhaps why Transformer-based models are considered more effective than RNNs. But an RNN decoder has much greater inference speed than the decoder within the Transformer.

Google Translate has now optimized the RNN decoder and coupled it with the Transformer encoder, creating a low-latency hybrid model.
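The division of labor between the two halves can be sketched in a toy example. This is purely illustrative, not Google's implementation: the "embeddings," the one-dimensional "attention," and the recurrence below are stand-ins that only show how the encoder sees the whole sentence in parallel while the decoder emits one step at a time.

```python
import math

def encode(tokens):
    """Toy self-attention: each token's representation is a weighted
    average over ALL tokens, so the whole sentence is processed at once."""
    vecs = [float(len(t)) for t in tokens]  # toy 1-d "embeddings"
    reps = []
    for q in vecs:
        weights = [math.exp(-abs(q - k)) for k in vecs]
        total = sum(weights)
        reps.append(sum(w * v for w, v in zip(weights, vecs)) / total)
    return reps

def decode(reps, n_steps):
    """Toy RNN decoder: one hidden state updated sequentially, one output
    per step - cheap per-step work, hence low-latency inference."""
    hidden, outputs = 0.0, []
    context = sum(reps) / len(reps)  # toy attention over encoder outputs
    for _ in range(n_steps):
        hidden = math.tanh(0.5 * hidden + context)  # recurrence
        outputs.append(hidden)
    return outputs

out = decode(encode(["hello", "world"]), n_steps=3)
```

The key point is structural: `encode` touches every token in one pass, while `decode` runs a small sequential loop, which is the property that keeps the hybrid's latency low.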

Google also upgraded its data miner. It is now 'embedding-based' for 14 large language pairs instead of being 'dictionary-based'. This means it represents words and phrases as vectors of numbers, and it focuses more on 'precision,' which is the fraction of retrieved data that is relevant, and less on 'recall,' which is the fraction of all relevant data that was retrieved.

This increased the number of sentences the miner extracted by 29%, which is a good improvement.
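A hedged sketch of precision-biased, embedding-based mining looks like this. Real systems use learned multilingual sentence embeddings; the letter-frequency "embedding," the function names, and the threshold below are all illustrative assumptions.

```python
import math

def embed(sentence):
    """Toy embedding: letter-frequency vector (stand-in for a learned encoder)."""
    vec = [0.0] * 26
    for ch in sentence.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mine_pairs(source_sents, target_sents, threshold=0.9):
    """Keep only pairs above a high similarity threshold: fewer pairs
    overall (lower recall) but cleaner ones (higher precision)."""
    pairs = []
    for s in source_sents:
        best = max(target_sents, key=lambda t: cosine(embed(s), embed(t)))
        if cosine(embed(s), embed(best)) >= threshold:
            pairs.append((s, best))
    return pairs
```

Raising `threshold` trades recall for precision, which is exactly the trade-off the upgraded miner makes.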

Another technique Google employed was a model for treating noise in the training data.

Noise is data that cannot be understood or interpreted correctly, and it harms translation for languages whose training data is plentiful and readily available.

So, to address this noisy data, Google Translate adopted a curriculum-learning scheme, in which models trained on noisy data assign scores to examples and are then tuned on 'clean' data. The models begin training on all the data, then gradually shift to smaller and cleaner subsets.
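The curriculum described above can be sketched as follows. The noise scorer and the subset fractions are placeholder assumptions, not Google's models; in practice the scores come from trained models rather than a hand-written heuristic.

```python
def curriculum(examples, noise_score, fractions=(1.0, 0.5, 0.25)):
    """Yield training subsets: all data first, then progressively
    smaller, cleaner slices (low noise_score = clean)."""
    ranked = sorted(examples, key=noise_score)
    for frac in fractions:
        k = max(1, int(len(ranked) * frac))
        yield ranked[:k]

# Toy data and a toy scorer: count non-alphanumeric "junk" characters.
data = ["good pair", "good pair 2", "noisy!!##", "garbage@@@@"]
noise = lambda s: sum(not (c.isalnum() or c.isspace()) for c in s)
stages = list(curriculum(data, noise))
```

Each stage in `stages` is what the model would train on next: first everything, then only the cleanest half, then the cleanest quarter.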

For low-resource languages, Google has deployed a scheme in Google Translate that uses parallel training data, in which each sentence is paired with its translation.

These techniques are especially useful in improving fluency.

Translate is also using Google's giant M4 model, which translates between English and many other languages. This model enables transfer learning in Google Translate, so that insights gained from training on high-resource languages, which have billions of parallel examples, can be transferred and applied to the translation of low-resource languages, which have only a few thousand examples.
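One well-known trick from Google's multilingual NMT line of work is to serve many language pairs with a single shared model by prepending a target-language token to the source sentence; whether M4 uses exactly this token format is an assumption here, and the token names below are illustrative.

```python
def prepare_example(source_sentence, target_lang):
    """Prefix the source with a target-language token so one shared
    model can translate into many languages - letting what it learns
    from high-resource pairs transfer to low-resource ones."""
    return f"<2{target_lang}> {source_sentence}"

print(prepare_example("How are you?", "yo"))  # -> "<2yo> How are you?"
```

Because every language pair shares the same parameters, a low-resource pair benefits from patterns the model learned on pairs with billions of examples.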

Photo: AFP via Getty Images

Hat Tip: VentureBeat.
