Although machines can outperform humans at a growing number of tasks today, there is still one process they have yet to master: translation. Many students learning a second or third language will undoubtedly have encountered some of the more hilarious results produced by Google (mis)Translate.
But a fascinating solution was recently proposed by the CDS's very own Kyunghyun Cho. Together with Yoshua Bengio and Orhan Firat, Cho developed an innovative model, the first to handle multi-way, multilingual translation, which clinched the runner-up position for best paper at the 2016 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Traditionally, machine translation uses a phrase-based approach: the source sentence is chopped into phrases, each of which is mapped directly to a corresponding phrase in the target language. But, as Cho notes, this phrase-based mapping is highly specific to a given language pair and cannot easily accommodate a third language.
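To see why, here is a toy sketch of the phrase-based idea in Python. The phrases and table entries are invented for illustration, not taken from any real system: the point is that every language pair needs its own phrase table, so adding a third language means building entirely new tables.

```python
# Toy illustration of phrase-based translation (invented phrases, not a
# real system): each language pair requires its own dedicated phrase table.
en_fr = {
    "the cat": "le chat",
    "sat on": "était assis sur",
    "the mat": "le tapis",
}

def phrase_translate(source_phrases, table):
    # Map each source phrase directly to its target-language counterpart.
    return " ".join(table.get(p, f"<unk:{p}>") for p in source_phrases)

print(phrase_translate(["the cat", "sat on", "the mat"], en_fr))
# -> "le chat était assis sur le tapis"
# Translating into German would need a separate en_de table built from
# scratch; nothing learned for English-French carries over.
```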
Harnessing the power of neural networks might help us overcome this problem. The inspiration for Cho's project came from observing how multilingual individuals typically learn new languages more quickly: they pick up the underlying structures shared between languages and reuse them when learning the next one. The question, then, was obvious: could machine translation mimic this process by using neural networks?
Instead of the phrase-based model, Cho's neural network reads and translates the source sentence using an encoder-decoder approach. The encoder reads the sentence and compresses it into a fixed-size vector, which the decoder then unfolds into the target language. Because the model considers the sentence as a whole rather than as isolated phrases, its translations are more accurate, which is why Cho's model outperforms traditional translation software.
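The sketch below illustrates the encoder-decoder idea in PyTorch. It is a minimal toy, not Cho's actual model: the vocabulary sizes, dimensions, and class names are made up for demonstration, but the flow follows the description above, with the encoder compressing the source into one fixed-size vector and the decoder generating the target from it.

```python
# Minimal encoder-decoder sketch (illustrative only, not Cho's code).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) -> final hidden state (1, batch, hidden_dim),
        # the fixed-size vector that summarizes the whole source sentence.
        _, hidden = self.rnn(self.embed(src_ids))
        return hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt_ids, hidden):
        # Every decoding step is conditioned on the encoder's summary vector.
        outputs, _ = self.rnn(self.embed(tgt_ids), hidden)
        return self.out(outputs)  # (batch, tgt_len, vocab_size) logits

# Toy usage with made-up vocabulary sizes and random token ids.
enc, dec = Encoder(vocab_size=100), Decoder(vocab_size=120)
src = torch.randint(0, 100, (2, 7))   # two source sentences, 7 tokens each
tgt = torch.randint(0, 120, (2, 9))   # two target sentences, 9 tokens each
logits = dec(tgt, enc(src))
print(logits.shape)  # torch.Size([2, 9, 120])
```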
This encoder-decoder approach also allows the machine to capture the underlying linguistic structures shared between languages, since every encoder and decoder operates in the same shared vector space and the model can observe the patterns that recur there. Crucially, remembering these patterns means that the machine can also translate a third language based on the lessons learned from the first two.
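Continuing the toy sketch above (and reusing its Encoder and Decoder classes), one hypothetical way to picture the multi-way setup is to give each language its own encoder and decoder that all meet in the same vector space, so pairings never trained directly can still be composed. This is only an illustration of the idea, not the paper's actual architecture.

```python
# Toy multi-way setup (illustration, not the paper's architecture):
# per-language encoders and decoders that share one vector space.
encoders = {lang: Encoder(vocab_size=100) for lang in ("en", "fr", "de")}
decoders = {lang: Decoder(vocab_size=120) for lang in ("en", "fr", "de")}

def translate_logits(src_ids, tgt_ids, src_lang, tgt_lang):
    # Any encoder can be paired with any decoder through the shared space.
    summary = encoders[src_lang](src_ids)
    return decoders[tgt_lang](tgt_ids, summary)

src = torch.randint(0, 100, (2, 7))
tgt = torch.randint(0, 120, (2, 9))
# An English-German pairing can be composed even if only English-French
# and French-German data were ever seen in training.
logits = translate_logits(src, tgt, "en", "de")
```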
Cho's approach is also especially compelling because it uses a recurrent neural network. While traditional phrase-based translation reads the source sentence only once before translating it, Cho's recurrent network refers back to the source sentence again and again while producing its translation, which further increases accuracy.
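Here is a rough sketch of that "glancing back" step, with made-up shapes and a plain dot-product score rather than anything from Cho's implementation: at each output step, the decoder weighs every source position and re-reads a weighted mix of the encoder's per-token states.

```python
# Minimal sketch of re-reading the source at each decoding step
# (hypothetical shapes; a simple dot-product score for illustration).
import torch
import torch.nn.functional as F

def attend(decoder_state, encoder_states):
    # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # (batch, src_len, 1)
    weights = F.softmax(scores.squeeze(2), dim=1)                   # one weight per source token
    # Weighted sum over source positions: the "glance back" at the sentence.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights

enc_states = torch.randn(2, 7, 64)   # per-token encoder states, 7 source tokens
dec_state = torch.randn(2, 64)       # decoder state at one output step
context, weights = attend(dec_state, enc_states)
print(context.shape, weights.shape)  # torch.Size([2, 64]) torch.Size([2, 7])
```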