Can the neural network revolution allow machine translation to break the human language barrier?

Trump's rise to power is usually read as a sign of "anti-globalization". Yet human history tends to move in a spiral, driven by both technological progress and cultural diffusion, and the world's overall trend towards interconnection looks inevitable. This is especially true now that globalization has met the Internet: equal and convenient access to information, and cheap, effective communication between countries, have become necessities. In this sense, one of the greatest enemies of globalization may be the language barriers that countries have reinforced over the centuries.

Machine translation is an interdisciplinary field that draws on cognitive science, computer science, information theory, linguistics, and more, and its theory has followed a spiral of its own: from the earliest "translation memorandum", to rule-based and then example-based machine translation, and on to statistical machine translation (SMT). SMT is regarded as an important turning point, because it was the first time scientists realized that using large amounts of data to reduce uncertainty was a workable way to attack the problem of "intelligence".

In the last two years, machine translation has reached another, even more important turning point: neural machine translation (NMT), which is built on neural networks.

Technical Path to Machine Translation

Ordinary users and veteran translators alike, whether on the web or in an app, have probably noticed how quickly translation quality has improved in recent years.

The question is: why is the change so obvious? It helps to take the technical path apart.

Intuitively, when people try to get a machine to translate, the natural first step is to break the text down. The structure is like a set of concentric circles: an article is made up of paragraphs, paragraphs of sentences, and sentences of phrases and words. Following the progression from easy to hard, machine translation has worked its way up from the smallest unit to larger ones: from word-by-word translation at the start, to phrase-based translation, and today, thanks to neural networks, to translation of whole sentences.

Classified by translation unit, then, there are broadly two types of machine translation. The first is statistical machine translation (SMT), mentioned above. The spread of the Internet gave statistical translation a rich diet of training data, and the phrase-based SMT that emerged around the turn of the millennium greatly improved translation quality and dominated the field for a long time. Its weakness is that, because the phrase is the unit of translation, the output becomes stiff once a whole sentence has to be translated.
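The difference between word-by-word and phrase-based translation can be sketched in a few lines. This is a toy illustration, not how any production SMT system works: the tiny English-to-Spanish tables and the function names are invented for the example, and a real phrase-based system scores many competing candidates with probabilities learned from data rather than taking the first longest match.

```python
# Toy tables, invented for illustration only.
WORD_TABLE = {
    "it": "eso", "is": "es", "a": "un",
    "piece": "pedazo", "of": "de", "cake": "pastel",
}
PHRASE_TABLE = {
    ("a", "piece", "of", "cake"): "pan comido",  # idiom: "very easy"
}

def translate_word_by_word(sentence):
    """Translate each word in isolation; unknown words pass through."""
    words = sentence.lower().split()
    return " ".join(WORD_TABLE.get(w, w) for w in words)

def translate_phrase_based(sentence, max_len=4):
    """Greedy longest-match lookup in the phrase table, falling back to
    single-word lookup when no phrase starts at the current position."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase matched here: translate one word and move on.
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)

print(translate_word_by_word("it is a piece of cake"))   # eso es un pedazo de pastel
print(translate_phrase_based("it is a piece of cake"))   # eso es pan comido
```

The phrase table rescues the idiom, but each phrase is still translated without regard to the rest of the sentence, which is the stiffness described above.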

The second type is neural machine translation (NMT). Its approach is what is called end-to-end: an encoder turns the whole source sentence into a vector, and a decoder then turns that vector into the output. In theory, the network only needs to be given the source sentence in order to produce the target-language translation. For example, if you type into Baidu Translate the Chinese idiom that literally reads "turnips and greens, each to his own taste", it readily outputs the correct translation, "Every man has his hobbyhorse", rather than a word-for-word rendering about turnips and greens. That is why, in just two years, NMT has surpassed its predecessor, SMT, on several public test sets.
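The encode-then-decode flow can be sketched as follows. This is a minimal structural sketch under loud assumptions: the weights are random and untrained, so the output is meaningless, and the vocabularies, sizes, and function names are invented for illustration. Real NMT systems learn these weights from large parallel corpora and use far larger models; the point here is only the shape of the computation.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained sketch is repeatable
HIDDEN = 8      # size of the sentence vector (arbitrary for this sketch)

# Hypothetical toy vocabularies; index 0 doubles as start/end marker.
SRC_VOCAB = ["<eos>", "every", "man", "has", "his", "taste"]
TGT_VOCAB = ["<eos>", "cada", "uno", "tiene", "sus", "gustos"]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def rnn_step(w_in, w_rec, x, h):
    """One simple recurrent step: h' = tanh(W_in x + W_rec h)."""
    a, b = matvec(w_in, x), matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]

# Random (untrained) parameters.
SRC_EMB = rand_matrix(len(SRC_VOCAB), HIDDEN)
TGT_EMB = rand_matrix(len(TGT_VOCAB), HIDDEN)
ENC_IN, ENC_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
DEC_IN, DEC_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
OUT = rand_matrix(len(TGT_VOCAB), HIDDEN)

def encode(tokens):
    """Read the whole source sentence and return one fixed-size vector."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = rnn_step(ENC_IN, ENC_REC, SRC_EMB[SRC_VOCAB.index(tok)], h)
    return h

def decode(sentence_vector, max_len=10):
    """Greedily emit target tokens, starting from the sentence vector."""
    h, prev, out = sentence_vector, TGT_EMB[0], []
    for _ in range(max_len):
        h = rnn_step(DEC_IN, DEC_REC, prev, h)
        scores = matvec(OUT, h)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == 0:  # end-of-sentence marker
            break
        out.append(TGT_VOCAB[best])
        prev = TGT_EMB[best]
    return out

vector = encode(["every", "man", "has", "his", "taste"])
print(len(vector), decode(vector))
```

Note that nothing in the pipeline looks up individual words or phrases: the decoder sees only the vector that summarizes the entire sentence, which is what "end-to-end" refers to.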

Comparing the two: on the whole, NMT is clearly superior to SMT when there is enough training data, while for short sentences, or when data is relatively scarce, SMT has the edge in handling fixed collocations and idiomatic expressions. The two approaches are therefore not mutually exclusive; each suits different scenarios. Users' translation needs vary enormously, so an excellent translation system has to be a master of several methods. Today, Baidu's translation system includes SMT, NMT, and even the more traditional EBMT (example-based machine translation).
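One way to picture a system that combines several methods is a simple dispatcher that picks an engine per request. The sketch below is purely hypothetical: the rules, thresholds, and names are invented to mirror the trade-offs described above, and nothing here reflects how Baidu's system actually chooses between its engines.

```python
# Hypothetical data, invented for illustration only.
EXAMPLE_MEMORY = {"how are you"}        # sentences with a stored example translation
KNOWN_IDIOMS = {"a piece of cake"}      # fixed expressions the SMT engine handles well

def choose_engine(sentence, short_len=4):
    """Return the name of the engine a request would be routed to.

    Assumed rules (not from any real system):
      1. an exact match in the example memory goes to EBMT;
      2. sentences containing a known idiom, or very short ones, go to SMT;
      3. everything else goes to NMT.
    """
    words = sentence.lower().split()
    text = " ".join(words)
    if text in EXAMPLE_MEMORY:
        return "ebmt"
    padded = f" {text} "
    if any(f" {idiom} " in padded for idiom in KNOWN_IDIOMS):
        return "smt"
    if len(words) <= short_len:
        return "smt"
    return "nmt"

for s in ["How are you",
          "Good morning",
          "The exam was a piece of cake for most students",
          "The committee will publish its final report next month"]:
    print(s, "->", choose_engine(s))
```

A real system would more plausibly run several engines and compare the quality of their outputs rather than rely on hand-written rules like these.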

Looking ahead, it is almost certain that progress in neural network technology will make NMT increasingly mainstream. In fact, it is already the mainstream in many systems, such as Baidu's Chinese, English, Japanese, and Korean translation. At the annual meeting of the Association for Computational Linguistics (ACL) in August this year, offline NMT for mobile devices was listed as an important direction for future research, which is about as clear a pointer to the future of machine translation as one could ask for.

Machine Translation on the Run

Ever since the French scientist G. B. Artsrouni proposed using machines for translation in the early 1930s, and even as the definition of artificial intelligence has shifted several times, machine translation has been regarded as one of AI's "ultimate goals". Great expectations usually mean a difficult goal, but that has not dimmed the appeal of this prize to the world's top technology giants.

Translation technology is still at an early stage, so ranking the players at this point does not mean much. The technical competition comes down to just three companies, Microsoft, Baidu, and Google, which in itself shows how much is at stake. Still, as the slogan "Baidu knows China better" suggests, Baidu is the more aggressive player in China and across the Asian market. As with search, none of the three can knock the others out, but regional advantage has become an indisputable fact.

On December 21, at Baidu's Machine Translation Technology Open Day, Dr. Wu Hua, co-chair of the Baidu Technology Committee and technical head of its Natural Language Processing Department, made the case that Baidu has become a trailblazer in translation technology. The company formally launched its neural network-based translation system a year earlier than Google, and it built the world's first Internet-based online NMT system as well as an offline NMT system for mobile phones. Baidu Translate reportedly receives hundreds of millions of visits a day and supports translation among 28 languages, and its open API has been integrated by more than 20,000 third parties.

Just a few days ago, Microsoft released what it calls the world's first universal translator. According to Microsoft, it can provide real-time translation for conversations of up to 100 people and supports voice input in 9 languages. Google's global reach is undoubtedly huge, and while acquiring technology companies it is also working hard to build regional strengths; its 2014 acquisition of Word Lens, for example, was part of an active push into machine translation. All of this echoes what Robin Li has said: use artificial intelligence to break down all boundaries.

Baidu's current position is not really surprising. Given China's place in the global economy, and as more and more of its people are drawn into globally networked collaboration, China's need for translation is only becoming more pressing. More concretely, of the trillions of web pages in the world, 80% are not in Chinese; the number of Chinese outbound tourists exceeded 120 million last year, and 12 languages are used across the top 20 destination countries and regions. Translation between Chinese and English in particular, the two most widely used languages in the world, is in the eyes of many a pure necessity.

The Future of Machine Translation

Quite simply, translation technology must ultimately serve the public; otherwise it is no more than flowers in a mirror or the moon in water.

Just as importantly, the technology is gradually finding its way into specific, practical scenarios. By combining OCR and speech technology, the Baidu Translate app meets all kinds of fragmented translation needs. A few examples: when you are sightseeing abroad, you only need to point your phone at a sign or description in a foreign language and OCR translation shows the result; faced with a menu that might as well be written in code, Baidu Translate can quickly translate it and display the result on your phone, so you no longer have to guess when ordering; when you are shopping abroad, it lets you quickly read and understand product instructions; when you come across an object you don't recognize, object translation tells you its name in both English and Chinese, with accurate pronunciation; and conversational translation, built on speech technology, helps users talk with foreigners without barriers. I even saw this in the news: police in Jingjiang City, facing a language barrier, used Baidu Translate to successfully rescue four Russian crew members.

While the benefits of the technology are reaching everyone who worries about language barriers, at the other end of the spectrum some people inevitably worry about the technology itself. "Some years from now, we can easily imagine that the language barrier will be completely broken down, and those who do simultaneous interpretation now may not have a job in the future." That was the future Robin Li sketched at last month's Internet conference in Wuzhen.

Machines have broken through the limits of earlier translation methods, but it must be recognized that machine translation is still some way from real linguistic understanding, and further still from the "faithfulness, expressiveness, and elegance" that men of letters aspire to. Machine translation, in other words, has a long way to go, and human translators can relax a little for now.

The reason is that, in the end-to-end approach, a neural network cannot understand the sentences it translates or give a reasoned explanation of its output. This is precisely the most essential difference between it and a professional human translator. Following the easy-to-hard path described above, from smaller units to larger ones, the next step would be for machines to translate on the basis of whole paragraphs or even chapters, and that requires a big leap in contextual understanding and coherence.

So the question is: will it happen? As a technological optimist, my personal answer is of course yes, and it may all just be a matter of time.

In ancient times, language arose both to improve communication within a group and to form a natural barrier against outsiders. If you believe that technological development is part of the great wave of globalization, then it is reasonable to hope that technology will end millions of years of mutual unintelligibility among human languages. After all, wanting people to understand one another is a very old aspiration.
