Speaking different languages can be an insurmountable barrier to communication. Meta's executives are determined to facilitate connections between people from different countries and cultures, both to increase interactions on the company's social networks and to make the metaverse more attractive in the future. Meta researchers have been working for years on sophisticated artificial intelligence (AI) models capable of translating multiple languages. Today they presented NLLB-200, a pioneering system capable of translating 200 languages in real time, twice as many as Meta's best previous system.
“The AI modeling techniques we’ve used are helping to deliver high-quality translations,” Meta founder and CEO Mark Zuckerberg stresses in a post published today on his Facebook account. “To give an idea of the scale of the program, the 200-language model has more than 50 billion parameters. We have trained it using the Research SuperCluster, one of the fastest supercomputers in the world.” The NLLB-200 system, short for No Language Left Behind, is prepared to perform 25 billion daily translations across all of Meta's apps, according to the young tycoon.
The tool is capable of translating both oral and written language. The company presents it as a model aimed at the 4 billion people who speak languages that are not prevalent on the internet (where English rules, and Mandarin, Spanish, Portuguese, and Arabic are widely used). Among the 200 supported languages, 55 African languages have been included, many of which were not available until now in any automatic translator.
The company’s intention is that in the future Meta’s augmented reality glasses will be able to translate in real time and serve subtitles visible only to the wearer. Google is also working along those lines, as it revealed in May when it presented a similar prototype of glasses.
The model on which NLLB-200 is based builds on M2M-100, presented in 2020, which introduced a fundamental improvement: translations are made directly from the source language to the target language, without going through English. Because English is the most common language on the internet, it also feeds most of the world's databases used to train natural language processing systems. Hence, translators would typically convert any language first into English and then into the target language, which causes a great loss of nuance and meaning.
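The loss caused by pivoting through English can be illustrated with a toy sketch (hypothetical data, not Meta's system): Spanish distinguishes "pez" (a live fish) from "pescado" (fish as food), but English collapses both into "fish", so a Spanish→English→French pivot discards a distinction that a direct Spanish→French model could preserve.

```python
# Toy lookup tables standing in for translation models.
# English merges "pez" and "pescado" into a single word, so the
# pivot route cannot tell them apart afterwards.
es_to_en = {"pez": "fish", "pescado": "fish"}
en_to_fr = {"fish": "poisson"}

# A direct model keeps the source-language distinction.
es_to_fr_direct = {"pez": "poisson (vivant)", "pescado": "poisson (à manger)"}

def pivot(word: str) -> str:
    """Translate Spanish -> English -> French (two hops)."""
    return en_to_fr[es_to_en[word]]

def direct(word: str) -> str:
    """Translate Spanish -> French in a single hop."""
    return es_to_fr_direct[word]

print(pivot("pez"), "|", pivot("pescado"))    # both "poisson": distinction lost
print(direct("pez"), "|", direct("pescado"))  # distinction preserved
```

The example is deliberately simplistic, but it captures why direct source-to-target translation was a fundamental improvement over English-pivot pipelines.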
Making that leap requires millions of meticulously matched sentence pairs between different language combinations. The problem is that many languages are underrepresented on the internet. Meta gives the example of Swedish and Lingala, a language spoken in the Democratic Republic of the Congo, the Republic of the Congo, the Central African Republic, and South Sudan. Swedish, used by 10 million people in Sweden and Finland, has some 2.5 million articles on Wikipedia; Lingala, spoken by 45 million people, has only 3,260.
To solve this problem, the Meta researchers have refined a model capable of extracting maximum value from each analyzed sentence, while also increasing the size of the databases that feed the algorithm.
The company has decided to open-source the NLLB-200 model and its training code in order to help other researchers improve their translation tools and develop new technologies.
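Because the model is open-sourced, the released checkpoints can be loaded with the Hugging Face `transformers` library. The sketch below (assuming the publicly distributed `facebook/nllb-200-distilled-600M` checkpoint and `transformers` with PyTorch installed; the choice of Lingala→Swedish mirrors the article's example) shows a direct translation with no English pivot:

```python
# Sketch: direct Lingala -> Swedish translation with an open NLLB-200
# checkpoint. Requires: pip install transformers torch sentencepiece
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # smallest released variant

# NLLB uses FLORES-200 language codes: lin_Latn = Lingala, swe_Latn = Swedish.
tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="lin_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

inputs = tokenizer("Mbote na yo", return_tensors="pt")

# Force the decoder to start in the target language.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("swe_Latn"),
    max_length=30,
)
out = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
print(out)
```

Swapping the `src_lang` argument and the forced BOS token is all it takes to move between any of the 200 supported language pairs.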