Meta recently announced a new language technology partnership program in collaboration with the United Nations Educational, Scientific and Cultural Organization (UNESCO), aimed at collecting voice recordings and written transcripts in multiple languages to promote the future development of open and accessible artificial intelligence (AI). This initiative particularly focuses on minority languages that are often overlooked in the digital environment.
According to Meta, the program seeks partners who can contribute more than 10 hours of voice recordings with transcriptions, substantial collections of written text, and sets of translated sentences. Meta says it will work with these partners to integrate their languages into its AI speech recognition and translation models, with the results eventually released as open source.
So far, confirmed partners include the government of Nunavut in northern Canada, where some residents speak Inuktitut. Meta stated in a blog post, “Our efforts are particularly focused on underserved languages to support the work of UNESCO. Ultimately, our goal is to create intelligent systems that can understand and respond to complex human needs, regardless of language or cultural background.”
To complement this program, Meta will also release an open-source machine translation benchmark for evaluating the performance of language translation models. The benchmark, created by linguists, supports seven languages and is available on the AI development platform Hugging Face, where users can also contribute to it.
Meta presents these two initiatives as philanthropic, but the company also stands to benefit from improved speech recognition and translation models. Meta continues to expand the number of languages supported by its AI assistant, Meta AI, and is testing features such as voice translation in Instagram Reels, which lets creators dub their audio and sync it automatically.
While Meta's efforts in language processing are noteworthy, the company has faced significant criticism over its handling of non-English content. Reports indicate that Facebook left nearly 70% of COVID-19 misinformation in Italian and Spanish unflagged, compared with only 29% of comparable English-language misinformation. Leaked documents also show that Arabic content is often misclassified as hate speech. Meta has stated that it is taking steps to improve its translation and content moderation technologies to address these problems.