Meta’s open-source speech AI models support over 1,100 languages

Expanding Language Coverage in Speech Recognition with Meta’s MMS Project

Advances in machine learning and speech recognition have transformed how people access information, particularly those who rely on voice. However, the scarcity of labelled data in most of the world's languages remains a significant obstacle to building high-quality models.

Addressing the Language Coverage Issue with MMS Project

To address this challenge, the Meta-led Massively Multilingual Speech (MMS) project has made remarkable strides in expanding language coverage and improving the performance of speech recognition and synthesis models.

Utilizing Self-Supervised Learning Techniques and Religious Texts

The MMS project combined self-supervised learning techniques with a dataset of religious readings spanning a wide range of languages. By drawing on publicly available audio recordings of people reading religious texts, such as the Bible, in over 1,100 languages, the team assembled a labelled dataset for multilingual speech recognition and synthesis.

Recognizing Over 4,000 Languages with MMS Project

By also including unlabelled recordings of a wider range of religious readings, the project extended its coverage to identify over 4,000 spoken languages.
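
To make this concrete, the snippet below is a minimal sketch of spoken-language identification with one of the released MMS checkpoints. It assumes the Hugging Face transformers and torch packages and the publicly available facebook/mms-lid-126 model (a smaller LID variant); the checkpoint name and API calls reflect that integration rather than anything specified in the announcement itself.

```python
# Minimal sketch, assuming `transformers` and `torch` are installed
# and that the "facebook/mms-lid-126" checkpoint is available.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "facebook/mms-lid-126"  # assumed public LID checkpoint

extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)

def identify_language(waveform, sampling_rate: int = 16_000) -> str:
    """Return the predicted language code for a mono 16 kHz waveform."""
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    lang_id = int(logits.argmax(dim=-1))
    return model.config.id2label[lang_id]
```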

Reducing the Dependence on Labelled Data

Traditional supervised speech recognition models require large amounts of labelled data, which simply isn't available for many languages. To overcome this limitation, the MMS project leveraged wav2vec 2.0, a self-supervised speech representation learning technique, which significantly reduced the reliance on labelled data.
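
As an illustration of how little task-specific code this leaves the user with, here is a minimal sketch of transcription with the MMS ASR model built on wav2vec 2.0. It assumes the Hugging Face transformers integration and the publicly released facebook/mms-1b-all checkpoint; the per-language adapter calls are part of that integration, not of the announcement itself.

```python
# Minimal sketch, assuming `transformers` and `torch` are installed
# and that the "facebook/mms-1b-all" checkpoint is available.
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

MODEL_ID = "facebook/mms-1b-all"  # assumed public ASR checkpoint

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

def transcribe(waveform, lang: str = "eng") -> str:
    """Transcribe a mono 16 kHz waveform in the given ISO 639-3 language."""
    # Swap in the per-language adapter and vocabulary for `lang`.
    processor.tokenizer.set_target_lang(lang)
    model.load_adapter(lang)
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)
```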

Impressive Results of MMS Models

Evaluations of models trained on the MMS data showed strong results. Compared with OpenAI's Whisper, the MMS models achieved half the word error rate while covering 11 times as many languages.
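
For readers unfamiliar with the metric: word error rate is the word-level edit distance (substitutions, insertions, and deletions) between a system's transcript and a reference, divided by the number of reference words. A self-contained sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (classic Levenshtein table).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution against a six-word reference: WER = 1/6 ≈ 0.17.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```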

High-Quality Text-to-Speech Systems

Despite having relatively few distinct speakers for many languages, the text-to-speech systems built on MMS data produce high-quality speech.
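
As a rough illustration, synthesizing speech with one of the released per-language MMS TTS models might look like the following, assuming the Hugging Face transformers VITS integration and the facebook/mms-tts-eng checkpoint (both are assumptions on my part, not details from the announcement):

```python
# Minimal sketch, assuming `transformers` (with VITS support) and `torch`.
import torch
from transformers import AutoTokenizer, VitsModel

MODEL_ID = "facebook/mms-tts-eng"  # assumed public per-language checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = VitsModel.from_pretrained(MODEL_ID)

inputs = tokenizer("Massively multilingual speech synthesis.", return_tensors="pt")
with torch.no_grad():
    # waveform: (batch, samples), sampled at model.config.sampling_rate
    waveform = model(**inputs).waveform
```

Writing the resulting waveform to a WAV file at the rate exposed on the model config yields the audible output.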

Mitigating Risks and Collaboration

Although the MMS models have shown promising results, it is essential to acknowledge their imperfections. Misinterpretations or mistranscriptions by the speech-to-text model could result in offensive or inaccurate language. The MMS project emphasizes collaboration across the AI community to mitigate such risks.

Explore the MMS paper or the project's code release for more details.

By Kevin Don