The News: On August 22, Meta announced the public release of SeamlessM4T, an all-in-one multimodal and multilingual AI translation model that lets people communicate through speech and text across different languages. The model supports nearly 100 languages for speech recognition and speech-to-text translation, and it also performs speech-to-speech translation (100 input languages and 36 output languages).
Here are some of the other pertinent details:
- SeamlessM4T is open: it is publicly released under a research license.
- The metadata of SeamlessAlign, the related multimodal translation dataset, is also being released. (NOTE: Meta is making only the metadata available, not the source dataset.)
- Meta contrasts SeamlessM4T’s single-system design with pipelines that chain separate models: “Compared to approaches using separate models, SeamlessM4T’s single-system approach reduces errors and delays, increasing the efficiency and quality of the translation process.”
Read Meta’s full press release introducing SeamlessM4T here.
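For readers who want to try the tasks described above, here is a minimal sketch of text-to-text and text-to-speech translation. It assumes the Hugging Face transformers integration of SeamlessM4T (which arrived after Meta’s own seamless_communication reference code), and the checkpoint name is illustrative; it is not Meta’s demo code.

```python
# Minimal sketch of SeamlessM4T text-to-text and text-to-speech translation.
# Assumes the Hugging Face `transformers` integration (not Meta's own
# `seamless_communication` reference code); the checkpoint name is illustrative.
from transformers import AutoProcessor, SeamlessM4TModel

checkpoint = "facebook/hf-seamless-m4t-medium"  # a "large" variant also exists on the Hub
processor = AutoProcessor.from_pretrained(checkpoint)
model = SeamlessM4TModel.from_pretrained(checkpoint)

# Text-to-text translation (T2TT): English -> French.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True))

# Text-to-speech translation (T2ST): the same call, asked for a waveform instead.
# French is one of the 36 supported speech-output languages.
speech = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()
# `speech` is a mono waveform (16 kHz per the model card) that can be saved with soundfile.
```

Speech inputs follow the same pattern, with an audio array passed to the processor instead of text.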
In a separate blog post, Meta provides more details about some of the challenges around universal translation, especially around the issues of toxicity and bias. Here, Meta references the demo that was released on August 21:
“We detect toxicity in both the input and the output for the demo. If toxicity is only detected in the output, it means that toxicity is added. In this case, we include a warning and do not show the output. When comparing our models to the state of the art, we significantly reduce added toxicity on both speech-to-speech and speech-to-text translation.
Gender bias, where the results unfairly favor a gender and sometimes default to gender stereotypes, is another area we are beginning to evaluate in languages at scale. We are now able to quantify gender bias in dozens of speech translation directions by extending our previously-designed Multilingual HolisticBias dataset to speech.”
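The gating policy described in the first paragraph of that quote is straightforward to express in code. The sketch below is purely illustrative and assumes a hypothetical detect_toxicity classifier; it is not Meta’s implementation.

```python
# Illustrative sketch of the "added toxicity" gate Meta describes for the demo.
# `detect_toxicity` is a hypothetical stand-in for whatever classifier or
# word-list check is actually used; this is NOT Meta's implementation.
from typing import Callable

def gate_translation(
    source_text: str,
    translated_text: str,
    detect_toxicity: Callable[[str], bool],
) -> tuple[str | None, str | None]:
    """Return (output_to_show, warning). Hide the output only when toxicity
    appears in the translation but not in the source, i.e. it was *added*."""
    toxic_in = detect_toxicity(source_text)
    toxic_out = detect_toxicity(translated_text)

    if toxic_out and not toxic_in:
        # Added toxicity: warn and withhold the translation, per the quoted policy.
        return None, "Translation withheld: the output contained toxicity not present in the input."
    # Otherwise show the translation (the quote only describes suppressing added toxicity).
    return translated_text, None

# Example with a toy word-list detector (purely illustrative).
bad_words = {"darn"}
detect = lambda text: any(word in bad_words for word in text.lower().split())
print(gate_translation("Hello there", "Darn hello", detect))
```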
Meta AI researchers have been on a quest to build a universal translator for several years. For context, Philipp Koehn, a research scientist at Meta, shared some of the challenges and vision behind Meta’s universal translator efforts in a blog post in December 2021. A few points of interest relevant to SeamlessM4T:
- Multilingual models are a breakthrough: “Traditional models require large quantities of translated text. Models are then developed for each direction that people want to translate, producing a model from one language to another, which is known as a bilingual model. This does not work well when supporting many languages, since building and maintaining thousands of models for each possible language pair would create excessive computational complexity. That’s why researchers are looking at a new approach called “multilingual models” as the way forward. These are models that build some representation of text that’s common to all languages.” (The sketch after this list puts rough numbers on that combinatorial problem.)
- Multilingual models require massive amounts of compute for AI training; it is unclear how much compute is needed for AI inference.
- When asked how multilingual advancements might help the AI field overall, Koehn said it is a push toward general intelligence: “AI systems that are capable of addressing many different problems and cross-applying knowledge between them. In the same spirit, multilingual translation models solve the general translation problem, not the specific problem of a particular language pair. Multilingual is a step in that direction. It leads to more flexible systems that can serve more tasks. It is more efficient because it frees up capacity — which allows us to roll out new features instantly to people around the world. Finally, it’s closer to human thinking. As humans, we don’t have specialized models for each task; we have one brain that does many different things. Multilingual models, just like pretrained models, are bringing us closer to that.”
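To put rough numbers on the combinatorics Koehn describes, the sketch below counts how many direction-specific bilingual models full pairwise coverage would require. These figures are simple arithmetic, not numbers published by Meta.

```python
# Back-of-the-envelope arithmetic behind the bilingual-vs-multilingual argument above.
# With bilingual models, every translation direction needs its own model; a single
# multilingual model covers all directions with one set of weights.
def bilingual_model_count(num_languages: int) -> int:
    """Number of direction-specific models needed to cover every ordered language pair."""
    return num_languages * (num_languages - 1)

for n in (10, 36, 100, 200):
    print(f"{n:>4} languages -> {bilingual_model_count(n):>6} bilingual models vs 1 multilingual model")

# 100 languages already implies 9,900 separate direction-specific models, which is
# the "excessive computational complexity" Koehn refers to.
```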
Analyst Take: One of AI’s moonshots has been universal translation, which could unlock a plethora of use cases and revenue opportunities. The primary competitors in this race to the moon have been Google, Amazon, and Meta. Is Meta’s SeamlessM4T a significant breakthrough? What are the juicy use cases? Here’s a look at these questions.
Not Ready Yet
SeamlessM4T is different in that it seeks to combine translation and transcription in a single model. It also differs from Google’s and Amazon’s efforts in that Meta has released the model as open source. Both factors should help advance SeamlessM4T in ways Google’s and Amazon’s closed models cannot at this point.
That said, machine translation will not be fully automated for some time. Meta acknowledged the model’s issues with toxicity and bias, a common denominator among large language models (LLMs) trained on public data. Research published in April, prior to SeamlessM4T’s debut, found the following:
“Large language models have demonstrated remarkable potential in handling multilingual machine translation (MMT)… We evaluate popular LLMs, including XGLM, OPT, BLOOMZ, and ChatGPT, on 102 languages. Our empirical results show that even the best model ChatGPT still lags behind the supervised baseline NLLB in 83.33% of translation directions. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, prompt semantics can surprisingly be ignored when given in-context exemplars, where LLMs still show strong performance even with unreasonable prompts. Second, cross-lingual exemplars can provide better task instruction for low-resource translation than exemplars in the same language pairs. Third, we observe the overestimated performance of BLOOMZ on dataset Flores-101, indicating the potential risk when using public datasets for evaluation.”
Use Cases
In theory, markets widen when language has no boundaries. Commerce, particularly e-commerce and digital goods, would explode with dependable automatic universal translation, since sellers and buyers could match up regardless of language preference. Digital marketing, advertising, and paid media would be similarly affected. There are plenty more examples, but it is easy to see why Meta, Google, and Amazon, as premier digital advertisers, e-commerce players, and media companies, are pushing the innovation.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Qualcomm-Meta Llama 2 Could Unleash LLM Apps at the Edge
Google Search Generative Experience: Will Gen AI Impact Search?
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis, with an emphasis on mobile technology and on identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business and holds a Bachelor of Science from the University of Florida.