Alibaba research program tops AI benchmark rankings

Alibaba’s DAMO Academy, the group’s global research program, marks another major breakthrough in the machine-reading capability of artificial intelligence with its Natural Language Processing (NLP) model topping the GLUE benchmark rankings.

The GLUE benchmark leaderboard is widely regarded as the most important baseline test for NLP models. DAMO’s NLP model also significantly outperformed the human baseline, a key milestone in the development of robust natural language understanding systems.

DAMO’s existing model, which is widely deployed across Alibaba’s ecosystem, powers the company’s customer service AI chatbot, retail platform search engines and anonymous healthcare data analysis systems. The model was also used by centers for disease control in different cities in China for text analysis of medical records and epidemiological investigation in the fight against COVID-19.


“We are excited to achieve a new breakthrough in driving research in NLP development. As a core technology, not only does NLP underpin Alibaba’s various businesses, which serve hundreds of millions of customers, but it also serves as a critical technology in fighting the coronavirus. We hope we can continue to leverage our leading technologies and contribute to the community during this difficult time.”


Alibaba’s multitask machine-learning model StructBERT delivers strong empirical results on a variety of downstream tasks, achieving a GLUE benchmark score of 90.3, higher than the human baseline of 87.1. The model, which is based on the pre-trained language model BERT and incorporates word and sentence structures, also boosts performance in many language-understanding applications such as sentiment analysis, textual entailment, and question-answering.

General Language Understanding Evaluation (GLUE) is a platform for evaluating and analyzing NLP systems. It attracts key global AI players, including Google, Facebook, Microsoft and Stanford, to participate every year. The GLUE benchmark is an industry table perceived as the most important baseline test for training, evaluating and analyzing NLP systems.


Alibaba has leveraged its proprietary technologies in recent months to help contain the coronavirus. Alibaba DAMO Academy has teamed up with Chinese medical institutions to develop an AI system that can expedite diagnosis and analysis of the virus.

In February, Alibaba Cloud made its cloud-based AI-powered computing platform available for free to global research institutions to accelerate viral gene sequencing, protein screening and other research in treating or preventing the spread of the virus.

This recent GLUE top score is not the first time Alibaba’s machine-learning model has outdone others. On June 20, 2019, Alibaba’s model bested human scores on the Microsoft Machine Reading Comprehension (MS MARCO) dataset, one of the AI industry’s most challenging tests for reading comprehension.

The model scored 0.54 in the MS MARCO question-answering task, outperforming the human score and Microsoft benchmark of 0.539. In 2018, Alibaba also scored higher than the human benchmark in the Stanford Question Answering Dataset, another popular machine reading-comprehension challenge worldwide.