Alibaba says it has an AI model even better than DeepSeek

Alibaba Group at Viva Technology on June 14, 2023 in Paris, France. - Photo: Chesnot (Getty Images)
Days after Chinese artificial intelligence startup DeepSeek sparked a global tech stock sell-off, a homegrown rival said its new AI model performed even better.

Alibaba Cloud (BABA) released an upgraded version of its flagship AI model, Qwen2.5-Max, which outperformed top open-source competitors, including DeepSeek’s V3 model and Meta’s (META) Llama 3.1, on various benchmarks, according to results published by the firm on WeChat. The cloud computing subsidiary of Alibaba Group also found that Qwen2.5-Max showed performance comparable to OpenAI’s GPT-4 and Anthropic’s Claude 3.5 Sonnet, both closed-source models.

The Chinese firm said its AI model “has demonstrated world-leading model performance in mainstream authoritative benchmarks,” including the Massive Multitask Language Understanding (MMLU), which evaluates general knowledge, and LiveCodeBench, which tests coding skills.

The Qwen2.5-Max announcement follows DeepSeek’s launch last week of its first-generation reasoning model, DeepSeek-R1, which demonstrated performance comparable to OpenAI’s reasoning models, o1-mini and o1, on several industry benchmarks, according to its technical paper.

The release of DeepSeek-R1 prompted Nasdaq, Dow Jones Industrial Average, and S&P 500 futures to fall Monday morning. Nvidia’s (NVDA) shares plunged 17%, wiping out nearly $600 billion in value — a record loss for a U.S. company.

Investors were spooked by the DeepSeek-R1 launch, which comes after the December release of DeepSeek-V3. While Alibaba Cloud hasn’t disclosed its development costs, DeepSeek’s claim that it built its model for just $5.6 million using Nvidia’s reduced-capability graphics processing units has caught the market’s attention, challenging assumptions about the massive investments needed for AI development.

According to the technical paper, DeepSeek used a cluster of 2,048 Nvidia H800 chips to train its V3 model. The H800 is a less powerful version of the chipmaker’s H100 that it is allowed to sell to Chinese firms under U.S. chip restrictions. The cluster is also much smaller than the tens of thousands of chips U.S. firms are using to train similarly sized models.

DeepSeek’s release has called into question Big Tech’s tens of billions of dollars in AI spending ahead of a slate of earnings results, as well as the effectiveness of U.S. export restrictions meant to keep advanced chips out of China.
