Alibaba Qwen AI Models: Control PCs & Phones

January 27, 2025

Alibaba Unveils Qwen2.5-VL AI Models

While DeepSeek, a Chinese AI lab, has drawn most of the technology sector's recent attention, Alibaba, a leading domestic competitor, has been advancing its own artificial intelligence work.

New Capabilities of Qwen2.5-VL

Alibaba’s Qwen team introduced a new series of AI models, designated Qwen2.5-VL, on Monday. These models demonstrate proficiency in a variety of text and image analysis applications. They are capable of processing files, interpreting video content, and identifying objects within images.

Furthermore, Qwen2.5-VL exhibits the ability to control a personal computer, mirroring the functionality of the model that powers OpenAI’s newly released Operator.

Performance Benchmarks

According to benchmarks conducted by the Qwen team, the most powerful Qwen2.5-VL model surpasses OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash in several key areas. These include video comprehension, mathematical reasoning, document analysis, and question-answering accuracy.

Features and Applications

Qwen2.5-VL is currently available for testing in Alibaba’s Qwen Chat application and can be downloaded from the AI development platform Hugging Face. According to the Qwen team, the models can analyze charts and graphical data, extract information from scanned invoices and forms, and process video recordings that run for several hours.

The team also notes that Qwen2.5-VL can identify intellectual property from films and television series, as well as a diverse range of products, which may indicate that its training data included copyrighted material.

Content Restrictions

As an AI developed by a Chinese company, Qwen2.5-VL is subject to certain content limitations, particularly within the Qwen Chat interface. An attempt to solicit commentary on “Xi Jinping’s mistakes” from the largest model, Qwen2.5-VL-72B, resulted in an error message.

China’s internet regulator evaluates domestically developed models against benchmarks to ensure their responses align with “core socialist values.” Consequently, many Chinese AI systems avoid responding to sensitive topics, such as the autonomy of Taiwan.

Software Interaction

A notable feature of Qwen2.5-VL is its capacity to interact with software applications on both PCs and mobile devices. A demonstration posted on X by Philipp Schmid, a technical lead at Hugging Face, showcased Qwen2.5-VL launching the Booking.com app on Android and completing a flight booking from Chongqing to Beijing.

Another video depicts a Qwen2.5-VL model controlling applications on a Linux desktop, although its actions were limited to switching between tabs. Interestingly, Qwen’s benchmarks reveal a lower score for Qwen2.5-VL on OSWorld, a benchmark designed to simulate a realistic computer environment.

Licensing Information

The two smaller models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are released under a permissive license. However, the flagship model, Qwen2.5-VL-72B, is governed by Alibaba’s custom license.

This license stipulates that organizations and developers with more than 100 million monthly active users must obtain permission from Qwen/Alibaba before deploying the model commercially.

  • Qwen2.5-VL offers advanced text and image analysis.
  • The models can control PCs and mobile devices.
  • The Qwen team's own benchmarks show the largest model outperforming GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash in several areas.
  • Content restrictions are in place due to Chinese regulations.