OpenAI Agent Tool Release Date: What We Know

January 20, 2025

OpenAI's Potential Release of 'Operator' AI Tool

OpenAI appears to be nearing the launch of an AI tool that can autonomously control a personal computer and carry out tasks on a user’s behalf. The prospect has generated significant interest within the technology community.

Evidence of the 'Operator' Tool

Tibor Blaho, a software engineer known for accurately predicting upcoming AI releases, has reportedly discovered evidence supporting the existence of OpenAI’s previously discussed Operator tool. Prior reports from sources like Bloomberg have described Operator as an “agentic” system, capable of independently managing complex operations such as code development and travel arrangements.

The Information has reported that OpenAI is targeting a January release. Code discovered by Blaho over the weekend lends further credibility to that timeline.

Features Discovered in the macOS Client

According to Blaho, OpenAI’s ChatGPT application for macOS now includes hidden options to define shortcuts for “Toggle Operator” and “Force Quit Operator.” Blaho also says that OpenAI has added references to Operator on its official website, although these references are not yet publicly visible.

Performance Benchmarks and Limitations

According to Blaho’s findings, OpenAI’s website also contains unpublished tables comparing Operator’s performance against other AI systems designed for computer interaction. While these tables may be preliminary, the data suggests that Operator’s reliability varies depending on the specific task.

On the OSWorld benchmark, which simulates a realistic computing environment, “OpenAI Computer Use Agent (CUA)” – potentially the model powering Operator – achieved a score of 38.1%. This surpasses Anthropic’s computer-controlling model but remains significantly lower than the 72.4% attained by human users.

By contrast, OpenAI CUA exceeds human-level performance on WebVoyager, a benchmark assessing an AI’s ability to navigate and interact with websites. However, it falls short of human-level results on WebArena, another web-based evaluation, according to the leaked data.

Challenges with Practical Tasks

If the leaked information is accurate, Operator encounters difficulties with tasks easily completed by humans. For instance, when tasked with creating an account with a cloud provider and launching a virtual machine, Operator succeeded only 60% of the time. Similarly, its success rate in generating a Bitcoin wallet was a mere 10%.

We have contacted OpenAI for a statement and will provide updates as soon as a response is received.

The Expanding AI Agent Landscape

OpenAI’s anticipated entry into the AI agent market coincides with similar initiatives from competitors, including Anthropic and Google. Although the field is still nascent and potentially risky, AI agents are increasingly being positioned as the next major advancement in artificial intelligence. Analytics firm Markets and Markets projects that the AI agent market could reach $47.1 billion by 2030.

Safety Considerations and Development

Current AI agents are relatively basic in their capabilities. However, some experts have voiced concerns regarding their potential safety implications as the technology progresses.

Leaked charts indicate that Operator performs favorably on selected safety assessments, including tests that probe whether the system can be induced to perform “illicit activities” or search for “sensitive personal data.” Extensive safety testing is reportedly a key reason for Operator’s prolonged development.

OpenAI co-founder Wojciech Zaremba recently criticized Anthropic on X for releasing an agent lacking adequate safety measures, stating, “I can only imagine the negative reactions if OpenAI made a similar release.”

OpenAI has itself faced criticism from AI researchers, including former employees, for allegedly prioritizing rapid product deployment over comprehensive safety work.
