OpenAI Agent Helped Me Move - With a Little Help From Me

OpenAI's Operator: A Week-Long Test of an AI Agent
OpenAI provided a one-week access period to evaluate its novel AI agent, Operator. This system is designed to perform tasks autonomously across the internet on behalf of the user.
Operator represents the most advanced iteration of AI agents observed to date. These systems aim to automate routine activities, thereby liberating individuals to focus on more fulfilling endeavors. However, based on initial testing, fully “autonomous” AI remains a developing technology.
The Technology Behind Operator
A newly trained model underpins Operator’s functionality. This model integrates the visual processing abilities of GPT-4o with the logical reasoning skills of o1.
The resulting system demonstrates proficiency in fundamental tasks. Observations included Operator successfully clicking interface elements, navigating website menus, and completing online forms.
Limitations Observed During Testing
Despite its capabilities, the trial period revealed a need for significant user intervention. The experience often felt akin to guiding Operator through each challenge, rather than completely offloading tasks.
Frequently, the agent required answers to clarifying questions, permission grants, personal data input, and assistance when encountering obstacles.
A useful analogy is to compare Operator to a vehicle equipped with cruise control. It offers intermittent assistance, reducing driver workload, but falls short of providing full self-driving capabilities.
Intentional Pauses and Safety Considerations
OpenAI explicitly states that Operator’s frequent pauses are an intentional design feature.
Like the models that power chatbots such as ChatGPT, Operator’s underlying model is susceptible to inaccuracies and cannot reliably operate on its own for extended periods. This tendency toward “hallucinations” is why OpenAI limits the system’s autonomous decision-making and its access to sensitive user data.
While this cautious approach may prioritize safety, it concurrently diminishes Operator’s overall utility.
Conclusion: A Promising First Step
Despite its current limitations, OpenAI’s initial AI agent serves as a compelling demonstration of concept. It showcases an interface capable of interacting with the front-end of any website.
The development of truly independent AI systems necessitates the creation of more dependable AI models. These models must require less human oversight to achieve their intended functions.
Further advancements are needed to realize the full potential of autonomous AI agents.
An Overly Intrusive Assistance Experience
The timing of my trial with Operator coincided with a relocation to a new apartment, presenting an opportunity to utilize OpenAI’s agent for logistical support during the move.
I asked Operator to help me get a new parking permit. The agent agreed and opened a browser window on my screen.
Operator then searched for a San Francisco parking permit and took me to the official city website, and even to the specific relevant page.
Unlike Google’s Project Mariner, Operator runs remotely in the cloud rather than on the local machine, so I could keep using my computer while it worked.
However, obtaining the parking permit required granting Operator permissions to initiate several processes. Frequent interruptions occurred, prompting requests for personal details like my name, phone number, and email address. On occasion, the agent became disoriented, necessitating manual browser control to redirect its actions.
In a separate evaluation, I tasked Operator with securing a reservation at a Greek restaurant. The agent successfully identified a suitable establishment nearby with acceptable pricing. Nevertheless, I was required to provide responses to over six inquiries throughout the reservation process.
If an AI agent needs more than six interventions just to complete a restaurant booking, a critical question arises: at what point is it more efficient to do the task yourself? This was a recurring thought during my evaluation of Operator.
The Emerging Agent-as-a-Platform Model
During testing, certain websites blocked the Operator agent. For instance, when I tried to find an electrician through TaskRabbit, the agent reported an error and asked to use a different service.
Similarly, platforms like Expedia, Reddit, and YouTube also implemented blocks, preventing the AI agent from accessing their respective sites.
Conversely, several services have proactively welcomed Operator. Instacart, Uber, and eBay partnered with OpenAI prior to Operator’s release, granting the agent permission to interact with their websites on users’ behalf.
These companies are anticipating a shift towards AI agents handling a portion of user interactions.
Daniel Danker, Instacart’s chief product officer, explained to TechCrunch that Instacart already supports customer access through multiple channels. He views Operator as a potential addition to these existing entry points.
Allowing OpenAI’s agent to utilize Instacart’s website could potentially distance the company from its direct customer base. However, Danker emphasized Instacart’s commitment to reaching customers on their preferred platforms.
Nitzan Mekel-Bobrov, eBay’s chief AI officer, told TechCrunch that eBay shares OpenAI’s strong conviction that agentic systems will significantly alter how consumers engage with digital platforms.
Despite the potential growth of AI agents, Mekel-Bobrov anticipates continued direct user visits to eBay’s website, asserting that online destinations will remain relevant.
Here's a breakdown of the collaborative approach:
- Instacart: Views Operator as another access point for customers.
- Uber: Actively partnered to enable agent interaction.
- eBay: Believes agents will complement, not replace, direct website visits.
The differing responses highlight a key debate: whether AI agents represent a disruption or an evolution of existing digital interactions.
Key takeaway: The future may involve a hybrid model where AI agents and direct user engagement coexist.
Concerns Regarding Reliability
Initial experiences with Operator led to some concerns about its trustworthiness, stemming from instances of inaccurate information that could have resulted in significant financial loss.
Specifically, a request to locate nearby parking facilities highlighted this issue. The agent provided recommendations for two garages, claiming a short walking distance to my new residence.
However, these garages were considerably further away than indicated, and also exceeded my budgetary constraints. One location required a 20-minute walk, while the other necessitated a 30-minute walk; the error originated from an incorrect address input by Operator.
This situation underscores the reason OpenAI restricts agent access to sensitive information like credit card details, passwords, and email accounts. Without human oversight, Operator could have incurred substantial expenses for an unnecessary parking space.
Such hallucinations represent a major obstacle to the development of truly helpful, autonomous agents – those capable of handling routine tasks independently. User confidence will remain low if agents consistently make fundamental errors, particularly those with tangible repercussions.
With Operator, OpenAI has built robust tools that let AI systems navigate the internet. Nevertheless, the effectiveness of these tools depends on the underlying AI’s ability to fulfill user requests accurately and consistently.
For now, humans still have to assist agents rather than being assisted by them, which undercuts the intended benefits of autonomous agents.