AI Agent Evaluation - Coval | Quality Assurance for AI

The Convergence of AI Evaluation: From Self-Driving Cars to Voice Agents

Brooke Hopkins, a former tech lead at Waymo, argues that AI voice agents can be evaluated in much the same way as self-driving cars. Her new venture, Coval, is built on that parallel.

Recognizing Shared Challenges in AI Development

Hopkins observed that the difficulties encountered while developing autonomous vehicles at Waymo mirrored those found across the broader AI industry. She noted a tendency to treat these challenges as entirely novel, which led teams to duplicate effort when establishing testing protocols. This realization prompted her to apply the decade of testing experience already accumulated in self-driving technology.

“Many believed new testing practices needed to be created from the ground up,” Hopkins explained to TechCrunch. “However, we had already spent ten years perfecting these methods for self-driving systems.”

Coval: A Simulation Platform for AI Agent Assessment

Launched in 2024, Coval provides a platform for building simulations specifically tailored to assess the performance of AI voice and chat agents. The system replicates the rigorous testing procedures Hopkins employed at Waymo.

Coval can run thousands of simulations concurrently, presenting agents with tasks such as making restaurant reservations or responding to nuanced customer service inquiries.
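To make the idea of running many simulated conversations at once more concrete, here is a minimal Python sketch. It is an illustration only, not Coval's implementation or API: the names Scenario, Transcript, toy_agent, run_all and simulate are all hypothetical, and the agent is a trivial stand-in for a real voice or chat agent.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Scenario:
    """A scripted simulated user (hypothetical structure)."""
    name: str
    user_turns: list[str]


@dataclass
class Transcript:
    scenario: str
    turns: list[tuple[str, str]]  # (user message, agent reply)


async def toy_agent(message: str) -> str:
    """Stand-in for a real agent; a real one would make a network call here."""
    await asyncio.sleep(0)  # yield control so other simulations can proceed
    if "table" in message.lower():
        return "Sure, I can book that table."
    return "Could you tell me more?"


async def run_scenario(scenario: Scenario, agent) -> Transcript:
    """Play one scripted conversation against the agent, turn by turn."""
    turns = []
    for message in scenario.user_turns:
        reply = await agent(message)
        turns.append((message, reply))
    return Transcript(scenario.name, turns)


async def run_all(scenarios, agent, max_concurrency: int = 100):
    """Run every scenario concurrently, capped by a semaphore."""
    limit = asyncio.Semaphore(max_concurrency)

    async def bounded(scenario):
        async with limit:
            return await run_scenario(scenario, agent)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(s) for s in scenarios))


def simulate(scenarios, agent=toy_agent, max_concurrency: int = 100):
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_all(scenarios, agent, max_concurrency))
```

Calling simulate with a list of a thousand Scenario objects would return a thousand transcripts in the same order. The semaphore is one simple way to keep a large batch from overwhelming the agent under test; a production system would likely add timeouts and retries as well.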

Customizable Metrics and Data-Driven Insights

The platform evaluates agents against a standardized suite of metrics. However, it also allows companies to define their own specific criteria and monitor for performance regressions. Furthermore, the generated data and resulting insights can be shared with end-users, either as demonstrations or as ongoing monitoring tools to confirm agent functionality.
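As a rough illustration of how standard metrics, company-defined criteria and regression monitoring might fit together, consider the Python sketch below. Everything in it is an assumption made for the example: the metric functions, the evaluate and find_regressions helpers, and the 0.05 tolerance are hypothetical and do not describe Coval's actual metrics or interface.

```python
from statistics import mean
from typing import Callable

# A metric scores one run (the list of agent replies) between 0.0 and 1.0.
Metric = Callable[[list[str]], float]


def task_completed(replies: list[str]) -> float:
    """Example of a standard metric: did the agent confirm the task?"""
    return 1.0 if any("confirmed" in r.lower() for r in replies) else 0.0


def mentions_policy(replies: list[str]) -> float:
    """Example of a company-defined metric (hypothetical requirement)."""
    return 1.0 if any("cancellation policy" in r.lower() for r in replies) else 0.0


def evaluate(runs: list[list[str]], metrics: dict[str, Metric]) -> dict[str, float]:
    """Average each metric over all simulated runs."""
    return {name: mean(fn(run) for run in runs) for name, fn in metrics.items()}


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Return metrics whose score dropped by more than the tolerance."""
    return {
        name: (baseline[name], current[name])
        for name in baseline
        if name in current and current[name] < baseline[name] - tolerance
    }
```

In this sketch a team would score a known-good agent version once to obtain a baseline, score each new version with the same metrics, and flag any metric that find_regressions returns. Keeping metrics as plain functions is what lets custom criteria sit alongside the standard ones.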

Addressing Enterprise Concerns About AI Agent Reliability

A significant obstacle to enterprise adoption of AI agents is a lack of confidence in their reliability. Hopkins emphasized that businesses need assurance beyond superficial demonstrations.

“Executives face a complex decision-making process when selecting vendors,” she stated. “It’s difficult to ascertain what questions to ask and how to validate that agents are performing as expected. Coval provides the means to demonstrate this performance definitively.”

Rapid Growth and Seed Funding

The concept behind Coval was refined during the Y Combinator Summer 2024 program, with a public launch following in October 2024. Demand for the platform has surged in recent months, with clients eager to evaluate their AI agents.

The San Francisco-based startup recently secured a $3.3 million seed round, led by MaC Venture Capital, with participation from Y Combinator and General Catalyst. These funds will be allocated to expanding the engineering team and achieving product-market fit. Future development plans include extending Coval’s capabilities to evaluate other types of AI agents, such as those operating on the web.

Navigating a Crowded AI Agent Landscape

Coval’s emergence coincides with a period of heightened interest – and considerable hype – surrounding AI agents. Industry leaders, like Marc Benioff of Salesforce, are predicting widespread deployment of these technologies.

The market is also witnessing a proliferation of startups in this space. Over 100 AI agent startups participated in Y Combinator’s 2024 cohorts alone, with some securing substantial venture funding. For example, /dev/agents raised $55 million in a seed round at a $500 million valuation in November 2024.

A Competitive Advantage Built on Experience

This competitive landscape suggests a growing need for robust agent evaluation tools. Hopkins believes Coval is well-positioned to succeed due to its foundational expertise.

“Our advantage lies in my five years of experience building these systems repeatedly,” she explained. “We’ve learned from multiple iterations, understanding both their successes and failures. These learnings are directly integrated into Coval’s design.”
