OpenAI Voice Cloning Tool: Still No Release After a Year

OpenAI's Voice Engine: A Year in Preview
In late March 2024, OpenAI unveiled a limited preview of Voice Engine, an AI service for voice cloning. The company claimed the system could replicate a person’s voice from just a 15-second audio sample.
About a year later, the tool remains available only through that preview. OpenAI has given no timeline for a public launch and has not confirmed whether a wider release will happen at all.
Concerns Surrounding Public Release
OpenAI’s hesitation to broadly deploy the service may stem from concerns regarding potential misuse. Alternatively, this cautious approach could be a deliberate strategy to preempt potential regulatory investigations.
OpenAI has long faced criticism for prioritizing new features over comprehensive safety work. It has also been accused of rushing product releases to gain a competitive advantage.
Current Testing and Applications
According to a statement provided to TechCrunch, OpenAI is continuing to test Voice Engine with a select group of “trusted partners.”
The spokesperson explained that the company is actively gathering insights from its partners’ usage patterns. This feedback is being utilized to refine the model’s functionality and enhance its safety protocols.
“We’ve been excited to see the different ways it’s being used,” the spokesperson stated. “Applications range from speech therapy and language acquisition to customer service solutions, video game character voices, and the creation of AI avatars.”
Diverse Use Cases Explored
- Speech Therapy: Assisting individuals with speech impairments.
- Language Learning: Providing realistic pronunciation practice.
- Customer Support: Creating more engaging and personalized interactions.
- Video Games: Developing immersive character voices.
- AI Avatars: Enabling realistic and expressive digital representations.
The ongoing evaluation process aims to ensure responsible development and deployment of this powerful AI technology.
Delayed Launch of Voice Engine
Voice Engine, which underpins the voices in OpenAI’s text-to-speech API and ChatGPT’s Voice Mode, produces remarkably natural speech. It converts written text into audible speech, subject to certain content restrictions. Its rollout, however, has been marked by postponements and shifting timelines.
According to a June 2024 blog post from OpenAI, the Voice Engine model functions by predicting the most likely sounds a speaker would produce when reading a given text. It considers various vocal characteristics, including different voices, accents, and speaking patterns. Consequently, the model is capable of generating not only spoken text but also vocalizations that mimic how diverse speakers would articulate the same content.
Initial Plans and Trademarking
OpenAI originally scheduled the release of Voice Engine, previously known as Custom Voices, for March 7, 2024, as indicated in a draft blog post reviewed by TechCrunch. The initial strategy involved granting access to a select group of approximately 100 “trusted developers” prior to a broader launch. Priority was to be given to developers creating applications with a demonstrable “social benefit” or exhibiting “innovative and responsible” technology utilization.
Furthermore, OpenAI had already trademarked the service and established pricing: $15 per million characters for “standard” voices and $30 per million characters for “HD quality” voices.
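To put those reported rates in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_cost` and the tier labels are hypothetical, and the rates are the figures cited above, which may not reflect any pricing OpenAI ultimately adopts.

```python
from decimal import Decimal

# Reported rates in USD per one million characters (figures cited in the article).
# The tier keys "standard" and "hd" are labels chosen for this sketch.
RATES_PER_MILLION = {
    "standard": Decimal("15"),
    "hd": Decimal("30"),
}


def estimate_cost(num_chars: int, tier: str = "standard") -> Decimal:
    """Estimate the USD cost of synthesizing num_chars characters of text."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = RATES_PER_MILLION[tier]
    # Scale the per-million rate down to the requested length, rounded to cents.
    cost = rate * num_chars / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # A 50,000-character script at each reported tier.
    print(estimate_cost(50_000, "standard"))
    print(estimate_cost(50_000, "hd"))
```

At these rates, the “HD quality” tier would cost exactly twice as much as the “standard” tier for the same text.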
Postponement and Limited Access
However, the announcement was delayed at the last moment. OpenAI eventually introduced Voice Engine several weeks later, but without a registration option. Access remained restricted to a small group of roughly 10 developers who had been collaborating with the company since late 2023, as OpenAI communicated.
OpenAI expressed its intention to initiate discussions regarding the ethical implementation of synthetic voices and societal adaptation to these advancements. The company stated, in its late March 2024 announcement, that decisions regarding wider deployment would be based on these conversations and the outcomes of limited testing.
Future Considerations
OpenAI said these discussions and tests would allow it to make “a more informed decision about whether and how to deploy this technology at scale.” This cautious approach reflects the company’s stated commitment to responsible AI development and deployment.
A Prolonged Development Process
According to OpenAI, the development of Voice Engine commenced in 2022. The company asserts that a demonstration of the tool was presented to “global policymakers at the highest levels” during the summer of 2023, highlighting both its capabilities and potential risks.
Voice Engine is currently available to a number of partners, including the startup Livox, which builds devices that help people with disabilities communicate more naturally. Carlos Pereira, Livox’s CEO, told TechCrunch that the tool’s requirement for an internet connection kept it out of Livox’s products, since many of the company’s clients lack consistent internet access. Even so, he called the technology “really impressive.”
In an email, Pereira emphasized the distinctive voice quality and the ability to render voices in multiple languages, which matters for Livox’s customer base. He described it as the most user-friendly and effective voice creation tool he has encountered. Livox hopes OpenAI will develop an offline version in the near future.
Pereira reports that he has not received any communication from OpenAI regarding a potential public launch of Voice Engine, nor has he observed any indications of impending monetization for the service. To date, Livox has not incurred any costs for its utilization.
Safety Considerations and Mitigations
In a June 2024 post, OpenAI indicated that concerns about potential misuse during the 2024 U.S. election cycle contributed to the delay in releasing Voice Engine. Following discussions with stakeholders, the company has incorporated several safety measures, including watermarking to establish the origin of generated audio.
OpenAI stipulates that developers must secure “explicit consent” from the original speaker prior to employing Voice Engine. Furthermore, they are required to provide “clear disclosures” to their audience indicating that the voices are AI-generated. However, the company has not detailed how these policies will be enforced.
OpenAI has also suggested plans to develop a “voice authentication experience” for speaker verification and a “no-go” list to prevent the replication of voices resembling well-known individuals. These are ambitious technical undertakings, and failure to execute them effectively could damage the reputation of a company already facing scrutiny regarding safety protocols.
The Rise of Voice Cloning Scams
Robust filtering and identity verification are increasingly essential for any responsible release of voice cloning technology. One source identifies AI voice cloning as the third fastest-growing scam of 2024. Cloned voices have been used to commit fraud and to bypass bank security checks while legal frameworks struggle to keep up. Malicious actors have also used voice cloning to create damaging deepfakes of public figures, which have spread rapidly across social media.
Whether OpenAI will release Voice Engine remains uncertain: it could happen as soon as next week, or never. The company has repeatedly said it is considering keeping the service limited in scope. Driven by safety concerns, public perception, or both, Voice Engine’s preview period has become notably long.