Podcastle Launches AI Text-to-Speech with 450+ Voices

Podcastle Enters the AI Text-to-Speech Arena with Asyncflow v1.0
Podcastle, the podcast recording and editing platform, has announced its entry into the fast-moving AI text-to-speech market. It is releasing a proprietary AI model, Asyncflow v1.0, to compete with existing solutions.
Alongside the model, Podcastle will offer an API so developers can integrate the text-to-speech functionality directly into their own applications.
Extensive Voice Options and Cost Efficiency
The new model lets Podcastle offer a library of more than 450 distinct AI voices for narrating text. The company says it focused on keeping both training and inference costs low during development.
That approach is intended to give Podcastle a competitive edge over other players in the field.
Joining a Growing Market
Podcastle’s move puts it alongside startups such as ElevenLabs, Speechify, and WellSaid, all of which turn text into realistic, AI-narrated voice clips.
The applications for this technology are diverse, spanning areas such as marketing, advertising, content creation, education, and corporate training.
Overcoming Development Challenges
Arto Yeritsyan, Podcastle’s founder, said that building a text-to-speech model had long been a goal for the company.
At first, the high cost of training and the large amount of data required were significant hurdles. However, recent advances in large language models made the project feasible last year.
“We always aimed to create a robust text-to-speech model,” Yeritsyan explained. “But development costs were prohibitive. Recent progress in large language models allowed us to build a high-quality voice model without needing massive datasets.”
Fueling Innovation with Funding
The company’s efforts were also supported by a $13.5 million Series A funding round secured last year.
Competitive Pricing
Yeritsyan said Podcastle’s pricing is more competitive than its rivals’. The company currently charges approximately $40 for 500 minutes of text-to-speech conversion, while ElevenLabs charges $99 for the same amount.
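As a rough worked comparison, the sketch below converts the two quoted prices into per-minute costs. It assumes both plans bill linearly per minute, which is an assumption: neither company's actual billing tiers or overage rules are described here.

```python
# Quoted prices: USD for 500 minutes of text-to-speech conversion.
# Assumption: a flat, linear per-minute rate; real billing tiers may differ.
MINUTES = 500
PRICES_USD = {"Podcastle": 40.0, "ElevenLabs": 99.0}


def per_minute_cost(price_usd: float, minutes: int) -> float:
    """Return the average cost per minute for a plan."""
    return price_usd / minutes


# Average cost per minute for each provider.
costs = {name: per_minute_cost(price, MINUTES) for name, price in PRICES_USD.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:.3f} per minute")

# How many times more ElevenLabs costs for the same 500 minutes.
ratio = PRICES_USD["ElevenLabs"] / PRICES_USD["Podcastle"]
print(f"ElevenLabs costs about {ratio:.1f}x as much for 500 minutes")
```

At these list prices, Podcastle works out to about 8 cents per minute versus roughly 20 cents for ElevenLabs, or about 2.5 times as much.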
Enhanced Voice Cloning Capabilities
Podcastle is also upgrading its voice cloning feature to streamline the training process.
Previously, training required reading around 70 different sentences. Now, only a few seconds of recorded speech are needed to create a voice clone.
The updated process uses Podcastle’s “Magic Dust” AI, released last year, to improve the quality of the recorded audio.
Initial Voice Quality and Future Improvements
In early testing, the voice clone created with the new process sounded somewhat robotic, although it accurately captured the speaker’s tone.
The company acknowledges this and says it will keep refining the feature. Users can also train the model on multiple voice samples to get different results.
A Unified Platform Advantage
Podcastle believes that offering integrated tools for audio, video, podcasts, and AI-powered narration within a single, redesigned platform will differentiate it from competitors.
While audio content currently represents the majority of user activity, Yeritsyan reports that video usage is rapidly increasing.
