Podcastle Launches AI Text-to-Speech with 450+ Voices

March 3, 2025
Podcastle Enters the AI Text-to-Speech Arena with Asyncflow v1.0

Podcastle, the podcast recording and editing platform, has announced its entry into the rapidly evolving AI-powered text-to-speech market. The company is releasing a proprietary AI model, Asyncflow v1.0, to compete with existing solutions.

Alongside the model launch, Podcastle will provide an API for developers. This will enable direct integration of the text-to-speech functionality into various applications.

Extensive Voice Options and Cost Efficiency

The new model empowers Podcastle to offer a library of over 450 distinct AI voices for narrating text. The company highlights that its development focused on minimizing both training and inference costs.

This strategic approach is intended to provide a competitive advantage over other players in the field.

Joining a Growing Market

Podcastle’s move positions it alongside other innovative startups like ElevenLabs, Speechify, and WellSaid. These companies are all dedicated to transforming text into realistic, AI-narrated voice clips.

The applications for this technology are diverse, spanning areas such as marketing, advertising, content creation, education, and corporate training.

Overcoming Development Challenges

Arto Yeritsyan, Podcastle’s founder, revealed that building a text-to-speech model was a long-held ambition for the company.

Initially, the substantial costs associated with training and the extensive data requirements presented significant hurdles. However, recent advancements in large language models facilitated a breakthrough last year.

“We always aimed to create a robust text-to-speech model,” Yeritsyan explained. “But development costs were prohibitive. Recent progress in large language models allowed us to build a high-quality voice model without needing massive datasets.”

Fueling Innovation with Funding

The company’s efforts were also supported by a $13.5 million Series A funding round secured last year.

Competitive Pricing

Yeritsyan said Podcastle's pricing undercuts that of its rivals. The company currently charges approximately $40 for 500 minutes of text-to-speech conversion, while ElevenLabs charges $99 for the same amount.
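
To put the quoted figures on a per-minute basis, here is a minimal sketch of the arithmetic. It uses only the two approximate prices cited above; the function and variable names are illustrative, and actual plan pricing may differ.

```python
def cost_per_minute(price_usd: float, minutes: int) -> float:
    """Return the effective price in USD per minute of generated speech."""
    return price_usd / minutes


# Approximate figures quoted in the article, both for 500 minutes.
podcastle = cost_per_minute(40, 500)
elevenlabs = cost_per_minute(99, 500)

# Fraction saved per minute relative to the ElevenLabs figure.
savings = 1 - podcastle / elevenlabs

print(f"Podcastle:  ${podcastle:.3f}/min")
print(f"ElevenLabs: ${elevenlabs:.3f}/min")
print(f"Difference: about {savings:.0%} cheaper per minute")
```

On these numbers, Podcastle works out to roughly 8 cents per minute against roughly 20 cents for ElevenLabs, a difference of about 60 percent.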

Enhanced Voice Cloning Capabilities

Podcastle is also upgrading its voice cloning feature to streamline the training process.

Previously, training required reading around 70 different sentences. Now, only a few seconds of recorded speech are needed to create a voice clone.

The updated process leverages Podcastle’s “Magic Dust” AI, released last year, to enhance audio recording quality.

Initial Voice Quality and Future Improvements

Initial testing revealed that the voice generated using the new process sounded somewhat robotic, although it accurately replicated the speaker’s tone.

The company acknowledges this and states that they will continue to refine the feature over time. Users can also train multiple voice samples to achieve varied results.

A Unified Platform Advantage

Podcastle believes that offering integrated tools for audio, video, podcasts, and AI-powered narration within a single, redesigned platform will differentiate it from competitors.

While audio content currently represents the majority of user activity, Yeritsyan reports that video usage is rapidly increasing.
