
WellSaid Raises $10M to Revolutionize Synthetic Speech

July 7, 2021

WellSaid Labs Secures $10M Series A Funding

WellSaid Labs, a company that creates highly realistic synthetic speech, has raised $10 million in a Series A funding round. The company will use the investment to expand its operations and further develop its technology.

Advancements in Text-to-Speech Technology

The company’s proprietary text-to-speech engine stands out for its speed and the natural sound of its audio. It runs faster than real time and can produce anything from short clips to recordings several hours long.

Origins and Evolution

WellSaid Labs emerged from the Allen Institute for AI incubator in 2019 with a focus on fixing the robotic quality often associated with synthetic voices. The goal was to offer more natural-sounding voices for common business uses, such as training materials and marketing content.

From Tacotron to a Proprietary Engine

The first version was built on Tacotron, a speech engine developed by Google and academic researchers. WellSaid Labs later built its own engine, which is more efficient and produces more convincing speech. It also avoids a common weakness of many speech engines, whose output starts to degrade after a few sentences.

Demonstrated Capabilities

WellSaid’s engine narrated the complete text of Mary Shelley’s “Frankenstein” without any noticeable loss of quality or change in tone, a notable result for long-form synthetic speech.

Superior Performance and Speed

Independent listeners rated the generated voices as human or close to human in quality. The speech is also generated much faster than real time, whereas other high-quality options can take far longer: WellSaid can produce three minutes of speech in as little as one minute, while Tacotron might need half an hour for the same clip.
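Synthesis speed is commonly expressed as a real-time factor (RTF): processing time divided by the duration of the audio produced, so a value below 1 means faster than real time. The minimal Python sketch below applies that ratio to the approximate figures quoted above. The `real_time_factor` helper is hypothetical and purely illustrative; it is not part of any WellSaid or Tacotron tooling.

```python
# Real-time factor (RTF): processing time per unit of audio produced.
# RTF < 1 means faster than real time; RTF > 1 means slower than real time.
def real_time_factor(processing_minutes: float, audio_minutes: float) -> float:
    """Return processing time divided by the duration of audio generated."""
    if audio_minutes <= 0:
        raise ValueError("audio duration must be positive")
    return processing_minutes / audio_minutes


# Approximate figures from the article: three minutes of generated speech.
wellsaid_rtf = real_time_factor(processing_minutes=1, audio_minutes=3)
tacotron_rtf = real_time_factor(processing_minutes=30, audio_minutes=3)

print(f"WellSaid: RTF = {wellsaid_rtf:.2f}")  # RTF = 0.33
print(f"Tacotron: RTF = {tacotron_rtf:.2f}")  # RTF = 10.00
print(f"Speedup: {tacotron_rtf / wellsaid_rtf:.0f}x")  # 30x
```

On these numbers, WellSaid runs about three times faster than real time, while the Tacotron-based approach runs about ten times slower, roughly a thirtyfold gap.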

Sample clip: https://techcrunch.com/wp-content/uploads/2020/09/wellsaid-clip.mp3

Voice Avatar Creation

The system can create unique “Voice Avatars” modeled on existing voice talent, such as company spokespeople or voiceover artists. According to CEO Matt Hocking, the amount of audio needed to build a voice model has dropped from 20 hours to just two.

Business Focus and Future Considerations

For now, WellSaid Labs focuses exclusively on business applications. It has no plans to offer a consumer app for digitizing individual voices, citing the risks involved and the lack of a viable business model.

Potential for Accessibility

Hocking acknowledges that the technology could benefit people with disabilities, though he says this application is not currently a priority for the company.


He added: “We are dedicated to broadening access to this technology, ensuring that individuals with communication challenges, non-profit organizations, and others can benefit from its capabilities.”

Expanding Market Applications

The company has expanded beyond its initial focus on corporate training videos and now serves markets such as marketing, long-form content, text-heavy interactive products, and in-app experiences.

Ethical Considerations

Ideally, the voice actors whose voices serve as the basis for these avatars are being fairly compensated for their contributions.

Investment Details and Future Plans

FUSE led the $10 million round, with participation from Voyager, Qualcomm Ventures LLC, and GoodFriends. The funds will go toward improving the product and growing the team. The synthetic voice market still has substantial room to grow, since content creation remains a relatively untapped application.
