Undergrad Students Create AI Speech Model to Compete with NotebookLM

New Open-Source AI Model Generates Podcast-Style Audio

Two undergraduate students with little prior experience in artificial intelligence say they have built an openly available AI model that generates podcast-style audio clips, similar to Google’s NotebookLM.

Growing Market for Synthetic Speech

The demand for tools that create synthetic speech is substantial and continues to expand rapidly. While ElevenLabs currently holds a prominent position in this market, numerous competitors are emerging, including PlayAI and Sesame. Investment in this sector reflects its perceived potential; PitchBook data indicates that startups focused on voice AI technology secured over $398 million in venture capital funding last year.

The Creation of Nari Labs’ Dia Model

Toby Kim, a co-founder of Nari Labs – the Korea-based team responsible for the new model – explained that he and his partner began exploring speech AI approximately three months ago. Driven by inspiration from NotebookLM, their objective was to build a model that provided greater user control over voice characteristics and offered increased flexibility in scripting.

Utilizing Google’s TPU Research Cloud

Kim said they used Google’s TPU Research Cloud program, which gives researchers free access to Google’s TPU AI chips, to train Nari’s model, named Dia. Dia has 1.6 billion parameters and generates dialogue from a provided script. Users can adjust each speaker’s tone and insert nonverbal cues such as pauses, coughs, and laughter.

Understanding Model Parameters

Parameters are the internal variables a model uses to make predictions. Models with more parameters generally perform better than those with fewer.

Accessibility and Capabilities of Dia

Dia is readily available through the AI development platforms Hugging Face and GitHub. It can operate on most contemporary personal computers equipped with at least 10GB of VRAM. The model produces a randomized voice unless a specific style is requested, but it also possesses the capability to replicate an individual’s voice.
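As a rough illustration of how the 1.6 billion parameter count relates to the 10GB VRAM figure, the sketch below estimates the memory needed for the model weights alone. The numeric precisions shown are assumptions for illustration, not details Nari has published, and the estimate leaves out activations, the audio decoder, and framework overhead, all of which add to real-world usage.

```python
# Back-of-the-envelope estimate of weights-only memory.
# Assumption: bytes per parameter depends on the numeric precision used;
# the article does not say which precision Dia runs in.

def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the memory taken by the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

DIA_PARAMS = 1_600_000_000  # 1.6 billion parameters, as reported in the article

# Hypothetical precisions, chosen only to show the range.
for name, size in (("float32", 4), ("float16", 2)):
    print(f"{name}: {weight_memory_gb(DIA_PARAMS, size):.1f} GB")
```

Under these assumptions the weights take a few gigabytes, which is consistent with a stated requirement of roughly 10GB once the other runtime costs are included.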

TechCrunch’s Evaluation of Dia

In TechCrunch’s brief testing through Nari’s web demo, Dia worked well, reliably generating two-way conversations on a range of topics. The voice quality appears competitive with existing tools, and the voice cloning feature was particularly easy to use.

Potential for Misuse and Lack of Safeguards

However, like many voice generation tools, Dia offers few safeguards against misuse. Creating disinformation or a fraudulent recording with it would be relatively straightforward. On its project pages, Nari discourages using the model for impersonation, deception, or other unethical purposes, but the group disclaims responsibility for any such misuse.

Data Sourcing and Copyright Concerns

Nari has not yet disclosed what data it used to train Dia. Copyrighted material may have been part of it: a commenter on Hacker News noted that one sample sounds like the hosts of NPR’s “Planet Money” podcast. Training models on copyrighted content is common, but its legality remains contested. Some AI companies invoke fair use as a defense, while copyright holders dispute that fair use applies to training.

Future Plans for Nari Labs

Kim stated that Nari’s future plans involve the creation of a synthetic voice platform incorporating a “social aspect,” built upon Dia and larger, more advanced models. The team also intends to publish a technical report detailing Dia’s development and to broaden the model’s language support beyond English.
