Undergrad Students Create AI Speech Model to Compete with NotebookLM

New Open-Source AI Model Generates Podcast-Style Audio
Two undergraduate students, lacking significant prior experience in artificial intelligence, have announced the development of an openly accessible AI model. This model is capable of producing podcast-like audio segments, functioning similarly to Google’s NotebookLM.
Growing Market for Synthetic Speech
The demand for tools that create synthetic speech is substantial and continues to expand rapidly. While ElevenLabs currently holds a prominent position in this market, numerous competitors are emerging, including PlayAI and Sesame. Investment in this sector reflects its perceived potential; PitchBook data indicates that startups focused on voice AI technology secured over $398 million in venture capital funding last year.
The Creation of Nari Labs’ Dia Model
Toby Kim, a co-founder of Nari Labs – the Korea-based team responsible for the new model – explained that he and his partner began exploring speech AI approximately three months ago. Inspired by NotebookLM, they set out to build a model that gives users more control over the generated voices and more freedom in the script.
Utilizing Google’s TPU Research Cloud
Kim said they trained Nari’s model, named Dia, through Google’s TPU Research Cloud program, which gives researchers free access to Google’s TPU AI chips. Dia has 1.6 billion parameters and generates dialogue from a provided script, letting users tailor speakers’ vocal tones and insert realistic speech patterns such as pauses, coughs, and laughter.
Understanding Model Parameters
Parameters are the internal variables a model uses to make predictions. Generally, models with more parameters perform better, though parameter count is not the only factor that determines quality.
Accessibility and Capabilities of Dia
Dia is readily available through the AI development platforms Hugging Face and GitHub. It can operate on most contemporary personal computers equipped with at least 10GB of VRAM. The model produces a randomized voice unless a specific style is requested, but it also possesses the capability to replicate an individual’s voice.
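As a rough illustration of how a parameter count relates to a memory requirement, the sketch below estimates the space needed just to store 1.6 billion weights at a few common numeric precisions. The bytes-per-parameter figures are standard sizes for those formats. The precision Dia actually runs at is not stated here, so the dtype choices are assumptions, and the function name `weight_memory_gb` is made up for this example.

```python
# Back-of-the-envelope estimate of weight storage for a model.
# Assumption: the dtypes below are illustrative; the article does not
# say which precision Dia uses.

# Bytes per parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Parameter count reported for Dia.
DIA_PARAMS = 1_600_000_000


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(DIA_PARAMS, dtype):.1f} GB")
```

These figures cover the weights only. Intermediate activations, audio decoding, and framework overhead add to the total, which is one plausible reason a stated requirement such as 10GB of VRAM sits above the raw weight size.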
TechCrunch’s Evaluation of Dia
Initial testing by TechCrunch, conducted via Nari’s web demonstration, revealed that Dia functioned effectively, consistently generating conversational exchanges on a variety of topics. The quality of the generated voices appears to be competitive with existing tools, and the voice cloning feature was found to be particularly user-friendly.
Potential for Misuse and Lack of Safeguards
However, like many voice generation tools, Dia offers limited safeguards against misuse. The creation of disinformation or fraudulent recordings would be relatively straightforward. While Nari discourages the use of the model for impersonation, deception, or other unethical purposes on its project pages, the group disclaims responsibility for any such misuse.
Data Sourcing and Copyright Concerns
Nari has not yet revealed the data sources used to train Dia. There is a possibility that copyrighted material was utilized during development – a commenter on Hacker News pointed out a similarity between one sample and the hosts of NPR’s “Planet Money” podcast. Employing copyrighted content for model training is a common practice, but its legality remains contested. Some AI companies invoke fair use as a defense, while copyright holders dispute its applicability in this context.
Future Plans for Nari Labs
Kim stated that Nari’s future plans involve the creation of a synthetic voice platform incorporating a “social aspect,” built upon Dia and larger, more advanced models. The team also intends to publish a technical report detailing Dia’s development and to broaden the model’s language support beyond English.