Sesame Releases Maya's Base AI Model

Sesame Releases Base Model for Realistic Voice Assistant Maya

The AI firm Sesame has made available the foundational model that drives Maya, its remarkably lifelike voice assistant.

CSM-1B: A Commercially Usable Model

This model, designated CSM-1B, has 1 billion parameters, the learned numerical weights that determine how the model behaves. It is released under the permissive Apache 2.0 license, which allows commercial use with few restrictions.

According to Sesame’s documentation on the AI development platform Hugging Face, CSM-1B generates “RVQ audio codes” from both textual and audio inputs.

Understanding RVQ Technology

RVQ stands for “residual vector quantization,” a method for encoding audio into discrete tokens known as codes. The technique is used in a growing number of AI audio systems.

Examples of its use include Google’s SoundStream and Meta’s EnCodec.
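To make the idea concrete, here is a minimal sketch of residual vector quantization in Python. It is illustrative only and is not Sesame's implementation: the function names `rvq_encode` and `rvq_decode` and the tiny hand-written codebooks are assumptions for this example. Real codecs learn their codebooks during training and apply them to the outputs of a neural encoder rather than to raw numbers.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Return one code (an index) per stage for the vector x."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:  # each codebook has shape (num_entries, dim)
        # Pick the entry closest to whatever is still unexplained.
        distances = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(distances))
        codes.append(idx)
        # The next stage only has to quantize the leftover error.
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Rebuild the vector by summing the chosen entry from every stage."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

# Hypothetical two-stage codebooks: a coarse stage, then a finer one.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.5, -0.5]]),
]

codes = rvq_encode([1.5, 0.5], codebooks)
print(codes)                         # [1, 1]
print(rvq_decode(codes, codebooks))  # [1.5 0.5]
```

Each stage quantizes only the error left by the previous one, which is why the output is a short list of small integers per audio frame rather than a single large index.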

Model Architecture

CSM-1B uses a model from Meta’s Llama family as its backbone, combined with an audio “decoder” component. Maya, Sesame’s voice assistant, is powered by a fine-tuned version of CSM-1B.

Sesame clarifies that the open-sourced model is a base generation model. While capable of producing diverse voices, it hasn’t been specifically fine-tuned for any particular voice.

Language Capabilities and Limitations

The model has some limited capacity for languages other than English. This is attributed to non-English data that ended up in the training set incidentally, and performance in those languages is expected to be poor.

The specific dataset used to train CSM-1B remains undisclosed by the company.

Safeguards and Ethical Considerations

The model currently has no robust safeguards. Sesame relies on an honor system, asking developers and users not to mimic a person’s voice without consent, create deceptive content, or engage in harmful activities.

In testing with the Hugging Face demo, cloning a voice took less than a minute. From there, it was possible to generate speech on a range of subjects, including sensitive ones.

Concerns Regarding Voice Cloning

Consumer Reports has recently cautioned that many readily available AI-powered voice cloning tools lack adequate safeguards against fraud and misuse.

Sesame’s Breakthrough Technology

Co-founded by Brendan Iribe, a co-creator of Oculus, Sesame gained significant attention in late February for its advanced assistant technology. Maya and Miles, Sesame’s other assistant, exhibit realistic characteristics like natural breathing patterns and speech disfluencies.

Furthermore, these assistants can be interrupted mid-sentence, mirroring the behavior of OpenAI’s Voice Mode.

Funding and Future Developments

Sesame has secured an undisclosed amount of funding from Andreessen Horowitz, Spark Capital, and Matrix Partners. The company is also developing AI-powered glasses designed for continuous wear, which are intended to run its custom AI models.
