Google Cloud AI: New Music Generation Model

Google Enhances Generative AI Models on Vertex AI
On Wednesday, Google rolled out updates to several of its first-party media-generation AI models through its Vertex AI cloud platform.
New Capabilities Across Models
Lyria, Google’s text-to-music model, is now available in preview to a limited number of customers. The Veo 2 video-creation model has also been upgraded with expanded editing and visual-effects customization options.
Chirp 3, Google’s audio understanding model, now powers a voice-cloning feature that has launched for a select group of “allow-listed” users. According to Google, the Imagen 3 image generator now performs “significantly” better.
Competition in the Enterprise AI Market
These updates, timed to coincide with Google’s Cloud Next conference, are part of the company’s ongoing effort to establish a leading position in the enterprise generative AI market. Its primary competitor in this space is Amazon, whose cloud AI platform, Bedrock, offers a similar service alongside Amazon’s own proprietary generative AI models.
Lyria: A Royalty-Free Music Alternative
Google is positioning Lyria as an alternative to traditional royalty-free music libraries. The model lets users compose songs in a range of styles and genres, from jazz piano pieces to lo-fi tracks.
Chirp 3: Voice Cloning and Transcription
Chirp 3 can synthesize speech in approximately 35 languages. Instant Custom Voice, a feature first previewed earlier this year, uses Chirp 3 to clone a voice from just 10 seconds of audio and is now generally available.
A new tool, Transcription with Diarization, is also launching in preview. It uses Chirp 3 to separate and identify individual speakers in recordings with multiple participants.
Safeguards for Voice Cloning
To mitigate potential misuse, Google says Instant Custom Voice includes a “diligence” process to confirm “appropriate voice usage permissions.”
Veo 2: Advanced Video Editing Features
Veo 2 can now remove backgrounds, logos, and unwanted objects from videos. It can also extend the frame of existing footage, for example to convert a landscape video to portrait orientation.
Users can also adjust camera angles and pacing within AI-generated scenes to create effects such as time lapses and drone shots. In addition, the model can interpolate the frames between a user-defined start frame and end frame.
These Veo features are currently available in preview.
Imagen 3: Enhanced Image Manipulation
The Imagen 3 upgrades improve the model’s ability to remove objects from images and to reconstruct missing or damaged areas (inpainting).
Watermarking and Safety Measures
All media generated by Imagen, Veo, and Lyria (but not Chirp) is watermarked with Google’s SynthID technology. The company says all of its generative AI models include “built-in safeguards” to prevent the creation of harmful content.
Data Training Transparency
Google continues not to disclose the specific data used to train its models. Training data is often contentious because of intellectual property concerns.
Some companies train their models on copyrighted material without first obtaining permission from the copyright holders. These companies argue that the practice is protected by the U.S. fair use doctrine, but many creators dispute this and are pursuing legal action.
Copyright Protection for Users
Google has previously told TechCrunch that it offers Google Cloud and Vertex AI customers opt-out options for model training, as well as an indemnity policy that protects them from potential AI-related copyright claims.