Veo 3: AI Video Generation with Soundtracks

Google Unveils Veo 3: An AI Model with Integrated Audio Generation
Google has introduced Veo 3, its newest video-generating AI model, which can create audio to accompany the videos it produces.
Veo 3’s Capabilities and Improvements
Google showcased Veo 3 at its I/O 2025 developer conference, saying the model can generate sound effects, ambient noise, and even spoken dialogue to accompany its video output. The company also said Veo 3 produces higher-quality footage than its predecessor, Veo 2.
Veo 3 became available on Tuesday in Google’s Gemini chatbot app to subscribers of the $249.99-per-month AI Ultra plan. It can be prompted with either text or an image.
The Significance of Audio Integration
“We are now transitioning beyond the era of silent video generation,” stated Demis Hassabis, CEO of Google DeepMind, Google’s AI research and development division, during a press conference. He explained that users can provide Veo 3 with descriptions of characters and settings, alongside suggested dialogue and desired vocal characteristics.
The proliferation of video generation tools has resulted in a highly competitive market. Numerous startups, including Runway, Lightricks, Genmo, Pika, Higgsfield, Kling, and Luma, alongside tech industry leaders like OpenAI and Alibaba, are rapidly releasing new models.
Veo 3’s audio output is what sets it apart. AI-powered sound generation and video sound-effect tools already exist, but according to Google, Veo 3 analyzes the visual content of its own videos and automatically synchronizes the generated sounds with the clips.
Underlying Technology and Training Data
Veo 3 was likely made possible by DeepMind’s earlier research in “video-to-audio” AI. In June 2024, DeepMind revealed that it was developing technology to create soundtracks for videos by training a model on a combination of audio, dialogue transcripts, and video footage.
DeepMind has not disclosed what data was used to train Veo 3, but YouTube is a strong possibility: Google owns the platform, and DeepMind’s previous statements to TechCrunch make it plausible that YouTube content was used in training.
Safeguards Against Misuse
To address concerns about deepfakes, DeepMind is implementing its proprietary watermarking technology, SynthID, to embed imperceptible markers within the frames generated by Veo 3.
Impact on the Creative Industries
While Google positions Veo 3 as a powerful creative tool, many artists worry that it could disrupt their industries. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, projects that more than 100,000 U.S.-based film, television, and animation jobs could be affected by AI by 2026.
Updates to Veo 2
Google also announced new features for Veo 2, including the ability to provide the model with images of characters, scenes, objects, and styles to enhance consistency. The updated Veo 2 can now interpret camera movements such as rotations, dollies, and zooms, and allows users to add or remove elements from videos or adjust the aspect ratio of clips.
These new Veo 2 capabilities will be integrated into Google’s Vertex AI API platform in the coming weeks.