NVIDIA World Models: New AI Foundation Models Released

January 7, 2025
Nvidia Enters the Realm of World Models

Nvidia is now actively developing world models, a type of AI inspired by the way humans naturally form internal representations of the world around them.

Introducing Cosmos World Foundation Models

At CES 2025 in Las Vegas, the company unveiled a family of openly available world models that can predict and generate “physics-aware” videos. Nvidia calls the family Cosmos World Foundation Models, or Cosmos WFMs.

These models are designed to be adaptable through fine-tuning for specific applications and are available via Nvidia’s API and NGC catalogs, GitHub, and the Hugging Face AI development platform.

Availability and Licensing

Nvidia is releasing an initial set of Cosmos WFMs focused on physics-based simulation and synthetic data creation. Researchers and developers, irrespective of organizational size, can utilize these models commercially under Nvidia’s permissive open model license.

Cosmos WFM Model Categories

The Cosmos WFM family comprises three distinct categories:

  • Nano: Optimized for low latency and real-time applications.
  • Super: Providing a “highly performant baseline” for various tasks.
  • Ultra: Delivering maximum quality and fidelity in outputs.

Model sizes vary from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra the largest. Generally, a greater number of parameters correlates with enhanced problem-solving capabilities.
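To put those parameter counts in perspective, the weights alone imply a rough memory footprint. The sketch below is a back-of-the-envelope estimate, not a published Nvidia figure: it assumes 2 bytes per parameter (16-bit weights, which this article does not confirm) and ignores activations and other runtime overhead. The function name is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Parameter counts for the smallest and largest models, as stated in the article.
smallest = weight_memory_gb(4e9)
largest = weight_memory_gb(14e9)

print(f"4B parameters:  ~{smallest:.0f} GB of weights")
print(f"14B parameters: ~{largest:.0f} GB of weights")
```

Under that assumption, the weights take roughly 8 GB at the small end and 28 GB at the large end, which helps explain why the smallest tier is the one pitched at low-latency, real-time use.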

Additional Components of Cosmos WFM

Alongside the core WFMs, Nvidia is also releasing an upsampling model, a video decoder tailored for augmented reality, and guardrail models to promote responsible AI usage.

Nvidia is also releasing versions fine-tuned for tasks such as generating sensor data for autonomous vehicle development. The company says the Cosmos models were trained on 9,000 trillion tokens, representing 20 million hours of real-world human interactions, environmental data, industrial processes, robotics, and driving scenarios.
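The scale of those two figures is easier to grasp as a rate. The arithmetic below is a simple illustration using only the numbers quoted above; it assumes all of the tokens come from those hours and are spread evenly across them, which the article does not state. What counts as a token for video is not specified here, so treat the result as an order-of-magnitude figure.

```python
TOTAL_TOKENS = 9_000 * 10**12   # 9,000 trillion tokens (figure quoted in the article)
TOTAL_HOURS = 20 * 10**6        # 20 million hours (figure quoted in the article)

# Average token density, assuming an even spread over the footage.
tokens_per_hour = TOTAL_TOKENS // TOTAL_HOURS
tokens_per_second = tokens_per_hour // 3600

print(f"{tokens_per_hour:,} tokens per hour")
print(f"{tokens_per_second:,} tokens per second")
```

That works out to 450 million tokens per hour of footage, or 125,000 tokens per second.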

Data Source Concerns and Nvidia’s Response

Nvidia has not disclosed where this training data came from. However, reports and a pending lawsuit allege that the company used copyrighted YouTube videos without permission.

In response to inquiries, an Nvidia spokesperson stated that Cosmos is “not designed to copy or infringe any protected works.”

The spokesperson further explained that Cosmos learns in a manner analogous to human learning, utilizing data from diverse public and private sources, and that Nvidia believes its data usage aligns with legal standards. They asserted that factual information about the world, which Cosmos models learn, is not subject to copyright.

Legal Considerations Regarding AI Training Data

Despite Nvidia’s claims, copyright experts suggest that arguments based on fair use doctrine may not withstand legal challenges. The outcome will likely depend on how courts interpret fair use principles in the context of AI training, particularly regarding transformative use.

Capabilities of Cosmos WFM

Nvidia says Cosmos WFMs can generate “controllable, high-quality” synthetic data from text or video inputs, which can then be used to train models for robotics, self-driving cars, and other applications.

Developers can customize the WFMs with their own datasets, such as video recordings from autonomous vehicle trips or robots operating in warehouse environments.

Cosmos WFMs are specifically designed for physical AI research and development, capable of generating physics-based videos from various inputs, including text, images, video, robot sensor data, and motion data.

Industry Adoption and Partnerships

Several companies, including Waabi, Wayve, Foretellix, and Uber, have already committed to piloting Cosmos WFMs for applications ranging from video search and curation to the development of AI for autonomous vehicles.

“Generative AI will power the future of mobility, requiring both rich data and very powerful compute,” stated Uber CEO Dara Khosrowshahi. “By working with Nvidia, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry.”

Defining “Open” vs. “Open Source”

Nvidia’s world models are not strictly “open source.” Under one widely accepted definition, an open-source AI model must document its design well enough that someone could substantially recreate it, and must disclose the provenance and licensing of its training data.

Nvidia has not released complete training data details or the tools needed to rebuild the models from scratch, which is presumably why the company describes them as “open” rather than open source.

Looking Ahead

“We really hope [Cosmos will] do for the world of robotics and industrial AI what Llama has done for enterprise,” remarked Nvidia CEO Jensen Huang during a press event.
