Inception AI: New AI Model Emerges from Stealth

Inception's Novel AI Model: A Diffusion-Based Approach
A new company named Inception, headquartered in Palo Alto and founded by Stanford computer science professor Stefano Ermon, has emerged from stealth with a novel type of AI model. The model is built on “diffusion” technology, and the company calls it a diffusion-based large language model, or DLM.
The Landscape of Generative AI
Currently, generative AI models largely fall into two primary categories: large language models (LLMs) and diffusion models. LLMs are predominantly utilized for text creation, while diffusion models, as seen in systems like Midjourney and OpenAI’s Sora, excel in generating images, video, and audio content.
Inception’s model offers the functionality of conventional LLMs, including code generation and question answering, but with notably faster processing and lower computing costs, according to the company.
The Core Innovation: Diffusion for Text
Professor Ermon revealed to TechCrunch that his research at Stanford has long focused on applying diffusion models to text. His work stemmed from the observation that traditional LLMs exhibit relatively slow processing speeds when contrasted with diffusion technology.
The sequential nature of LLMs presents a bottleneck; generating each subsequent word necessitates the completion of the preceding ones. “You cannot generate the second word until you’ve generated the first one, and you cannot generate the third one until you generate the first two,” Ermon explained.
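To make the bottleneck concrete, here is a minimal Python sketch of autoregressive decoding. The model call (next_token_logits) is a hypothetical stand-in, not Inception’s or any particular library’s API; the point is simply that each iteration must wait for the one before it.

    # Sketch of autoregressive (LLM-style) decoding.
    # next_token_logits is a hypothetical stand-in for a trained model
    # that scores every vocabulary entry given the tokens so far.
    def generate_autoregressive(prompt_tokens, next_token_logits, max_new_tokens=50):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # Each step conditions on ALL tokens generated so far,
            # so the loop cannot be parallelized across output positions.
            logits = next_token_logits(tokens)
            best = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
            tokens.append(best)
        return tokens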
Ermon sought to apply the parallel processing inherent in diffusion models to text generation. Unlike LLMs, which work sequentially, diffusion models start with a rough approximation of the entire output (a blurry image, for instance) and then refine the whole thing at once over successive steps.
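A diffusion-style text generator, by contrast, updates every position of a draft in each pass. The sketch below is a simplified masked-denoising loop with a made-up helper (denoise_all_positions); it illustrates the parallel-refinement idea only and is not Inception’s actual algorithm.

    # Sketch of diffusion-style text generation: start from a fully
    # masked draft and refine every position in parallel at each step.
    MASK = "<mask>"

    def generate_diffusion(length, denoise_all_positions, num_steps=10):
        draft = [MASK] * length  # the rough initial approximation
        for step in range(num_steps):
            # One model call proposes tokens for ALL positions at once;
            # successive passes sharpen the entire sequence together.
            draft = denoise_all_positions(draft, step)
        return draft

Because each pass touches the whole sequence, the number of model calls is fixed by the step count rather than by the output length, which is where the potential speedup comes from.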
The hypothesis was that diffusion models could facilitate the parallel generation and modification of substantial text blocks. After extensive research, Ermon and a student achieved a significant breakthrough, documented in a research paper released last year.
From Research to Company
Recognizing the potential of this advancement, Ermon established Inception last summer. He enlisted two former students, Aditya Grover from UCLA and Volodymyr Kuleshov from Cornell, as co-leaders of the company.
While specific funding details remain undisclosed, TechCrunch reports that the Mayfield Fund has made an investment in Inception.
Early Adoption and Performance Claims
Inception has already secured contracts with several clients, including Fortune 100 companies, addressing their need for lower AI latency and increased processing speed, as stated by Ermon.
“What we found is that our models can leverage the GPUs much more efficiently,” Ermon noted, referring to the computer chips commonly used for running AI models. “I think this is a big deal. This is going to change the way people build language models.”
Deployment and Capabilities
Inception provides an API, alongside on-premises and edge-device deployment options. It also offers support for model fine-tuning and a range of pre-built DLMs tailored for diverse use cases.
The company asserts that its DLMs can operate up to 10 times faster than traditional LLMs, while simultaneously reducing costs by a factor of 10.
According to a company spokesperson, “Our ‘small’ coding model is as good as [OpenAI’s] GPT-4o mini while more than 10 times as fast.” Furthermore, “Our ‘mini’ model outperforms small open-source models like [Meta’s] Llama 3.1 8B and achieves more than 1,000 tokens per second.”
In industry parlance, “tokens” are the small chunks of text that language models read and produce. If Inception’s claims hold up, 1,000 tokens per second is an exceptionally fast rate.
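For a rough sense of scale (back-of-the-envelope assumptions, not Inception’s benchmarks), a common rule of thumb is about 0.75 English words per token, which puts 1,000 tokens per second at roughly 750 words per second:

    # Back-of-the-envelope throughput math; the words-per-token
    # ratio is an illustrative assumption, not a measured figure.
    TOKENS_PER_SECOND = 1_000   # Inception's claimed rate
    WORDS_PER_TOKEN = 0.75      # rough rule of thumb for English text

    words_per_second = TOKENS_PER_SECOND * WORDS_PER_TOKEN
    print(f"~{words_per_second:.0f} words per second")            # ~750
    print(f"500-word reply in ~{500 / words_per_second:.2f} s")   # ~0.67 s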