WaveOne aims to make video AI-native and turn streaming upside down

Video technology has worked much the same way for decades. Thanks to its unique characteristics, video has mostly resisted the machine learning disruption that has swept through so many other industries. WaveOne hopes to change that by reimagining decades-old video codec principles with AI – while sidestepping the pitfalls that have tripped up would-be codec innovators and “AI-powered” companies alike.
Until recently, the startup primarily showcased its findings through research papers and presentations, but with a freshly raised $6.5M seed round, it is ready to start testing and deploying its actual product. The stakes are considerable: video compression may seem like a niche concern, but it has become one of the most important processes of the modern internet.
The established method, dating back to the early days of digital video, involves developers creating a standard algorithm for compressing and decompressing video, known as a codec. This codec can be readily distributed and executed on standard computing systems. Examples include MPEG-2, H.264, and similar technologies. Content providers and servers handle the demanding task of compressing video, while the comparatively simpler process of decompression occurs on end-user devices.
This system is remarkably effective, and improvements to codecs – enabling ever more efficient compression – are what made platforms like YouTube possible; without them, video file sizes would have rendered YouTube’s initial launch infeasible. Another key development was the growing reliance on hardware acceleration: CPUs and GPUs increasingly include dedicated silicon that performs decompression far faster than a general-purpose processor, such as the one in a phone. The catch is that adopting a new codec typically requires new hardware.
However, consider that many modern smartphones now ship with chips designed for running machine learning models. Like codec chips, these accelerate their workload in hardware, but unlike them, they aren’t built around a single fixed algorithm – the models they run can change. So why aren’t we using these ML-optimized chips for video? That is precisely what WaveOne intends to do.
Initially, I engaged with WaveOne’s co-founders, CEO Lubomir Bourdev and CTO Oren Rippel, with a degree of skepticism, despite their impressive credentials. We have observed numerous codec companies emerge and disappear, while the technology sector has largely converged around a limited number of formats and standards that are updated at a deliberate pace. For example, H.265 was introduced in 2013, yet its predecessor, H.264, remained dominant for years afterward. This evolution resembles the progression from 3G to 4G to 5G, rather than incremental version updates like 7.0 to 7.1. Consequently, smaller alternatives, even those that are superior and open-source, often struggle against the established industry standards.
This historical pattern for codecs, combined with the tendency of startups to label almost anything as “AI-powered,” led me to anticipate something at best misguided, and at worst, deceptive. However, I was pleasantly surprised: WaveOne represents a concept that appears obvious in hindsight and possesses a potential first-mover advantage.
Rippel and Bourdev initially emphasized the genuine role of AI in this context. While codecs like H.265 are sophisticated, they lack true intelligence. They can determine where to allocate more bits for encoding color or detail in a general sense, but they cannot, for instance, identify a face in the scene that deserves prioritized attention, or recognize elements like signs or trees that could be compressed more efficiently.
However, face and scene detection are largely solved problems in computer vision. Shouldn’t a video codec be able to recognize a face and dedicate resources to it accordingly? It’s a fair question, and the answer is that existing codecs aren’t flexible enough – they simply aren’t designed to accept that kind of input. Such a capability might eventually be incorporated into H.266, and then reach high-end devices a couple of years after that.
So, how can this be achieved now? By developing a video compression and decompression algorithm that operates on AI accelerators found in many current and future phones and computers, and by integrating scene and object detection from the outset. Similar to how Krisp.ai identifies and isolates voices without complex spectrum analysis, AI can rapidly analyze visual data and transmit that information to the video compression process.
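To make the idea concrete, here is a minimal sketch of content-weighted bit allocation. It uses OpenCV’s off-the-shelf Haar face detector purely as a stand-in – WaveOne’s actual models and encoder interface aren’t public, so the function and weights below are illustrative assumptions, not their method.

```python
import cv2
import numpy as np

# Stand-in detector: OpenCV's bundled Haar cascade, not WaveOne's model.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def importance_map(frame_bgr: np.ndarray) -> np.ndarray:
    """Return a per-pixel weight in [1, 4]: higher means 'spend more bits here'."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    weights = np.ones(gray.shape, dtype=np.float32)  # baseline quality everywhere
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 4):
        weights[y:y + h, x:x + w] = 4.0  # prioritize detected faces
    # Soften region boundaries so quality doesn't change abruptly at box edges.
    return cv2.GaussianBlur(weights, (31, 31), 0)
```

An encoder could translate such a map into per-block quantization: coarser quantization (fewer bits) where the weight is low, finer where it is high – which is exactly the kind of input a traditional codec has no way to accept.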
Image Credits: WaveOne

This variable and intelligent allocation of data enables highly efficient compression without compromising image quality. WaveOne claims it can cut file sizes by up to 50%, with even greater gains in complex scenes. When videos are served millions of times, or to a million people at once, even single-digit percentage savings add up. Bandwidth is cheaper than it used to be, but it is still far from free.
Understanding the image content (or receiving that information) also allows the codec to adapt to the specific type of content. A video call should prioritize faces, while a game streamer might focus on fine details, and animation requires a different approach to minimize artifacts in large, uniform color areas. All of this can be accomplished dynamically with an AI-powered compression scheme.
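As a hypothetical sketch of what that content-type adaptation might look like as configuration – the profile names and knobs here are invented for illustration, not WaveOne’s API:

```python
from dataclasses import dataclass

# Illustrative only: hypothetical tuning knobs for a content-aware encoder.
@dataclass
class EncodeProfile:
    face_weight: float       # extra bits for detected faces
    detail_weight: float     # extra bits for high-frequency texture
    flat_region_bits: float  # budget for large uniform color areas

PROFILES = {
    "video_call":  EncodeProfile(face_weight=4.0, detail_weight=1.0, flat_region_bits=0.5),
    "game_stream": EncodeProfile(face_weight=1.5, detail_weight=3.0, flat_region_bits=0.5),
    "animation":   EncodeProfile(face_weight=1.0, detail_weight=1.0, flat_region_bits=0.2),
}

def profile_for(content_type: str) -> EncodeProfile:
    # Fall back to neutral weights for unrecognized content.
    return PROFILES.get(content_type, EncodeProfile(1.0, 1.0, 1.0))
```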
The implications extend beyond consumer technology. A self-driving car, transmitting video between components or to a central server, could save time and enhance video quality by concentrating on elements deemed important by the autonomous system – vehicles, pedestrians, animals – and avoiding unnecessary data transmission for features like a clear sky or distant trees.
Content-aware encoding and decoding is likely the most readily understandable and versatile benefit WaveOne proposes. Bourdev also highlighted the method’s increased resilience to bandwidth fluctuations. A common weakness of traditional video codecs is that losing a few bits can disrupt the entire operation, resulting in frozen frames and glitches. However, ML-based decoding can intelligently estimate missing data, preventing freezing and maintaining a slightly reduced level of detail during bandwidth restrictions.
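One way to sketch that kind of graceful degradation: a learned decoder would predict missing regions with a trained model, but the version below substitutes classical optical-flow extrapolation just to show the mechanics. The function and its inputs are hypothetical, not WaveOne’s decoder.

```python
import cv2
import numpy as np

def _gray(frame: np.ndarray) -> np.ndarray:
    return cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

def conceal(prev2: np.ndarray, prev1: np.ndarray,
            curr_partial: np.ndarray, lost_mask: np.ndarray) -> np.ndarray:
    """Fill in regions of the current frame whose bits were lost in transit.

    prev2, prev1:  the last two fully decoded frames.
    curr_partial:  the current frame, unreliable wherever lost_mask > 0.
    """
    # Estimate motion between the two intact frames and assume it
    # continues for one more frame.
    flow = cv2.calcOpticalFlowFarneback(_gray(prev2), _gray(prev1), None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = lost_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    # Warp the last good frame along the extrapolated motion field.
    predicted = cv2.remap(prev1, map_x, map_y, cv2.INTER_LINEAR)
    # Keep real decoded pixels; substitute the prediction only where lost.
    return np.where(lost_mask[..., None] > 0, predicted, curr_partial)
```

The result is slightly blurrier motion in the damaged region rather than a frozen or glitched frame – the trade-off Bourdev describes.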
Example of different codecs compressing the same frame.

These advantages sound promising, but the crucial question remains: can these improvements be deployed at scale?
“The history of codec development is filled with unsuccessful attempts to create innovative solutions,” Bourdev acknowledged. “A significant factor is hardware acceleration; even if you develop the best codec, it will struggle without a dedicated hardware accelerator. You need not only superior algorithms but also the ability to execute them efficiently across a wide range of devices, both on the edge and in the cloud.”
This is why the specialized AI cores in the latest generation of devices are so important. This represents adaptable hardware acceleration that can be reconfigured in milliseconds for new purposes. WaveOne has been developing video-focused machine learning for years, designed to run on these cores, performing the tasks that H.26X accelerators have handled for years, but with greater speed and flexibility.
Of course, the issue of “standards” remains. Is it realistic to expect widespread adoption of a single company’s proprietary video compression scheme? Well, every standard has to start somewhere – they don’t come down from on high. As Bourdev and Rippel explained, they are in fact relying on standards, just not in the way video standards have traditionally worked.
In video, a “standard” has historically meant a strictly defined software method that apps and devices implement identically so everything interoperates efficiently. But that isn’t the only kind of standard. Rather than being an end-to-end method itself, WaveOne is an implementation that adheres to standards on the ML and deployment side.
They are designing the platform to be compatible with major ML distribution and development frameworks like TensorFlow, ONNX, Apple’s CoreML, and others. Meanwhile, the models developed for encoding and decoding video will function like any other accelerated software on edge or cloud devices: deploy it on AWS or Azure, run it locally with ARM or Intel compute modules, and so on.
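As a rough illustration of that deployment story, the sketch below exports a toy decoder to ONNX with PyTorch and runs it through ONNX Runtime, which selects an execution provider based on available hardware. The `decoder` network and tensor shapes are invented placeholders; only the tooling path reflects the approach described.

```python
import torch
import onnxruntime as ort

# Hypothetical stand-in decoder: upsample a compressed latent back to RGB.
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
    torch.nn.ReLU(),
    torch.nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
)
latent = torch.randn(1, 64, 60, 80)  # example compressed-representation shape

# Export once to the interchange format...
torch.onnx.export(decoder, latent, "decoder.onnx",
                  input_names=["latent"], output_names=["frame"])

# ...then run the same file anywhere ONNX Runtime runs.
session = ort.InferenceSession("decoder.onnx",
                               providers=["CPUExecutionProvider"])
frame = session.run(None, {"latent": latent.numpy()})[0]
# On other hardware, swap in e.g. "CUDAExecutionProvider" – the model
# file itself stays the same, which is the point of the approach.
```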
WaveOne appears to check all the boxes of a significant B2B opportunity: it invisibly improves the experience for customers, runs on existing or forthcoming hardware without modification, delivers immediate cost savings (potentially, anyway), and leaves room for further investment and value creation.
This may explain their success in attracting a substantial seed round: $6.5 million, led by Khosla Ventures, with $1 million each from Vela Partners and Incubate Fund, plus $650,000 from Omega Venture Partners and $350,000 from Blue Ivy.
Currently, WaveOne is in a pre-alpha phase, having demonstrated the technology’s viability but not yet developed a fully functional product. The seed funding, Rippel stated, is intended to mitigate the technology’s risks. While substantial R&D remains, they have proven the core offering’s functionality – building the infrastructure and API layers is the next step, representing a distinct phase for the company. Nevertheless, they aim to conduct testing and secure initial customers before seeking additional funding.
The future of the video industry may diverge significantly from the past few decades, and that could be a positive development. We can anticipate further updates from WaveOne as it transitions from the laboratory to a commercial product.