Headroom Raises $5M to Enhance Videoconferencing with AI

Video conferencing has become essential for many professionals, so much so that the name of one prominent platform, Zoom, is now a commonly used verb.
But is video conferencing performing to its full potential? A new company, Headroom, is betting that the answer is a definitive “no,” even with a stable internet connection. It is launching with a suite of AI-powered tools built on computer vision and natural language processing.
Headroom provides not only video conferencing capabilities, but also offers features like transcripts, summaries with key highlights, gesture recognition, and enhanced video quality. The company is announcing a $5 million seed funding round as it prepares to launch its freemium service.
Interested individuals can join Headroom’s waitlist for early access and updates.
The investment comes from Anna Patterson of Gradient Ventures (Google’s AI venture fund); Evan Nisselson of LDV Capital (a venture capital firm specializing in visual technologies); Jerry Yang, founder of Yahoo and now of AME Cloud Ventures; Ash Patel of Morado Ventures; Anthony Goldbloom, co-founder and CEO of Kaggle.com; and Serge Belongie, associate dean and professor of Computer Vision and Machine Learning at Cornell Tech.
This group of investors is notable, reflecting the founders’ extensive experience in developing advanced visual technologies for both consumer and business applications.
Julian Green, originally from Britain, was most recently at Google, where he led the company’s computer vision products, including the Cloud Vision API, which launched during his tenure. He joined Google through the acquisition of his previous venture, Jetpac, which used deep learning and AI to analyze images for travel recommendations. Before that, he co-founded Houzz, a platform centered around visual interaction.
Andrew Rabinovich, who was born in Russia, spent the last five years at Magic Leap, where he was head of AI and, before that, director of deep learning and head of engineering. Before Magic Leap, he was a software engineer at Google, specializing in computer vision and machine learning.
Some might assume that the pair left their positions to build a better video conferencing service in reaction to this year’s surge in demand. However, Green explains that they conceived the idea and began development in late 2019, before COVID-19 emerged.
“It has certainly made this a more compelling area,” he noted, adding that it also facilitated fundraising. (The funding round concluded in July, he stated.)
Considering Magic Leap’s challenges – the difficulties in establishing viable businesses in AR and VR, even with substantial venture capital – and Google’s tendency to consolidate technology development in Mountain View, it’s intriguing that the pair chose to independently build Headroom instead of proposing the technology to their former employers.
Green explained this decision was based on two factors. First, he values the agility of a smaller organization. “I enjoy operating at startup speed,” he said.
Second, he highlighted the advantages of building from scratch versus working within established systems.
“Google is capable of achieving anything,” he responded when asked why he didn’t consider presenting these concepts to the Meet team (or Hangouts for non-business users). “However, to implement real-time AI in video conferencing, it’s necessary to design for it from the outset. That was our initial approach,” he explained.
Despite this, the factors that make Headroom appealing also present significant hurdles. While increased remote work may make users more receptive to video conferencing, many have become accustomed to their current platforms. Furthermore, numerous companies have already invested in premium subscriptions to existing services and may be hesitant to adopt new, unproven alternatives.
However, as the technology sector has shown repeatedly, being a later entrant doesn’t preclude success; first movers aren’t always the ultimate winners.
The initial version of Headroom will include features such as automatic transcriptions of conversations, with video replay enabling transcript editing; summaries of key discussion points; and gesture recognition to facilitate smoother interactions.
Green also mentioned that they are already developing features for future releases. When videoconferences incorporate presentation materials, the engine will also process these for highlights and transcription.
Additionally, a feature will optimize video quality by selectively transmitting pixels, particularly beneficial for users with limited bandwidth.
“The system can identify and transmit only the pixels that are changing in a video conference, as much of the background remains static,” he clarified. “This reduces the amount of data that needs to be sent.”
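The idea Green describes can be illustrated with a minimal frame-differencing sketch. This is not Headroom's implementation, which the article does not detail. It assumes grayscale frames stored as nested lists, and the names `encode_delta`, `apply_delta`, and `threshold` are hypothetical. Production video codecs rely on more sophisticated motion-compensated prediction between frames, but the principle of skipping unchanged regions is similar.

```python
from typing import List, Tuple

Frame = List[List[int]]              # grayscale image: rows of pixel values
Delta = List[Tuple[int, int, int]]   # (row, col, new_value) for each changed pixel


def encode_delta(prev: Frame, curr: Frame, threshold: int = 0) -> Delta:
    """Return only the pixels whose value changed by more than `threshold`."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have identical dimensions")
    return [
        (r, c, value)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if abs(value - prev[r][c]) > threshold
    ]


def apply_delta(prev: Frame, delta: Delta) -> Frame:
    """Rebuild the current frame on the receiver from the previous frame plus the delta."""
    frame = [row[:] for row in prev]  # copy so the previous frame is left untouched
    for r, c, value in delta:
        frame[r][c] = value
    return frame


# A static background (all 10s) where only two pixels change between frames.
previous = [[10] * 4 for _ in range(3)]
current = [row[:] for row in previous]
current[1][2] = 200
current[2][0] = 90

delta = encode_delta(previous, current)
print(len(delta), "of", 3 * 4, "pixels sent")  # prints: 2 of 12 pixels sent
```

With `threshold` set to 0 the receiver reconstructs the frame exactly. A higher threshold would send even less data at the cost of ignoring small changes such as sensor noise.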
These capabilities leverage the power of advanced computer vision and natural language algorithms. For example, creating a summary requires technology that can identify not only the content of a conversation but also its most important elements.
And for those who have struggled to interject in video calls without interrupting, gesture recognition could prove invaluable.
This technology can also benefit speakers by detecting audience engagement levels. The same system used to identify gestures indicating a desire to speak can also recognize signs of boredom or disinterest, providing feedback to the presenter.
“It’s about enhancing emotional intelligence,” Green said, perhaps with a touch of humor, though over a Google Meet call that tone is easy to misread.
This leads to a significant opportunity for Headroom. At their best, these tools can not only enhance video conferences but also address challenges encountered in face-to-face meetings. Developing software that surpasses the “real thing” could ensure its relevance beyond the current circumstances (which hopefully won’t be permanent).