
Cochlear.ai Raises $2M Series A to Expand Sound Recognition Beyond Speech

October 15, 2020

Take a few moments to listen to the sounds surrounding you. You may notice beeping devices, vehicle horns, a barking dog, or someone’s sneeze. These are the types of sounds that Cochlear.ai, a sound recognition company based in Seoul, is working to identify with its Software-as-a-Service (SaaS) platform. According to co-founder and CEO Yoonchang Han, the company aims to create software capable of recognizing a vast array of sounds for integration into various smart devices, such as mobile phones, speakers, and automobiles, as reported by TechCrunch.

Cochlear.ai recently announced the completion of a $2 million Series A funding round, spearheaded by Smilegate Investment, with additional investment from Shinhan Capital and NAU IB Capital. This latest funding increases the company’s total raised capital to $2.7 million, including prior seed funding from Kakao Ventures, the investment division of the prominent South Korean internet company. Cochlear.ai intends to utilize the Series A funds to expand its team over the next year and a half and to broaden the sound data used for training its deep learning algorithms.

The company was established in 2017 by a team of six researchers specializing in music and audio, including Han, who earned a PhD in music information retrieval from Seoul National University. During his doctoral studies, Han observed “a strong focus on speech recognition systems. Numerous companies were already working in that area, but the technical challenges of analyzing other sound types are significantly different from speech recognition.”

Typically, speech recognition systems are designed to process one or two voices simultaneously, assuming a conversational exchange rather than overlapping speech. These systems also leverage linguistic data during post-processing to enhance accuracy. However, music and environmental sounds often occur concurrently.

“We must account for the entire spectrum of frequencies, and there’s a tremendous variety of sounds beyond just voices,” Han explained. “We believe this represents the future of sound recognition, and that conviction drove the creation of our startup.”

Cochlear.ai’s SaaS offering, known as Cochl.Sense, is available as both a cloud API and an edge SDK, and currently recognizes approximately 40 distinct sounds categorized into three groups: emergency detection (such as breaking glass, screams, and sirens), human interaction (utilizing sounds like finger snaps, claps, or whistles for device control), and human status (identifying sounds like coughing, sneezing, or snoring for applications like patient monitoring or automated audio captioning).
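To illustrate how a downstream application might consume such a tag taxonomy, the sketch below groups hypothetical detection results into the three categories described above. The tag names, confidence threshold, and `group_detections` helper are illustrative assumptions for this article, not Cochl.Sense's actual API.

```python
# Illustrative only: the tags and grouping mirror the three categories
# described in the article; this is not Cochl.Sense's real interface.
CATEGORIES = {
    "emergency": {"glass_break", "scream", "siren"},
    "human_interaction": {"finger_snap", "clap", "whistle"},
    "human_status": {"cough", "sneeze", "snore"},
}

def group_detections(detections, threshold=0.5):
    """Group (tag, confidence) pairs by category, dropping low-confidence hits."""
    grouped = {name: [] for name in CATEGORIES}
    for tag, confidence in detections:
        if confidence < threshold:
            continue  # ignore detections the model is not confident about
        for name, tags in CATEGORIES.items():
            if tag in tags:
                grouped[name].append((tag, confidence))
    return grouped

# Example: two confident detections and one below the threshold.
result = group_detections([("siren", 0.92), ("cough", 0.81), ("clap", 0.30)])
print(result["emergency"])  # [('siren', 0.92)]
```

An application like the patient-monitoring use case mentioned above would watch the `human_status` group, while a security product would subscribe to `emergency` events.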

Han also mentioned the company’s plans to introduce new features to Cochl.Sense for use in residential settings (including integration with smart speakers), automobiles, and music analysis. The adaptability of Cochl.Sense allows for a wide range of potential applications, such as transforming a smart speaker into a central control hub for household appliances by recognizing their operational sounds, or assisting individuals with hearing impairments by delivering alerts about important noises, like car horns, to wearable technology like smartwatches.

The sound recognition landscape

Han has observed a shift over the last few years beyond speech recognition and toward identifying a broader range of sounds.

Notably, several leading technology firms, including Amazon, Google, and Apple, are integrating sound recognition capabilities that respond to environmental context into their products. As an illustration, Amazon Alexa Guard and Nest Secure can both identify the sound of shattering glass, and the release of iOS 14 introduced sound recognition to facilitate new accessibility options.

Han believes these developments from major technology companies are advantageous for Cochlear.ai, as they indicate growth within the sound recognition technology market. The company intends to collaborate across various sectors, but is presently prioritizing smart consumer devices and the automotive industry, where demand for its software is currently strongest. For instance, Cochlear.ai is collaborating with Daimler AG to integrate its sound recognition technology into vehicles – for example, to provide alerts if a child is accidentally left inside – alongside partnerships with prominent companies in electronics, telecommunications, and consumer goods.

While software capable of recognizing sounds such as gunshots, breaking glass, and other noises for emergency situations has existed for some time, traditional technologies frequently suffered from inaccurate detections or necessitated specialized microphones and hardware, according to Han.

Other organizations focused on advancing sound recognition technology include Audio Analytic, based in Cambridge, England, which specializes in contextually aware sound intelligence, and Sound Intelligence, located in the Netherlands, which creates software for emergency notification and healthcare applications.

Cochlear.ai aims to distinguish itself by developing software compatible with a diverse selection of microphones, including those found in lower-cost smartphones or USB devices, without requiring extensive calibration. Instead, the company leverages deep learning to refine its algorithms and minimize false alarms.

In the initial phases of developing a dataset for a particular sound, the Cochlear.ai team personally records numerous audio examples using older smartphone models and USB microphones, ensuring functionality even with less sophisticated recording equipment.

Additional samples are sourced from publicly available online resources. Once the sound’s preliminary learning model achieves a certain degree of precision, it can independently search for further audio clips of the same type online, significantly accelerating the data training process. Cochlear.ai’s Series A funding will allow it to build audio sample datasets more efficiently, enabling the addition of a wider variety of sounds to its software.
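The bootstrapping step described above, in which a sufficiently accurate preliminary model vets candidate clips found online, amounts to a confidence filter over scraped audio. The sketch below is a minimal illustration of that idea; the `score_clip` callable and the threshold value are assumptions, as the article does not describe Cochlear.ai's actual pipeline.

```python
# Hypothetical sketch of confidence-based dataset bootstrapping: once a
# preliminary model is accurate enough, it filters web-scraped candidate
# clips so only likely positives are added to the training set.

def bootstrap_dataset(candidates, score_clip, threshold=0.9):
    """Keep candidate clips the preliminary model scores above `threshold`.

    candidates: iterable of clip identifiers (e.g. file paths or URLs)
    score_clip: callable returning the model's confidence (0.0-1.0) that
                the clip contains the target sound (assumed interface)
    """
    return [clip for clip in candidates if score_clip(clip) >= threshold]

# Usage with a stand-in scoring function in place of a real model:
fake_scores = {"clip_a.wav": 0.95, "clip_b.wav": 0.40, "clip_c.wav": 0.91}
kept = bootstrap_dataset(fake_scores, fake_scores.get)
print(kept)  # ['clip_a.wav', 'clip_c.wav']
```

The accepted clips would then be reviewed or fed back into training, which is what makes this loop faster than recording every sample by hand.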

“Our entire founding team consists of researchers in this area, so we are exploring numerous signal processing and machine learning methods – we are experimenting with many different algorithms, as each sound possesses unique characteristics,” Han explained. “We must test a variety of approaches to create a single model capable of identifying a comprehensive range of sounds.”

Edit: This story has been updated with the correct spelling of Audio Analytic.

#cochlear.ai#speech recognition#AI#startup#South Korea#funding