
Speechmatics Improves Accented English Speech Recognition

October 26, 2021

The Evolution and Inclusivity of Speech Recognition Technology

Over the past few years, speech recognition has transitioned from a convenience to a necessity, driven by the proliferation of smart speakers and advanced driver-assistance systems. However, equitable performance across all voices remains a challenge.

Addressing Disparities in Accuracy

Speechmatics asserts that its speech recognition model is the most accurate and inclusive currently available, surpassing competitors like Amazon and Google, particularly when processing speech patterns beyond standard American English.

This focus on accuracy was prompted by a 2019 Stanford University study, “Racial Disparities in Speech Recognition,” which revealed significant discrepancies. The study demonstrated that speech engines from major companies – Amazon, Apple, Google, IBM, and Microsoft – exhibited notably higher word error rates (WER) for Black speakers, averaging 0.35, compared to 0.19 for White speakers.
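Word error rate, the metric used in the Stanford study, is the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the engine's output, divided by the number of reference words. A minimal generic implementation (not the study's or Speechmatics' actual code) looks like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by the
    number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance between word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of about 0.17.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

On this scale, the study's averages of 0.35 and 0.19 mean roughly one in three words misrecognized for Black speakers versus about one in five for White speakers.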

The Role of Data Diversity

A primary contributor to these disparities is likely the limited diversity within the datasets used to train these speech recognition systems. If a training dataset contains insufficient representation from Black speakers, the model’s ability to accurately interpret their speech patterns will be compromised.

This principle extends to speakers with diverse accents, dialects, and regional variations. Given the linguistic diversity within the United States – and indeed, the United Kingdom – any organization offering speech services to a broad audience must acknowledge and address this complexity.

Speechmatics’ Performance Gains

U.K.-based Speechmatics prioritized accuracy in transcribing accented English with its latest model and reports substantial improvements over existing solutions. Utilizing the same datasets employed in the Stanford study, but with updated speech software versions, “Speechmatics achieved an overall accuracy of 82.8% for African American voices, compared to 68.7% for Google and 68.6% for Amazon,” according to a company press release.

A Novel Approach to Model Creation

Speechmatics attributes its success to a relatively recent methodology in speech recognition model development. Traditional machine learning relies on supervised learning, where systems are trained with labeled data – for example, audio files paired with corresponding transcripts created and verified by human annotators.

Speechmatics, however, implemented self-supervised learning, a technique gaining prominence due to advancements in dataset size, learning efficiency, and computational capabilities. This approach incorporates both labeled and vast quantities of raw, unlabeled data, enabling the model to develop a more nuanced understanding of speech with reduced direct guidance.
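The core idea behind self-supervision is that raw data can supply its own training signal: mask part of an input and train the model to predict it from context, with no human annotation. The toy sketch below illustrates that pattern on token sequences; it is a schematic illustration of the technique, not Speechmatics' actual pipeline, and all names in it are invented for the example.

```python
from collections import Counter, defaultdict

def pretrain(unlabeled_sequences):
    """Self-supervised step: 'mask' each position in a raw sequence and
    learn which item its neighbors predict. No human labels required."""
    context_counts = defaultdict(Counter)
    for seq in unlabeled_sequences:
        for i, item in enumerate(seq):
            left = seq[i - 1] if i > 0 else "<s>"
            right = seq[i + 1] if i < len(seq) - 1 else "</s>"
            context_counts[(left, right)][item] += 1
    return context_counts

def predict_masked(context_counts, left, right):
    """Fill in a masked position using the learned context statistics."""
    counts = context_counts.get((left, right))
    return counts.most_common(1)[0][0] if counts else None

# Raw, unlabeled sequences stand in for the hours of unlabeled audio
# described in the article; the 'labels' come from the data itself.
raw = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
model = pretrain(raw)
print(predict_masked(model, "the", "sat"))  # -> "cat"
```

Real speech models apply the same masking principle to learned audio representations rather than word tokens, which is why vast quantities of unlabeled recordings become usable training material.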

Leveraging Large-Scale Unlabeled Data

The Speechmatics model began with approximately 30,000 hours of labeled data to establish a foundational understanding. Subsequently, it was exposed to 1.1 million hours of publicly available audio content sourced from platforms like YouTube and podcasts.

The use of this publicly sourced data raises ethical considerations, as explicit consent for its use in commercial speech recognition training was not obtained. However, this practice is becoming increasingly common, mirroring the approach taken by OpenAI in training its GPT-3 model using a substantial portion of the internet’s content.

Expanded Accuracy Across Demographics

Beyond improved accuracy for Black American speakers, the Speechmatics model demonstrates enhanced transcription capabilities for children (approximately 92% accuracy versus 83% for Google and Deepgram) and incremental gains in recognizing English spoken with accents from various regions, including Indian, Filipino, Southern African, and Scottish dialects.

The company also supports numerous other languages and maintains competitive performance in those areas. However, given the widespread use of English as a global language, accurate accent recognition is particularly crucial.

The Future of Inclusive AI

While Speechmatics currently leads in the metrics it reports, the field of artificial intelligence is rapidly evolving. Further advancements and competitive leaps are anticipated in the coming year. Companies like Google are actively working to improve speech recognition for individuals with speech impairments.

Prioritizing inclusion is becoming a central tenet of AI development, and it is encouraging to witness companies striving to surpass one another in this critical area.

#speech recognition #accented english #speechmatics #transcription #AI #voice technology