Languages, Dialects, and Accents in Voice AI

"Does Picovoice technology work across various Accents and Dialects?" is a question the Picovoice team hears often (it is, in fact, in the FAQs). A few years ago, virtual assistants such as Siri and Alexa failing to recognize certain Accents made headlines. In 2018, the Washington Post published a study, Accent Gap, and the Daily Mail published a video of Alexa struggling with the Scottish Accent. In 2019, Stanford University researchers initiated the Fair Speech study and analyzed racial disparities in automated speech recognition. Before diving into how speech recognition handles Languages, Dialects, and Accents, let's define these terms.

What's a language?

A Language is a system of communication that humans use to convey meaning. Language differentiates humans from other animals: animal communication systems share some features with human Language, but they are not as complex.

What's a dialect?

A Dialect is a particular form of a Language with its own grammar, pronunciation, and vocabulary. The words used for a main road illustrate these lexical differences: it is called a freeway in Los Angeles, a thruway in New York, a parkway in New Jersey, a motorway in England, and a highway in Canada.

What's an accent?

An Accent is a distinctive way of pronouncing a Language, specific to a nation, local group, or social class. Every Dialect has its own Accent. For example, some English Dialects drop the "r" at the end of words. However, not every Accent belongs to a Dialect: non-native speakers may have a distinctive Accent without speaking a distinct Dialect.

Why do machines have trouble recognizing some dialects and accents?

First, it's not just machines that have trouble understanding Dialects and Accents. In this example from the British Parliament, an MP doesn't understand his Scottish counterpart. While asking him to repeat the question, the MP says it could be due to his own antipodean background. Unfamiliar Accents can be challenging for humans, native speakers or not, just as they are for machines.

Traditionally, speech recognition providers trained their models on American English and handled Dialect and Accent variations with add-ons or additional packages. In the early days of Alexa and Siri, Amazon and Apple improved their speech models with user data, even without users' consent. The early adopters of these products were generally white, highly educated, upper-middle-class Americans from the West Coast, so the models heard little else. Like the British MP, speech models struggle to understand unfamiliar Accents.

How does voice AI recognize different dialects and accents?

The inclusivity and robustness of voice AI models depend on the training data. If the training dataset is diverse, meaning it is not limited to one group of people, the model can learn to recognize different Dialects and Accents. Building global models instead of separate packages for each Dialect and Accent requires more work upfront. However, in real-life environments, multiple speakers with different Accents interact, and a speaker's Accent is unknown before they start speaking. Thus, speech models should be global and good at recognizing different Dialects and Accents. Picovoice uses diverse training datasets and creates global models. Still, the best way to evaluate the robustness of speech models is to try them. Create a Picovoice Console account for free and evaluate Picovoice engines.
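
One way to make "trying them" concrete is to compare word error rate (WER) across accent groups: a robust model should score similarly for each group. The Python sketch below is only an illustration under stated assumptions. It assumes you already have reference transcripts and the text returned by whichever speech-to-text engine you are testing; the function names, accent labels, and sample sentences are hypothetical placeholders, not Picovoice APIs or real results.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_accent(samples):
    """Aggregate WER per accent label.

    samples: iterable of (accent_label, reference_text, engine_output).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for accent, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[accent] += e
        words[accent] += n
    return {a: errors[a] / words[a] for a in words if words[a] > 0}


# Hypothetical placeholder data: replace with your own recordings' transcripts
# and the text produced by the engine under evaluation.
samples = [
    ("accent_a", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("accent_a", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("accent_b", "turn on the kitchen lights", "turn on the chicken lights"),
    ("accent_b", "set a timer for ten minutes", "set timer for two minutes"),
]

for accent, wer in sorted(wer_by_accent(samples).items()):
    print(f"{accent}: WER = {wer:.2%}")
```

A large gap between groups on the same sentences suggests the model has seen too little speech from the group with the higher error rate. Pooling errors and words per group, rather than averaging per-sentence rates, keeps short utterances from dominating the result.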