“Does Picovoice technology work across various accents and dialects?” is a question the Picovoice team gets frequently (it is, in fact, in the FAQs). A few years ago, reports of virtual assistants such as Siri and Alexa failing to recognize certain accents were widely in the news. In 2018, the Washington Post published a study, Accent Gap, and the Daily Mail published a video of Alexa struggling with the Scottish accent. In 2019, Stanford University researchers initiated the Fair Speech study and analyzed racial disparities in automated speech recognition. Before diving into how speech recognition handles languages, dialects, and accents, let’s define these terms.
What’s a language?
Language is a system of communication used by humans to convey meaning. Language differentiates humans from other animals. Although animal communication systems share some features with human language, they are not as complex.
What’s a dialect?
A dialect is a particular form of a language with its own grammar, pronunciation, and vocabulary. The variety of words that refer to a main road is an example of lexical differences: it is called a freeway in Los Angeles, a thruway in New York, a parkway in New Jersey, a motorway in England, and a highway in Canada.
What’s an accent?
An accent is a distinctive way of pronouncing a language that is specific to a nation, local group, or social class. Dialects have unique accents. For example, some English dialects drop the “r” at the end of words. However, not every accent belongs to a dialect. Non-native speakers may have a distinctive accent without speaking a distinct dialect.
Why do machines have trouble recognizing some dialects and accents?
First, it’s not just machines that have trouble understanding dialects and accents. In this example from the British Parliament, an MP doesn’t understand his Scottish counterpart. While asking him to repeat the question, the MP says the difficulty could be due to his own antipodean background. Unfamiliar accents can be as challenging for humans as they are for machines, whether the listeners are native speakers or not.
Traditionally, speech recognition providers trained their models on American English and chose to handle dialect and accent variations with add-ons or additional packages. In the early days of Alexa and Siri, Amazon and Apple improved their speech models with user data, even without users’ consent. The early adopters of these products were generally white, highly educated, upper-middle-class Americans from the West Coast. Because the models learned mostly from these speakers, they, like the British MP, struggle to understand unfamiliar accents.
How does voice AI recognize different dialects and accents?
The inclusivity and robustness of voice AI models depend on the training data. If the training dataset is diverse, meaning it is not limited to one group of people, the model can recognize different dialects and accents. Building global models instead of separate packages for individual dialects and accents requires more work upfront. However, in real-life environments, multiple speakers with different accents interact, and a speaker’s accent is unknown before they start speaking. Thus, speech models should be global and good at recognizing different dialects and accents. Picovoice uses diverse training datasets and creates global models. Yet the best way to evaluate the robustness of speech models is to try them. Picovoice’s Free Plan is an ideal way to start building and testing.
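One way to make "trying them" systematic is to compare word error rate (WER) across accent groups on the same set of phrases. The Python sketch below is a minimal illustration, not Picovoice's evaluation method: the function names, the group labels, and the sample transcripts are all hypothetical, and the hypothesis strings stand in for the output of whichever speech-to-text engine is being tested.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """samples: iterable of (group_label, reference_text, hypothesis_text)."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    # Pool errors over all reference words in each group.
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Hypothetical, made-up transcripts used only to show the call pattern.
samples = [
    ("accent_a", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("accent_b", "turn on the kitchen lights", "turn on the chicken light"),
]
print(wer_by_group(samples))
```

A large gap between groups on identical phrases suggests the model is less robust to some accents. In practice, each group needs many speakers and recordings for the comparison to be meaningful.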