Speech Intelligibility represents the percentage of speech that a listener can understand. Various factors, such as the articulation of a speaker or background noise, affect Speech Intelligibility. For example, Speech Intelligibility is 50% when you understand half of what a toddler says. Background noise also lowers Speech Intelligibility, making noise suppression critical for communication applications.
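To make the "percentage understood" idea concrete, here is a minimal sketch that scores a listener's transcript against what was actually said. The function name `intelligibility_percent` and the word-level, in-order matching rule are assumptions made for illustration; formal listening tests use controlled word lists and scoring protocols.

```python
def intelligibility_percent(spoken: str, understood: str) -> float:
    """Percentage of spoken words the listener recovered, in the right order.

    Hypothetical helper for illustration only, not a standardized metric.
    """
    ref = spoken.lower().split()
    hyp = understood.lower().split()
    if not ref:
        raise ValueError("spoken must contain at least one word")

    # lcs[i][j] = length of the longest common subsequence of ref[:i] and hyp[:j]
    lcs = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])

    return 100.0 * lcs[-1][-1] / len(ref)
```

For instance, if the speaker says "i want more apple juice please" and the listener catches "want apple please", three of six words are recovered, which corresponds to the 50% case described above.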
Removing the noise to make speech easier to understand sounds trivial. However, building noise suppression software that works well is challenging. Using Speech Quality and Speech Intelligibility metrics helps to evaluate how well noise suppression software works.
What’s Speech Intelligibility Index (SII)?
SII is a refined version of the Articulation Index (AI). AI, developed in the 1920s, predicts the ratio of audible speech. It focuses on how well people suffering from hearing loss can hear. The AI does not measure the intelligibility of sentences or speech processed by algorithms. Thus, researchers developed SII to cover computational methods and environmental conditions that affect Speech Intelligibility.
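The core idea behind AI and SII, an importance-weighted share of audible speech, can be sketched in a few lines. This is a simplified illustration, not the full standardized procedure: it assumes each frequency band's audibility grows linearly from 0 at -15 dB SNR to 1 at +15 dB SNR, and it ignores hearing thresholds, masking spread, and level distortion. The function names and any band weights you pass in are hypothetical.

```python
from typing import Sequence


def band_audibility(snr_db: float) -> float:
    """Map a band's signal-to-noise ratio to audibility in [0, 1].

    Simplifying assumption: linear between -15 dB (inaudible) and +15 dB (fully audible).
    """
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def sii_sketch(band_snrs_db: Sequence[float], band_importance: Sequence[float]) -> float:
    """Importance-weighted sum of band audibilities (simplified, illustrative)."""
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("need one importance weight per band")
    if abs(sum(band_importance) - 1.0) > 1e-9:
        raise ValueError("band importance weights must sum to 1")
    return sum(w * band_audibility(snr) for w, snr in zip(band_importance, band_snrs_db))
```

With four equally weighted bands at 15, 0, -15, and 30 dB SNR, the band audibilities are 1, 0.5, 0, and 1, so the sketch returns 0.625.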
What’s Speech Transmission Index (STI)?
STI is an objective measurement of speech transmission quality. It focuses on the physical transmission channels, such as a room or telephone line capability. It’s a numeric representation of communication channel characteristics on a scale of 0 to 1, where 0 corresponds to bad and 1 to excellent.
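As a rough illustration of how the 0 to 1 scale is read in practice, the sketch below maps an STI value to a qualitative label. The intermediate thresholds follow the commonly cited five-band rating scale (bad, poor, fair, good, excellent); treat the exact cut-offs and the handling of values that fall exactly on a boundary as assumptions, and `sti_label` as a hypothetical helper name.

```python
# (upper bound, label) pairs; cut-offs are the commonly cited rating bands (assumed here).
STI_BANDS = [
    (0.30, "bad"),
    (0.45, "poor"),
    (0.60, "fair"),
    (0.75, "good"),
    (1.00, "excellent"),
]


def sti_label(sti: float) -> str:
    """Return a qualitative rating for an STI value between 0 and 1."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must be between 0 and 1")
    for upper, label in STI_BANDS:
        # Assumption: a value exactly on a boundary falls into the lower band.
        if sti <= upper:
            return label
    raise AssertionError("unreachable: every value in [0, 1] matches a band")
```

For example, `sti_label(0.5)` gives "fair", while values close to 1 are rated "excellent".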
AI and SII work well in the presence of steady noise but not reverberation. STI also covers the effects of reverberation. However, none of them accounts for non-linear distortions and fluctuating noises, such as background talkers or wind.
What’s Short-Time Objective Intelligibility (STOI)?
In 2011, researchers proposed STOI as an alternative to the Speech Intelligibility metrics SII and STI. They found that STOI correlated highly with the intelligibility of degraded speech signals, making it a good fit for measuring the effect of speech enhancement. STOI averages the correlation between short-time segments of the clean and the degraded speech to predict intelligibility scores.
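The "average correlation" idea can be sketched as follows. This is a heavily simplified illustration and not the published STOI algorithm: it compares a single broadband energy envelope, whereas STOI works on one-third octave bands with additional normalization and clipping steps. The function names and the default frame and segment sizes are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def frame_envelope(signal: Sequence[float], frame_len: int) -> List[float]:
    """RMS level of each non-overlapping frame (a crude temporal envelope)."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom


def stoi_like_score(clean: Sequence[float], processed: Sequence[float],
                    frame_len: int = 256, segment_frames: int = 30) -> float:
    """Average short-time envelope correlation between clean and processed speech.

    Simplified sketch: one broadband envelope, no band analysis, no clipping.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must be time-aligned and of equal length")
    env_clean = frame_envelope(clean, frame_len)
    env_proc = frame_envelope(processed, frame_len)
    if len(env_clean) < segment_frames:
        raise ValueError("signal too short for one analysis segment")
    # Slide a short window over the envelopes and correlate each pair of segments.
    scores = [
        pearson(env_clean[s:s + segment_frames], env_proc[s:s + segment_frames])
        for s in range(len(env_clean) - segment_frames + 1)
    ]
    return sum(scores) / len(scores)
```

A processed signal whose envelope tracks the clean one scores close to 1, while residual noise or distortion that reshapes the envelope pulls the score down. For real evaluations, use an established STOI implementation rather than this sketch.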
Picovoice’s researchers decided to use STOI as the benchmark to assess the performance of Koala Noise Suppression, for two reasons:
- Using subjective metrics, like MOS, is expensive, as they require large and diverse groups of listeners to get reliable results. More importantly, they are not reproducible. Vendors can easily manipulate the results of MOS-based comparisons by cherry-picking listeners (e.g., employees) or providing them with cues to score their engine higher.
- Intelligibility metrics, SII and STI, are objective but do not account for distortion introduced by speech enhancement engines.
Yet it’s important to acknowledge that no metric works well for every type of noise interference. Hence, there is no standard measurement method used across the industry. Picovoice open-sourced the noise suppression benchmark using the STOI metric and created simple audio clips to help developers evaluate alternative noise suppression engines.
The best way to measure Speech Quality and Speech Intelligibility is to test noise suppression engines in a real-life environment. However, that is only one aspect of evaluating noise suppression engines. Other factors impact performance and usability as well.