Speech Intelligibility represents the percentage of speech that a listener can understand. Various factors, such as the articulation of a speaker or background noise, affect Speech Intelligibility. For example, Speech Intelligibility is 50% when you understand half of what a toddler says. Background noise also lowers Speech Intelligibility, making noise suppression critical for communication applications.
Removing noise to make speech easier to understand sounds trivial. However, building noise suppression software that works well is challenging. Speech Quality and Speech Intelligibility metrics help evaluate how well noise suppression software works.
What’s Speech Intelligibility Index (SII)?
SII is a refined version of the Articulation Index (AI). AI, developed in the 1920s, predicts the proportion of speech that is audible. It focuses on how well people with hearing loss can hear. AI does not measure the intelligibility of sentences or of speech processed by algorithms. Researchers therefore developed SII to cover computational methods and environmental conditions that affect Speech Intelligibility.
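At its core, an SII-style calculation is an importance-weighted sum of per-band audibilities. The sketch below is minimal and rests on a simplified ANSI S3.5-style audibility mapping. A band with an SNR of -15 dB or lower contributes nothing, and a band at +15 dB or higher contributes fully. The sketch ignores hearing thresholds, upward spread of masking, and level distortion. The equal band weights are a hypothetical placeholder for the standard's band-importance function, and the function names are illustrative.

```python
import numpy as np

def band_audibility(snr_db):
    """Map per-band SNR (dB) to audibility in [0, 1]: <= -15 dB -> 0, >= +15 dB -> 1."""
    return np.clip((np.asarray(snr_db, dtype=float) + 15.0) / 30.0, 0.0, 1.0)

def sii(snr_db, importance):
    """Simplified SII: importance-weighted sum of band audibilities.

    `importance` is a placeholder for a band-importance function;
    it is normalized here so the weights sum to 1.
    """
    importance = np.asarray(importance, dtype=float)
    importance = importance / importance.sum()
    return float(np.sum(importance * band_audibility(snr_db)))

# Six octave bands with equal (hypothetical) importance weights.
weights = np.ones(6)
quiet_room = sii([30, 30, 30, 30, 30, 30], weights)    # every band fully audible
noisy_room = sii([15, 15, 15, -15, -15, -15], weights)  # high bands masked by noise
```

The index depends only on per-band SNR. This is why a measure built this way handles steady noise but has no direct term for reverberation.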
What’s Speech Transmission Index (STI)?
STI is an objective measure of speech transmission quality. It focuses on the physical transmission channel, such as a room or a telephone line. It represents the channel's characteristics as a single number on a scale of 0 to 1, where 0 corresponds to bad and 1 to excellent.
SII works well in the presence of steady noise but not reverberation. STI also accounts for reverberation. However, neither accounts for non-linear distortions or fluctuating noise, such as background talkers or wind.
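A simplified STI calculation can show why reverberation matters. This sketch rests on the classic modulation transfer function for exponentially decaying reverberation combined with steady noise. It converts each modulation reduction factor m(F) to an apparent SNR, clips that SNR to ±15 dB, and averages the resulting transmission indices over 14 modulation frequencies. It treats the channel as a single broadband band. It therefore omits the octave-band weighting, redundancy correction, and masking corrections of IEC 60268-16. The function names are hypothetical.

```python
import numpy as np

# 14 modulation frequencies (Hz) in one-third-octave steps from 0.63 to 12.5 Hz.
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])

def modulation_transfer(rt60_s, snr_db):
    """m(F) for exponential reverberation (RT60 in seconds) plus steady noise."""
    reverb = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * MOD_FREQS * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

def simplified_sti(rt60_s, snr_db):
    """Broadband STI sketch: average transmission index over modulation frequencies."""
    m = np.clip(modulation_transfer(rt60_s, snr_db), 1e-12, 1.0 - 1e-12)
    apparent_snr = np.clip(10.0 * np.log10(m / (1.0 - m)), -15.0, 15.0)
    transmission_index = (apparent_snr + 15.0) / 30.0  # maps [-15, 15] dB to [0, 1]
    return float(transmission_index.mean())

dry_room = simplified_sti(rt60_s=0.0, snr_db=60.0)
reverberant_room = simplified_sti(rt60_s=2.0, snr_db=60.0)
```

With no reverberation and an SNR of 0 dB, every m(F) equals 0.5, so `simplified_sti(0.0, 0.0)` returns 0.5. At a high SNR, raising `rt60_s` alone still lowers the score. A purely SNR-based index would miss this effect.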
What’s Short-Time Objective Intelligibility (STOI)?
In 2011, researchers proposed STOI as an alternative to existing Speech Intelligibility metrics such as STI. They found that STOI correlated highly with the intelligibility of degraded speech signals, making it a good fit for measuring the effect of speech enhancement. STOI correlates the short-time temporal envelopes of the clean and processed speech and uses the average correlation to predict intelligibility scores.
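The averaging step can be sketched as follows. This minimal sketch covers only STOI's core. It takes as given that the clean and processed signals have already gone through the front end: resampling, a short-time Fourier transform, grouping into one-third-octave band envelopes, and silent-frame removal. For each 30-frame segment (about 384 ms in the original algorithm), the sketch scales and clips the processed envelope. It then correlates that envelope with the clean one in every band and averages all correlations into one score. The function name is hypothetical. For complete audio signals, the open-source `pystoi` package offers a full implementation through `pystoi.stoi(clean, processed, fs_sig)`.

```python
import numpy as np

def stoi_from_envelopes(clean_env, proc_env, seg_len=30, beta_db=-15.0, eps=1e-12):
    """Core STOI step: mean short-time envelope correlation.

    clean_env, proc_env: arrays of shape (num_bands, num_frames) with
    one-third-octave band magnitudes of the clean and processed signals.
    """
    clean_env = np.asarray(clean_env, dtype=float)
    proc_env = np.asarray(proc_env, dtype=float)
    num_bands, num_frames = clean_env.shape
    if num_frames < seg_len:
        raise ValueError("need at least seg_len frames")
    # Clipping bound: limits the signal-to-distortion ratio to beta_db.
    clip_factor = 1.0 + 10.0 ** (-beta_db / 20.0)
    correlations = []
    for end in range(seg_len, num_frames + 1):
        x = clean_env[:, end - seg_len:end]
        y = proc_env[:, end - seg_len:end]
        # Scale each processed band to the clean band's energy, then clip.
        gain = np.linalg.norm(x, axis=1, keepdims=True) / (
            np.linalg.norm(y, axis=1, keepdims=True) + eps)
        y_clipped = np.minimum(gain * y, clip_factor * x)
        # Correlation coefficient per band over this segment.
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y_clipped - y_clipped.mean(axis=1, keepdims=True)
        num = np.sum(xc * yc, axis=1)
        den = np.linalg.norm(xc, axis=1) * np.linalg.norm(yc, axis=1) + eps
        correlations.extend(num / den)
    return float(np.mean(correlations))

rng = np.random.default_rng(0)
clean = rng.random((15, 100)) + 0.1            # 15 bands, 100 frames
noisy = clean + 2.0 * rng.random((15, 100))    # envelopes corrupted by noise
```

Identical envelopes score close to 1. Corrupting the processed envelopes pulls the average correlation down, which is the behavior STOI relies on to rank degraded or enhanced speech.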
Picovoice’s researchers decided to use STOI as the benchmark to assess the performance of Koala Noise Suppression for two reasons:
- Subjective metrics, like MOS, are expensive because they require large and diverse listener pools to produce reliable results. More importantly, they are not reproducible. Vendors can easily manipulate the results of MOS-based comparisons by cherry-picking listeners (e.g., employees) or giving them cues to score their engine higher.
- Other intelligibility metrics, such as SII and STI, are objective but do not account for the non-linear distortions that speech enhancement engines introduce.
Still, no single measure works well for every type of noise, which is why no standard measurement method is used across the industry. Picovoice open-sourced its noise suppression benchmark, which uses the STOI metric, and created sample audio clips to help developers evaluate alternative noise suppression engines.
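A common way to build test clips for such a benchmark is to mix clean speech with recorded noise at a controlled SNR. Both the noisy input and each engine's output can then be scored against the clean reference with STOI. This is an assumption about a typical setup, not a description of how Picovoice's benchmark creates its clips. The helper names `mix_at_snr` and `measured_snr_db` are hypothetical.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean speech")
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scaled noise power = clean_power / 10^(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def measured_snr_db(clean, noisy):
    """SNR of a noisy mixture relative to its clean reference."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.asarray(clean) ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000)   # stand-in for 1 s of 16 kHz speech
babble = rng.standard_normal(20000)   # stand-in for a noise recording
mixture = mix_at_snr(speech, babble, snr_db=5.0)
```

Sweeping `snr_db` over a range of values yields clips from mildly to heavily degraded. An engine's benefit can then be read as the STOI gain of its output over the unprocessed mixture at each level.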
The best way to measure Speech Quality and Speech Intelligibility is to test noise suppression engines in a real-life environment. However, speech quality and intelligibility are only one aspect of evaluating a noise suppression engine. Other factors affect its performance and usability as well.