Speech Intelligibility is the percentage of speech that a listener can understand. Factors such as a speaker's articulation or background noise affect Speech Intelligibility. For example, Speech Intelligibility is 50% when you understand half of what a toddler says. Because background noise lowers Speech Intelligibility, noise suppression is critical for communication applications.

Removing noise to make speech easier to understand sounds trivial. However, building noise suppression software that works well is challenging. Speech Quality and Speech Intelligibility metrics help evaluate how well noise suppression software works.

What’s Speech Intelligibility Index (SII)?

SII is a refined version of the Articulation Index (AI). AI, developed in the 1920s, predicts the ratio of audible speech. It focuses on how well people suffering from hearing loss can hear. The AI does not measure the intelligibility of sentences or speech processed by algorithms. Thus, researchers developed SII to cover computational methods and environmental conditions that affect Speech Intelligibility.

What’s Speech Transmission Index (STI)?

STI is an objective measurement of speech transmission quality. It focuses on the physical transmission channel, such as a room or a telephone line. It's a numeric representation of communication channel characteristics on a scale of 0 to 1, where 0 corresponds to bad and 1 to excellent.
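To make the 0-to-1 scale more concrete, here is a minimal sketch that maps an STI value to a qualitative label. The band boundaries (0.30, 0.45, 0.60, 0.75) are the ones commonly quoted alongside STI. They are an assumption added for illustration and are not stated in this article. The function name `sti_rating` is hypothetical.

```python
def sti_rating(sti: float) -> str:
    """Map an STI value in [0, 1] to a qualitative label.

    The thresholds below are commonly quoted rating bands and are an
    assumption for illustration; check the standard you work against.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must be between 0 and 1")
    if sti < 0.30:
        return "bad"
    if sti < 0.45:
        return "poor"
    if sti < 0.60:
        return "fair"
    if sti < 0.75:
        return "good"
    return "excellent"


# Example: label a few hypothetical measurements.
for value in (0.2, 0.5, 0.8):
    print(value, sti_rating(value))
```

A lookup like this is only a reading aid. The STI value itself still has to come from a proper measurement of the channel.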

AI and SII work well in the presence of steady noise but not reverberation. STI can account for reverberation as well. However, none of them accounts for non-linear distortions and fluctuating noises, such as background talkers or wind.

What’s Short-Time Objective Intelligibility (STOI)?

In 2011, researchers proposed STOI as an alternative to the Speech Intelligibility metrics SII and STI. They found that STOI scores correlated highly with the intelligibility of degraded speech signals, making it a good fit for measuring the effect of speech enhancement. STOI compares the short-time temporal envelopes of clean and degraded speech and uses their average correlation to predict intelligibility scores.
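The "average correlation" idea can be illustrated with a heavily simplified sketch. This is not the STOI algorithm: real STOI works on one-third octave band envelopes and applies normalization and clipping before correlating. The sketch below skips all of that and only correlates broadband frame energies over short segments. All function names, the frame length, and the segment length are assumptions chosen for illustration.

```python
import math
import random


def frame_envelope(signal, frame_len):
    """RMS per non-overlapping frame: a crude stand-in for a temporal envelope."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def correlation(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def average_envelope_correlation(clean, degraded, frame_len=256, segment_frames=30):
    """Average the clean/degraded envelope correlation over sliding short segments."""
    env_clean = frame_envelope(clean, frame_len)
    env_degraded = frame_envelope(degraded, frame_len)
    n = min(len(env_clean), len(env_degraded))
    scores = [
        correlation(env_clean[s:s + segment_frames], env_degraded[s:s + segment_frames])
        for s in range(0, n - segment_frames + 1)
    ]
    if not scores:
        raise ValueError("signals are too short for the chosen segment length")
    return sum(scores) / len(scores)


# Synthetic demo: an amplitude-modulated tone stands in for speech.
rng = random.Random(0)
fs = 16000
clean = [
    math.sin(2 * math.pi * 220 * t / fs) * (0.5 + 0.5 * math.sin(2 * math.pi * 4 * t / fs))
    for t in range(2 * fs)
]
light_noise = [x + rng.gauss(0, 0.05) for x in clean]
heavy_noise = [x + rng.gauss(0, 1.0) for x in clean]

score_light = average_envelope_correlation(clean, light_noise)
score_heavy = average_envelope_correlation(clean, heavy_noise)
print(f"light noise: {score_light:.3f}, heavy noise: {score_heavy:.3f}")
```

The more the noise masks the envelope of the clean signal, the lower the average correlation. For actual benchmarking, use a complete STOI implementation rather than this sketch.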

Picovoice’s researchers chose STOI as the benchmark metric to assess the performance of Koala Noise Suppression for two reasons:

  1. Subjective metrics, like MOS, are expensive because they require large and diverse listener samples to produce reliable results. More importantly, they are not reproducible. Vendors can easily manipulate the results of MOS-based comparisons by cherry-picking listeners (e.g., employees) or providing them with cues to score their engine higher.
  2. The intelligibility metrics SII and STI are objective but do not account for distortion introduced by speech enhancement engines.

Yet, it’s important to acknowledge that no measure works well for every type of noise interference. Hence, there is no standard measurement method used across the industry. Picovoice open-sourced the noise suppression benchmark using the STOI metric and created simple audio clips to help developers evaluate alternative noise suppression engines.

The best way to measure Speech Quality and Speech Intelligibility is to test noise suppression engines in a real-life environment. However, these measurements are only one aspect of evaluating noise suppression engines. Other factors affect performance and usability as well.
