The dictionary definition of babble is the sound of many people talking simultaneously. The background noise we hear in coffee shops, restaurants, airports, and cocktail parties is babble. While humans handle the babble noise naturally, it’s challenging for machines. It’s, in fact, one of the rigorous problems researchers and engineers tackle while building noise suppression solutions.
Cherry coined the term cocktail party problem in the 1950s and studied how humans understand what one person is saying when others are speaking simultaneously. In a real-world environment, such as a cocktail party, competing voices, like other noises, originate from different locations in space. Thus, each ear receives signals with different intensities and waveforms. We also benefit from the head shadow effect. Our left ear hears the sounds that come from the left better than the right ear. For starters, single-channel noise suppression engines do not benefit from binaural hearing or visual cues like humans.
Humans also use gaps and dips to recognize speech. When there are fewer background speakers, it’s easier to understand them. However, as the number of speakers increases, it gets difficult given the fewer gaps. It also makes babble noise one of the best noise maskers to work in an open office or library . Added noise, i.e., fewer gaps, reduces the intelligibility of human speech and makes it less distracting.
What makes the cocktail party problem hard for machines?
The number of speakers in the babble and the similarity of speech to the desired target speech make the cocktail party problem challenging for machines to tackle. The characteristics of the babble noise constantly change as people carry on conversations. Imagine a cocktail party, people close to you stop talking to take a sip, and a waiter walks around the room. Thus, the noise to suppress constantly changes. Noise suppression engines using traditional signal processing techniques do not perform well in the cases of changing (nonstationary) noises because they look for long-term patterns. Naturally, there are no stable patterns in changing noises. Advances in machine learning in the last few years have proved that deep neural network architectures successfully remove both stationary and nonstationary noises, outperforming the traditional DSP methods significantly. However, not every deep neural network architecture performs equally. Listen to how Koala Noise Suppression and Mozilla RNNoise perform in the presence of babble noise.