How to Build a Sound Recognition System

Sound Recognition is the technology that lets a device understand the audio activity in its surroundings. Sound Detection, Sound Event Detection, Sound Classification, Sound Identification, Sound Analysis, and Audio Classification are all names for the same technology.

Sound Recognition is used in security systems, accessibility, predictive maintenance, smart home, healthcare, and consumer electronics. Examples include:

A security camera that can hear glass breaking
A baby monitor that detects infant cries
A wearable for law enforcement that can detect gunshots
A hearable that detects sirens and warns the wearer while they listen to music
An app that alerts users who are hard of hearing when the doorbell rings

There are three ways to add Sound Recognition to a product: buy it, build it, or start from open source. All three are valid, and the choice is a strategic decision that shapes the long-term success of your project and business unit.

Buy

There is no service in the public cloud with APIs for Sound Recognition. You can find startups, like Audio Analytic, Cochlear, or Edge Impulse, that offer SDKs and APIs. Pricing for these offerings is not publicly available. Expect to go through a B2B sales process and to receive an enterprise-grade quote.

Build

Sound detection is a classic entry-level project for deep learning enthusiasts. It is easy to get something working and get excited about it. Alas, a working demo is far from what the field requires. Here is a real-life example. A publicly traded security-tech company trained a model to detect gunshots. It worked great, except that the model also triggered falsely when someone closed a door. The incident caused a massive monetary loss and a brand setback. In short, building a highly accurate sound detection model is complex and requires investment, expertise, and time.

There are different approaches to building a Sound Recognition system. If there is no randomness in the sound (e.g., a specific iPhone ringtone), you can use a simple algorithm such as Dynamic Time Warping. If you want to detect a single sound or a handful of sounds, and many examples of each class exist, you can train a classifier directly. If you have limited training data, use Transfer Learning instead.
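To make the Dynamic Time Warping option concrete, here is a minimal sketch that compares an incoming sequence against a stored template of the target sound. Everything in it is an illustrative assumption rather than a production recipe: the function names `dtw_distance` and `matches_template`, the threshold value, and the toy 1-D sequences, which stand in for the per-frame features (for example, spectral energies) a real system would extract from audio.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance for 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = cheapest alignment of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a, advance in b, advance in both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def matches_template(signal, template, threshold):
    """Hypothetical detector: fire when the warped distance is small enough."""
    return dtw_distance(signal, template) <= threshold

# Toy data (assumption): one cycle of a sine stands in for the target sound.
template = np.sin(np.linspace(0, 2 * np.pi, 40))
stretched = np.sin(np.linspace(0, 2 * np.pi, 60))  # same shape, played slower
silence = np.zeros(60)                             # nothing resembling the target

THRESHOLD = 5.0  # assumed value; in practice tuned on recorded examples
print(matches_template(stretched, template, THRESHOLD))
print(matches_template(silence, template, THRESHOLD))
```

Because the alignment may stretch or compress time, the slower copy of the target still matches while the unrelated signal does not. This only works when the sound itself barely varies, which is why the approach suits fixed sounds such as a ringtone and not sounds like glass breaking.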

Open Source

TensorFlow has a great set of tools and pretrained models for Sound Recognition. The upside is that it is free to use. The downside is that I doubt the folks at Google or on the TensorFlow team would make any customizations based on your needs. TensorFlow is probably a great starting point if you decide to go down the path of building.
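Starting from a pretrained model usually means transfer learning: the pretrained network stays frozen and turns each clip into an embedding, and only a small classifier on top is trained on your own examples. The sketch below illustrates that pattern with NumPy only. It does not call TensorFlow; `embed` is a hypothetical stand-in for a frozen pretrained model (a fixed random projection of the magnitude spectrum), and the tones, sample rate, and learning rate are assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
SAMPLE_RATE = 8000  # Hz, assumed for this toy example
CLIP_LEN = 1024     # samples per clip
EMBED_DIM = 16

# ASSUMPTION: stand-in for a frozen pretrained model. A real system would call
# a pretrained audio network here; this fixed projection only mimics weights
# that are never updated during training.
FROZEN_PROJECTION = rng.normal(size=(CLIP_LEN // 2 + 1, EMBED_DIM))

def embed(clip):
    """Map a clip to a unit-length embedding using the frozen projection."""
    spectrum = np.abs(np.fft.rfft(clip))
    vec = spectrum @ FROZEN_PROJECTION
    return vec / (np.linalg.norm(vec) + 1e-9)

def make_tone(freq_hz):
    """Synthetic clip: a sine tone plus a little noise (toy data only)."""
    t = np.arange(CLIP_LEN) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t) + 0.1 * rng.normal(size=CLIP_LEN)

# Only a handful of labeled examples per class, as in a low-data setting.
clips = [make_tone(440) for _ in range(10)] + [make_tone(1200) for _ in range(10)]
labels = np.array([0.0] * 10 + [1.0] * 10)
features = np.stack([embed(c) for c in clips])  # shape (20, EMBED_DIM)

def predict_proba(w, b):
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def log_loss(w, b):
    p = predict_proba(w, b)
    return -np.mean(labels * np.log(p + 1e-12) + (1 - labels) * np.log(1 - p + 1e-12))

# Train only the small head (logistic regression) with plain gradient descent.
w = np.zeros(EMBED_DIM)
b = 0.0
initial_loss = log_loss(w, b)
for _ in range(500):
    grad = predict_proba(w, b) - labels
    w -= 0.5 * features.T @ grad / len(labels)
    b -= 0.5 * grad.mean()
final_loss = log_loss(w, b)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The design point is that only `w` and `b` are learned, so a few examples per class can be enough; the quality of the result then depends almost entirely on how good the frozen embeddings are.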

Experts at Picovoice work with Enterprise customers to create private AI algorithms, such as Language Detection or Speech Emotion Detection, specific to their use cases.
