Voice Activity Detection (VAD) is software that detects the presence of human speech in audio. Humans naturally distinguish speech from other sounds, but machines need some help to do the same. Given some audio input, a VAD makes a binary decision: the input either contains speech or it does not. This functionality is essential to many speech recognition applications.

Picovoice's Cobra Voice Activity Detection engine is lightweight, on-device VAD software that runs on any platform, including web browsers. Cobra VAD performs voice activity detection locally, which keeps your voice data private and makes it GDPR and HIPAA-compliant by design.

Importantly, the Cobra Voice Activity Detection engine is the most accurate VAD engine across all platforms, even in comparison to Google's widely used WebRTC VAD.

Cobra VAD is available for all major browsers: Chrome, Safari, Firefox and Edge.

In just a few minutes, you can start detecting voice activity in real time using the Cobra Voice Activity Detection JavaScript SDK. Let’s get started!

Demo Project

A complete working demo is available on CodePen. Just make sure you replace the ${ACCESS_KEY} string with your own AccessKey (see Step 3).

1. Project setup

Create a new folder and, from inside it, initialize an npm project with npm init.

Next, install both packages with npm install @picovoice/web-voice-processor @picovoice/cobra-web.

Also install http-server as a development dependency with npm install --save-dev http-server, so we can view our project on localhost.

2. HTML

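Create an index.html file that loads both packages with <script> tags. Each package ships a browser build inside its node_modules folder (dist/iife/index.js at the time of writing; confirm the path against your installed version).

You'll now be able to run the local server to load the page, for example with npx http-server -a localhost -p 5000.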

You can see the page at http://localhost:5000. This will just look like a blank page for now.

3. Picovoice Console

Sign up for a free Picovoice Console account and copy your AccessKey, found on the main dashboard.

4. Initialize Cobra

In a <script> tag within the <body> of the html file, create an instance of CobraWorker with your Picovoice AccessKey and a voiceProbabilityCallback function.
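As a minimal sketch of this step, the helper below assumes the SDK's asynchronous factory CobraWorker.create(accessKey, callback), as described in the Cobra JavaScript SDK quick start guide. The name initCobra is hypothetical. CobraWorker is passed in as a parameter so the snippet does not depend on how the SDK was loaded. With the script-tag setup it is expected to be available as the global CobraWeb.CobraWorker, which is an assumption worth verifying against your installed version.

```javascript
// Hypothetical helper: creates a Cobra worker that reports voice probability.
// Assumes CobraWorker.create(accessKey, callback) returns a Promise that
// resolves to the worker instance; verify against the SDK version you use.
function initCobra(CobraWorker, accessKey, onVoiceProbability) {
  // Cobra is expected to invoke this callback once per processed audio frame.
  function voiceProbabilityCallback(voiceProbability) {
    onVoiceProbability(voiceProbability);
  }
  return CobraWorker.create(accessKey, voiceProbabilityCallback);
}
```

In the page script this might be called as initCobra(CobraWeb.CobraWorker, "${ACCESS_KEY}", (p) => console.log(p)), with the placeholder replaced by your own AccessKey.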

For each audio frame processed, Cobra calls voiceProbabilityCallback with a score from 0 to 1 (voiceProbability). A score of 1 indicates a 100% probability that the current audio frame contains voice, and a score of 0 indicates a 0% probability.
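Because the score is a probability rather than a yes/no answer, an application usually compares it to a threshold to get the binary decision described earlier. The sketch below shows one way to do that. The threshold of 0.5 and the names isVoice and describeFrame are illustrative assumptions, not part of the Cobra SDK.

```javascript
// Hypothetical threshold; tune it for your microphone and environment.
const VOICE_THRESHOLD = 0.5;

// Converts a per-frame voice probability (0 to 1) into a yes/no decision.
function isVoice(voiceProbability, threshold = VOICE_THRESHOLD) {
  return voiceProbability >= threshold;
}

// Example of what a callback might do: format the score for display.
function describeFrame(voiceProbability) {
  const percent = (voiceProbability * 100).toFixed(0);
  return isVoice(voiceProbability)
    ? `voice (${percent}%)`
    : `no voice (${percent}%)`;
}
```

A higher threshold reduces false alarms from background noise at the cost of missing quiet speech, and a lower one does the opposite.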

In digital audio, an audio frame is a fixed-size block of audio samples that represents a brief moment in time. Frames are the building blocks of digital audio signals and are used to store, process, and transmit audio information. CobraWorker receives audio frames from WebVoiceProcessor once it is subscribed to it (see next step).
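To make the idea of a frame concrete, the sketch below splits a buffer of 16-bit PCM samples into fixed-length frames. The 16 kHz sample rate and the 512-sample frame length are assumptions chosen for illustration, and splitIntoFrames is a hypothetical helper. In the actual setup, WebVoiceProcessor does this framing for you.

```javascript
// Assumed values for illustration: 16 kHz mono audio, 512 samples per frame.
const SAMPLE_RATE = 16000;
const FRAME_LENGTH = 512;

// Splits an Int16Array of PCM samples into fixed-length frames.
// Trailing samples that do not fill a whole frame are dropped here; a real
// pipeline would usually keep them and prepend them to the next buffer.
function splitIntoFrames(samples, frameLength = FRAME_LENGTH) {
  const frames = [];
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    // subarray creates a view on the same memory instead of copying samples.
    frames.push(samples.subarray(start, start + frameLength));
  }
  return frames;
}

// Duration of one frame in milliseconds under the assumptions above.
const frameDurationMs = (FRAME_LENGTH * 1000) / SAMPLE_RATE;
```

Under these assumptions each frame covers 32 milliseconds of audio, so the callback would fire roughly 31 times per second.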

5. Start Detecting Voice

The Web Audio API and the MediaStream API are commonly used by developers to work with audio in web browsers. Although powerful, setup for the Web Audio and MediaStream APIs can be fairly complex. This is why we created Web Voice Processor - an open-source library that handles recording audio for you.

To start detecting voice, simply subscribe cobra to WebVoiceProcessor.
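A minimal sketch of the subscription, assuming WebVoiceProcessor.subscribe(engine) is an asynchronous static method as described in the Web Voice Processor documentation. The helper name startDetecting is hypothetical, and WebVoiceProcessor is passed in so the snippet works regardless of how the library was loaded.

```javascript
// Hypothetical helper: starts microphone capture and feeds frames to Cobra.
// Assumes WebVoiceProcessor.subscribe(engine) returns a Promise.
async function startDetecting(WebVoiceProcessor, cobra) {
  await WebVoiceProcessor.subscribe(cobra);
}
```

Since this starts microphone capture, expect the browser to ask the user for microphone permission the first time it runs.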

To stop processing audio, unsubscribe cobra.
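The stop path can be sketched the same way, assuming WebVoiceProcessor.unsubscribe(engine) is the asynchronous counterpart of subscribe. The helper name stopDetecting is hypothetical.

```javascript
// Hypothetical helper: stops feeding audio frames to Cobra.
// Assumes WebVoiceProcessor.unsubscribe(engine) returns a Promise.
async function stopDetecting(WebVoiceProcessor, cobra) {
  await WebVoiceProcessor.unsubscribe(cobra);
}
```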

6. Complete HTML

Add some HTML elements and app logic to see Cobra in action. It might look something like this:
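The app logic could be sketched as below. This is not the original demo code: setupApp and its parameters are hypothetical, the elements are whatever start button, stop button, and output element you add to the page, and sdk is assumed to be an object holding CobraWorker and WebVoiceProcessor with the create, subscribe, and unsubscribe methods used in the previous steps.

```javascript
// Hypothetical wiring of two buttons and an output element to Cobra.
// `sdk` bundles CobraWorker and WebVoiceProcessor however they were loaded.
function setupApp({ startButton, stopButton, output }, sdk, accessKey) {
  let cobra = null;

  // Expected to be called by Cobra for every processed audio frame.
  function voiceProbabilityCallback(voiceProbability) {
    output.textContent = `Voice Probability: ${voiceProbability.toFixed(2)}`;
  }

  async function start() {
    // Create the worker once and reuse it across start/stop cycles.
    if (cobra === null) {
      cobra = await sdk.CobraWorker.create(accessKey, voiceProbabilityCallback);
    }
    await sdk.WebVoiceProcessor.subscribe(cobra);
  }

  async function stop() {
    // Nothing to stop if Cobra was never started.
    if (cobra !== null) {
      await sdk.WebVoiceProcessor.unsubscribe(cobra);
    }
  }

  startButton.addEventListener("click", start);
  stopButton.addEventListener("click", stop);
  return { start, stop };
}
```

In the page, the elements might come from document.getElementById, and the sdk object might be built from the globals exposed by the two script tags.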

Finally, go back to http://localhost:5000. Click "Start Cobra", speak into your mic, and watch the Voice Probability change based on whether you are speaking or not!

Adding to Existing Project?

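If you are working within an existing project that has a module bundler, you can skip the <script> tags and use the import syntax instead, importing WebVoiceProcessor from @picovoice/web-voice-processor and CobraWorker from @picovoice/cobra-web.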


For more information, check out the Cobra Voice Activity Detection product page or refer to the Cobra Voice Activity Detection JavaScript SDK quick start guide.