JavaScript Voice Activity Detection

Voice Activity Detection (VAD) detects the presence of human speech in audio. Humans distinguish speech from other sounds naturally, but machines need some help to do the same. Given an audio input, a VAD makes a binary decision: the input either contains speech or it does not. This capability is essential to many speech recognition applications.
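
To make the idea of a binary decision concrete, here is a deliberately naive sketch that labels a frame as speech whenever its energy exceeds a fixed threshold. This is only an illustration and is not how Cobra works. The function names, the assumption that samples are floats in the range -1 to 1, and the 0.02 threshold are all hypothetical. A detector like this reacts to any loud sound, which is exactly the weakness that purpose-built VAD engines are meant to address.

```javascript
// Root-mean-square energy of one frame of float samples in [-1, 1].
function rms(frame) {
  if (frame.length === 0) {
    return 0;
  }
  let sumOfSquares = 0;
  for (const sample of frame) {
    sumOfSquares += sample * sample;
  }
  return Math.sqrt(sumOfSquares / frame.length);
}

// Naive energy-based VAD (illustrative only): returns true when the frame
// is louder than the threshold. The default threshold is an arbitrary
// assumption and would need tuning per microphone and environment.
function naiveEnergyVad(frame, threshold = 0.02) {
  return rms(frame) > threshold;
}
```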

Picovoice's Cobra Voice Activity Detection engine is a lightweight, on-device VAD that runs on any platform, including web browsers. Cobra VAD performs voice activity detection locally, which keeps your voice data private (i.e., it is GDPR- and HIPAA-compliant by design).

Importantly, Picovoice reports that the Cobra Voice Activity Detection engine is more accurate than other VAD engines across all platforms, including Google's widely used WebRTC VAD.

Cobra VAD is available for all major browsers: Chrome, Safari, Firefox and Edge.

In just a few minutes, you can start detecting voice activity in real time using the Cobra Voice Activity Detection JavaScript SDK. Let’s get started!

Demo Project

A complete working demo is available on CodePen. Just make sure you replace the ${ACCESS_KEY} string with your own AccessKey (see Step 3).

1. Project Setup

Create a new folder and initialize an npm project:

npm init -y

Next, install @picovoice/web-voice-processor and @picovoice/cobra-web:

npm install @picovoice/web-voice-processor @picovoice/cobra-web

Also install http-server as a development dependency, so we can view our project on localhost:

npm install http-server --save-dev

2. HTML

Create an index.html file with the following scripts:

<!DOCTYPE html>
<html>
  <head>
    <script src="node_modules/@picovoice/cobra-web/dist/iife/index.js"></script>
    <script src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js"></script>
  </head>
  <body>
  </body>
</html>

You'll now be able to run the local server to load the page:

npx http-server -a localhost -p 5000

You can see the page at http://localhost:5000. This will just look like a blank page for now.

3. Picovoice Console

Sign up for a Picovoice Console account and copy your AccessKey. You will need it to initialize Cobra in the next step.

4. Initialize Cobra

In a <script> tag within the <body> of the HTML file, create an instance of CobraWorker with your Picovoice AccessKey and a voiceProbabilityCallback function.

<!--...-->
<body>
  <script type="application/javascript">
    let cobra = null
  
    async function initCobra() {
      cobra = await CobraWeb.CobraWorker.create(
        "${ACCESS_KEY}", // Replace with your Picovoice AccessKey
        voiceProbabilityCallback
      )
    }
  
    function voiceProbabilityCallback(voiceProbability) {
      // use voice probability value
    }
  </script>
</body>
<!--...-->

For each audio frame processed, Cobra calls voiceProbabilityCallback with a score from 0 to 1 (voiceProbability). A score of 1 indicates a 100% probability that the current audio frame contains voice, and a score of 0 indicates a 0% probability.
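
Since the callback delivers a probability and not a yes/no answer, an application typically needs its own rule for deciding when speech starts and stops. One possible approach, sketched below, uses two thresholds plus a short "hangover" so that brief dips in probability do not end a speech segment. The helper name createVoiceGate and its default values are hypothetical and are not part of the Cobra SDK.

```javascript
// Hypothetical helper: converts per-frame probabilities into a stable
// speaking / not-speaking state. All defaults are assumptions to tune.
function createVoiceGate({
  onThreshold = 0.7,   // probability needed to enter the speaking state
  offThreshold = 0.3,  // probability below which a frame counts as quiet
  hangoverFrames = 10, // consecutive quiet frames needed to leave the state
} = {}) {
  let speaking = false;
  let quietFrames = 0;

  return function update(voiceProbability) {
    if (!speaking) {
      if (voiceProbability >= onThreshold) {
        speaking = true;
        quietFrames = 0;
      }
    } else if (voiceProbability < offThreshold) {
      quietFrames += 1;
      if (quietFrames >= hangoverFrames) {
        speaking = false;
        quietFrames = 0;
      }
    } else {
      // Any frame at or above offThreshold resets the quiet streak.
      quietFrames = 0;
    }
    return speaking;
  };
}
```

Inside voiceProbabilityCallback, calling the returned update function with each voiceProbability value would yield a boolean that changes less erratically than a single fixed cutoff.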

In digital audio, an audio frame is a discrete unit of audio data that represents a brief moment in time. These frames are the building blocks of digital audio signals and are used to store, process, and transmit audio information. CobraWorker receives audio frames from WebVoiceProcessor once it is subscribed to it (see the next step).
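
As a rough illustration of what framing means, the sketch below splits a buffer of 16-bit PCM samples into fixed-length frames and computes how much time one frame covers. WebVoiceProcessor does this for you, so the code is purely explanatory. The function names are hypothetical, and the frame length of 512 samples and sample rate of 16 kHz are assumptions used for the example; check the SDK for the actual values.

```javascript
// Yields consecutive, non-overlapping frames from a buffer of PCM samples.
// Trailing samples that do not fill a whole frame are dropped.
function* splitIntoFrames(samples, frameLength = 512) {
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    // subarray creates a view on the same memory, so nothing is copied
    yield samples.subarray(start, start + frameLength);
  }
}

// Duration of one frame in milliseconds.
function frameDurationMs(frameLength = 512, sampleRate = 16000) {
  return (frameLength / sampleRate) * 1000;
}

// Example: 1100 samples contain two complete 512-sample frames.
const exampleFrames = [...splitIntoFrames(new Int16Array(1100))];
console.log(exampleFrames.length, frameDurationMs()); // 2 32
```

Under these assumptions each frame covers 32 ms of audio, which means the callback would fire roughly 31 times per second.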

5. Start Detecting Voice

The Web Audio API and the MediaStream API are commonly used by developers to work with audio in web browsers. Although powerful, setup for the Web Audio and MediaStream APIs can be fairly complex. This is why we created Web Voice Processor - an open-source library that handles recording audio for you.

To start detecting voice, simply subscribe cobra to WebVoiceProcessor.

async function startCobra() {
  await WebVoiceProcessor.WebVoiceProcessor.subscribe(cobra)
}

To stop processing audio, unsubscribe cobra.

async function stopCobra() {
  await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(cobra)
}
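
In a real page, users may click the start or stop button more than once, and subscribing can fail, for example when microphone access is denied. The sketch below shows one way to keep track of the running state around the two calls above. The helper createVadController is hypothetical and is not part of either Picovoice library; it takes the subscribe and unsubscribe functions as parameters so that it does not depend on any particular SDK.

```javascript
// Hypothetical helper: wraps start/stop so that a repeated call made after
// the previous one has settled is a no-op, and a failed start leaves the
// controller in the stopped state. It does not guard against two calls
// that overlap in time.
function createVadController({ subscribe, unsubscribe }) {
  let running = false;

  return {
    isRunning: () => running,

    async start() {
      if (running) return;   // already subscribed, nothing to do
      await subscribe();     // may reject, e.g. microphone access denied
      running = true;        // only reached when subscribe succeeded
    },

    async stop() {
      if (!running) return;  // never started or already stopped
      await unsubscribe();
      running = false;
    },
  };
}
```

On the page, subscribe could be () => WebVoiceProcessor.WebVoiceProcessor.subscribe(cobra) and unsubscribe the matching unsubscribe call, with the button handlers calling controller.start() and controller.stop() inside a try/catch that reports errors to the user.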

6. Complete HTML

Add some HTML elements and app logic to see Cobra in action. It might look something like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Cobra Voice Activity Detection - Picovoice</title>
    <script
      src="node_modules/@picovoice/cobra-web/dist/iife/index.js">
    </script>
    <script
      src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js">
    </script>
  </head>
  <body>
    <div>Voice Probability: <span id="voice-probability"></span></div>
    <button id="start-cobra">Start Cobra</button>
    <button id="stop-cobra">Stop Cobra</button>
    <script type="application/javascript">
      const voiceProbabilitySpan = document.getElementById('voice-probability')
      const startCobraButton = document.getElementById('start-cobra')
      const stopCobraButton = document.getElementById('stop-cobra')
      startCobraButton.addEventListener('click', startCobra)
      stopCobraButton.addEventListener('click', stopCobra)
      
      let cobra = null
    
      async function initCobra() {
        cobra = await CobraWeb.CobraWorker.create(
          "${ACCESS_KEY}", // Replace with your Picovoice AccessKey
          voiceProbabilityCallback
        )
      }
    
      function voiceProbabilityCallback(voiceProbability) {
        voiceProbabilitySpan.innerText = voiceProbability
      }
  
      async function startCobra() {
        if (!cobra) {
          await initCobra()
        }
        
        await WebVoiceProcessor.WebVoiceProcessor.subscribe(cobra)
      }
  
      async function stopCobra() {
        // Nothing to unsubscribe if Cobra was never started
        if (!cobra) {
          return
        }

        await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(cobra)
      }
    </script>
  </body>
</html>

Finally, go back to http://localhost:5000. Click "Start Cobra", speak into your mic, and watch the Voice Probability change based on whether you are speaking or not!
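
The raw value is updated once per audio frame, so the number on screen may change too quickly to read comfortably. One optional refinement, sketched here under that assumption, is to smooth the values with an exponential moving average and show them as a percentage. The helpers createSmoother and formatProbability are hypothetical names, and the smoothing factor is an arbitrary starting point.

```javascript
// Exponential moving average: alpha close to 1 follows the input quickly,
// alpha close to 0 smooths more heavily.
function createSmoother(alpha = 0.2) {
  let value = null;
  return function smooth(sample) {
    // The first sample initializes the average directly.
    value = value === null ? sample : alpha * sample + (1 - alpha) * value;
    return value;
  };
}

// Formats a probability between 0 and 1 as a whole-number percentage.
function formatProbability(probability) {
  return `${Math.round(probability * 100)}%`;
}
```

In the demo above, voiceProbabilityCallback could then set voiceProbabilitySpan.innerText to formatProbability(smooth(voiceProbability)), where smooth is the function returned by createSmoother().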

Adding to Existing Project?

If you are working within an existing project that has a module bundler, you can use the import syntax instead:

import { CobraWorker } from "@picovoice/cobra-web"
import { WebVoiceProcessor } from "@picovoice/web-voice-processor"

// ...

// Change "CobraWeb.CobraWorker." to "CobraWorker."
const cobra = await CobraWorker.create(
  "${ACCESS_KEY}", // Replace with your Picovoice AccessKey
  voiceProbabilityCallback
)

// Change "WebVoiceProcessor.WebVoiceProcessor." to "WebVoiceProcessor."
await WebVoiceProcessor.subscribe(cobra)
await WebVoiceProcessor.unsubscribe(cobra)

For more information, check out the Cobra Voice Activity Detection product page or refer to the Cobra Voice Activity Detection JavaScript SDK quick start guide.
