Voice Activity Detection (VAD) is software that detects the presence of human speech in audio. Humans naturally distinguish speech from other sounds, but machines need some help to do the same. Given some audio input, a VAD makes a binary decision: either the input contains speech or it does not. This functionality is essential to many speech recognition applications.
Picovoice's Cobra Voice Activity Detection engine is a lightweight, on-device VAD that runs on any platform, including web browsers. Cobra VAD performs voice activity detection locally, keeping your voice data private (i.e., it is GDPR- and HIPAA-compliant by design).
Importantly, the Cobra Voice Activity Detection engine is the most accurate VAD engine across all platforms, even in comparison to Google's widely used WebRTC VAD.
Cobra VAD is available for all major browsers: Chrome, Safari, Firefox and Edge.
In just a few minutes, you can start detecting voice activity in real time using the Cobra Voice Activity Detection JavaScript SDK. Let’s get started!
Demo Project
A complete working demo is available on CodePen. Just make sure you replace the ${ACCESS_KEY} string with your own AccessKey (see Step 3).
1. Project setup
Create a new folder and initialize an npm project:
Next, install @picovoice/web-voice-processor and @picovoice/cobra-web:
Also install http-server as a development dependency, so we can view our project on localhost:
2. HTML
Create an index.html file with the following scripts:
You'll now be able to run the local server to load the page:
You can see the page at http://localhost:5000. It will be blank for now.
3. Picovoice Console
Sign up for a free Picovoice Console account and copy your AccessKey, found on the main dashboard.
4. Initialize Cobra
In a <script> tag within the <body> of the HTML file, create an instance of CobraWorker with your Picovoice AccessKey and a voiceProbabilityCallback function.
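This step can be sketched as follows. The sketch assumes that the browser (IIFE) build of @picovoice/cobra-web defines a global named CobraWeb and that CobraWorker.create takes the AccessKey and the callback in that order; confirm both against the SDK quick start guide. The wrapper createCobra and its sdk parameter are hypothetical names introduced for illustration.

```javascript
// Called with a probability in [0, 1] for every processed audio frame.
function voiceProbabilityCallback(voiceProbability) {
  console.log(`Voice probability: ${voiceProbability.toFixed(2)}`);
}

// Hypothetical wrapper. `sdk` defaults to the global that the browser build
// is assumed to define (CobraWeb); passing it in makes the call easy to stub.
async function createCobra(accessKey, callback, sdk = globalThis.CobraWeb) {
  if (!sdk) {
    throw new Error("Cobra SDK not found; is the cobra-web script loaded?");
  }
  // Assumed signature: CobraWorker.create(accessKey, voiceProbabilityCallback)
  return sdk.CobraWorker.create(accessKey, callback);
}
```

Inside an async function on the page, the worker would then be obtained by awaiting createCobra("${ACCESS_KEY}", voiceProbabilityCallback), with the placeholder replaced by your own AccessKey.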
For each audio frame processed, voiceProbabilityCallback receives a score from 0 to 1 (voiceProbability). A score of 1 indicates a 100% probability that the current audio frame contains voice, and a score of 0 indicates a 0% probability.
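If your application needs a yes/no answer rather than a score, one option is to compare the probability against a threshold. The sketch below is only an illustration: the 0.5 cut-off is an assumed value, not one taken from the Cobra documentation, and isVoice and onVoiceProbability are hypothetical names.

```javascript
// Assumed cut-off; tune it for your own noise conditions.
const VOICE_THRESHOLD = 0.5;

// Hypothetical helper: true when the frame is likely to contain voice.
function isVoice(voiceProbability, threshold = VOICE_THRESHOLD) {
  return voiceProbability >= threshold;
}

// A callback that logs a binary label alongside the raw score.
function onVoiceProbability(voiceProbability) {
  const label = isVoice(voiceProbability) ? "voice" : "no voice";
  console.log(`${label} (${voiceProbability.toFixed(2)})`);
}
```

A higher threshold reduces false alarms on background noise at the cost of missing quiet speech; a lower one does the opposite.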
In digital audio, an audio frame is a discrete unit of audio data that represents a brief moment in time. Frames are the building blocks of digital audio signals and are used to store, process, and transmit audio information. CobraWorker receives audio frames from WebVoiceProcessor once it is subscribed to it (see next step).
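To make the idea of a frame concrete, the sketch below slices a buffer of 16-bit PCM samples into fixed-length frames. The frame length of 512 samples and the 16 kHz sample rate are assumptions chosen for illustration. WebVoiceProcessor does this framing for you, so splitIntoFrames is a hypothetical helper and not part of either SDK.

```javascript
// Split a buffer of PCM samples into consecutive fixed-length frames.
function splitIntoFrames(samples, frameLength = 512) {
  const frames = [];
  // Trailing samples that do not fill a whole frame are dropped in this sketch.
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    frames.push(samples.subarray(start, start + frameLength));
  }
  return frames;
}

// One second of silence at an assumed sample rate of 16 kHz.
const oneSecond = new Int16Array(16000);
console.log(splitIntoFrames(oneSecond).length); // 31 full frames of 512 samples
```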
5. Start Detecting Voice
The Web Audio API and the MediaStream API are commonly used by developers to work with audio in web browsers. Although powerful, setup for the Web Audio and MediaStream APIs can be fairly complex. This is why we created Web Voice Processor - an open-source library that handles recording audio for you.
To start detecting voice, simply subscribe cobra to WebVoiceProcessor.
To stop processing audio, unsubscribe cobra.
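Subscribing and unsubscribing could be sketched as follows. The sketch assumes that the browser build of @picovoice/web-voice-processor defines a global namespace named WebVoiceProcessor that contains a class of the same name with static subscribe and unsubscribe methods; verify this against the library's documentation. startListening, stopListening and the wvp parameter are hypothetical names.

```javascript
// `wvp` defaults to the namespace that the browser build is assumed to
// define; passing it in makes the functions easy to stub.
async function startListening(cobra, wvp = globalThis.WebVoiceProcessor) {
  // Starts recording (if needed) and begins feeding audio frames to cobra.
  await wvp.WebVoiceProcessor.subscribe(cobra);
}

async function stopListening(cobra, wvp = globalThis.WebVoiceProcessor) {
  // Stops feeding audio frames to cobra.
  await wvp.WebVoiceProcessor.unsubscribe(cobra);
}
```

The first subscription typically triggers the browser's microphone permission prompt, so it is usually tied to a user action such as a button click.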
6. Complete HTML
Add some html
elements and app logic to see Cobra
in action. It might look something like this:
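One piece of that app logic, updating the on-page reading from inside the callback, could be sketched as below. formatProbability and showProbability are hypothetical helpers, and the element passed in is assumed to be whatever node displays the value, for example a <span>.

```javascript
// Hypothetical helper: format a probability in [0, 1] as a whole percentage.
function formatProbability(voiceProbability) {
  return `${Math.round(voiceProbability * 100)}%`;
}

// `element` is any object with a textContent property, such as a DOM node.
function showProbability(element, voiceProbability) {
  element.textContent = formatProbability(voiceProbability);
}
```

In the page, voiceProbabilityCallback would call showProbability with the display element and the score it receives.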
Finally, go back to http://localhost:5000. Click "Start Cobra", speak into your mic, and watch the Voice Probability change depending on whether you are speaking!
Adding to Existing Project?
If you are working within an existing project that has a module bundler, you can use the import syntax instead:
For more information, check out the Cobra Voice Activity Detection product page or refer to the Cobra Voice Activity Detection JavaScript SDK quick start guide.