Detecting human speech within audio can be an essential part of a speech recognition system. While humans can naturally discern speech from other sounds, machines require assistance to make this distinction. Engines designed for this purpose, known as Voice Activity Detectors (VADs), analyze audio input and determine whether it contains speech.
Picovoice's Cobra Voice Activity Detection engine makes it easy to detect voice activity in audio data. Cobra VAD is lightweight, compatible with any platform, and operates locally, ensuring privacy compliance with GDPR and HIPAA regulations.
Notably, Cobra VAD stands out as the most accurate VAD engine across all platforms, surpassing even Google's widely used WebRTC VAD.
In just a few lines of code, you can start detecting voice activity in real time from a microphone using the Cobra Voice Activity Detection Node.js SDK. Let’s get started!
Install Packages
Create a project and install @picovoice/pvrecorder-node and @picovoice/cobra-node.
@picovoice/pvrecorder-node will be used to record microphone audio.
@picovoice/cobra-node will perform the voice activity detection.
Sign up for Picovoice Console
Next, create a Picovoice Console account, and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!
Initialize Cobra
Create an instance of Cobra with your AccessKey.
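A minimal sketch of this step, assuming @picovoice/cobra-node exports a Cobra class whose constructor takes the AccessKey. The createCobra helper and the PICOVOICE_ACCESS_KEY environment variable are hypothetical names used only for illustration:

```javascript
// Sketch only: createCobra is a hypothetical helper, not part of the SDK.
function createCobra(accessKey) {
  // Fail early with a readable message if the AccessKey was not provided.
  if (typeof accessKey !== "string" || accessKey.trim() === "") {
    throw new Error("Missing AccessKey: copy it from the Picovoice Console dashboard.");
  }
  // Assumption: the package exports a `Cobra` class that takes the AccessKey.
  // Loaded lazily so the check above runs even before the package is installed.
  const { Cobra } = require("@picovoice/cobra-node");
  return new Cobra(accessKey);
}

// Usage (PICOVOICE_ACCESS_KEY is a hypothetical environment variable name):
// const cobra = createCobra(process.env.PICOVOICE_ACCESS_KEY);
```

Reading the key from the environment, rather than pasting it into the script, keeps it out of version control.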
Capture Microphone Audio
Next, we need to pass audio to Cobra to perform voice activity detection. This audio can come from a microphone or from a stream you receive from another source, as long as the audio frames are of a specific frame length (specified by cobra.frameLength) and the audio itself is recorded at the required sample rate (specified by cobra.sampleRate).
In digital audio, an audio frame refers to a discrete unit of audio data that represents a brief moment in time. An audio frame consists of a number of samples, each of which is a numeric value that represents the amplitude of the sound waveform at a single point in time. The number of samples in each audio frame is referred to as its frame length.
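To make the relationship between samples, frames, and frame length concrete, here is a small self-contained sketch. It assumes 16-bit samples held in an Int16Array, and the frame length of 512 and sample rate of 16000 are illustrative values only; in a real project, read them from cobra.frameLength and cobra.sampleRate. The helper names splitIntoFrames and frameDurationMs are made up for this example:

```javascript
// Split a buffer of 16-bit samples into fixed-size frames.
// Trailing samples that do not fill a whole frame are left out.
function splitIntoFrames(samples, frameLength) {
  if (!Number.isInteger(frameLength) || frameLength <= 0) {
    throw new RangeError("frameLength must be a positive integer");
  }
  const frames = [];
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    // subarray() creates a view on the same memory, so nothing is copied.
    frames.push(samples.subarray(start, start + frameLength));
  }
  return frames;
}

// Duration of one frame in milliseconds.
function frameDurationMs(frameLength, sampleRate) {
  return (frameLength * 1000) / sampleRate;
}

// Illustrative values only; read the real ones from cobra.frameLength and cobra.sampleRate.
const frameLength = 512;
const sampleRate = 16000;

const oneSecondOfAudio = new Int16Array(sampleRate); // one second of silence
const frames = splitIntoFrames(oneSecondOfAudio, frameLength);

console.log(frames.length); // 31 full frames (the last 128 samples are left over)
console.log(frameDurationMs(frameLength, sampleRate)); // 32
```

With these example numbers, each frame covers 32 milliseconds of audio, which is the "brief moment in time" a single frame represents.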
To record audio with the appropriate frame length, we can use PvRecorder, an audio recorder designed for real-time speech audio processing.
Create an instance of PvRecorder with cobra.frameLength, and call pvRecorder.start().
To stop recording audio, call pvRecorder.stop().
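One way to make sure the start and stop calls always come in pairs is to wrap them around the work you want to do. The sketch below is an illustration, not part of the PvRecorder package: withRecording is a hypothetical helper that accepts any object with start() and stop() methods. The commented usage assumes the PvRecorder constructor takes the frame length as its first argument, so check the API docs for the version you install:

```javascript
// Sketch only: withRecording is a hypothetical helper, not part of PvRecorder.
// `recorder` is any object with start() and stop(), such as a PvRecorder instance.
async function withRecording(recorder, work) {
  recorder.start();
  try {
    return await work(recorder);
  } finally {
    // Runs whether `work` finished normally or threw.
    recorder.stop();
  }
}

// Usage inside an async function (constructor arguments are an assumption):
// const { PvRecorder } = require("@picovoice/pvrecorder-node");
// const pvRecorder = new PvRecorder(cobra.frameLength);
// await withRecording(pvRecorder, async (r) => {
//   const audioFrame = await r.read();
//   console.log(audioFrame.length);
// });
```

Because the stop call sits in a finally block, the microphone is not left recording if an error is thrown while frames are being processed.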
Detect Voice Activity
Each call to pvRecorder.read() will return a single audioFrame that you can then pass to cobra for processing. Once processed, cobra will return a voice probability score, which is a floating-point value between 0 and 1. The closer the score is to 1, the more likely it is that the frame contains voice.
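A sketch of this read-then-process step, assuming recorder.read() resolves to one audio frame and cobra.process(frame) returns the voice probability for it. The processNextFrame helper and the 0.5 threshold are illustrative choices, not values prescribed by Cobra:

```javascript
// Sketch only: processNextFrame and the 0.5 threshold are illustrative choices.
// Assumes recorder.read() resolves to one frame and cobra.process(frame)
// returns the voice probability for that frame.
async function processNextFrame(recorder, cobra, threshold = 0.5) {
  const audioFrame = await recorder.read();
  const voiceProbability = cobra.process(audioFrame);
  return {
    voiceProbability,
    // Turn the probability into a yes/no decision.
    isVoice: voiceProbability >= threshold,
  };
}
```

Raising the threshold makes the detector stricter, which tends to reduce false alarms at the cost of missing quieter speech; lowering it does the opposite.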
Putting It All Together
A complete working example combines the steps above: initialize Cobra, start PvRecorder, then read audio frames in a loop and pass each one to Cobra.
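The steps above can be combined along the following lines. Treat this as a sketch under stated assumptions rather than the official demo: the class names, constructor arguments, and release() calls in main are assumptions about the two packages, while detectVoiceActivity, its options, the 0.5 threshold, and the PICOVOICE_ACCESS_KEY environment variable are hypothetical names chosen for this example:

```javascript
// Core loop. `cobra` needs process(frame); `recorder` needs start(), read() and stop().
async function detectVoiceActivity(cobra, recorder, options = {}) {
  const {
    threshold = 0.5, // illustrative cut-off, tune it for your audio
    shouldStop = () => false, // return true to end the loop
    onResult = (probability, isVoice) =>
      console.log(`${probability.toFixed(2)} ${isVoice ? "voice" : "-"}`),
  } = options;

  recorder.start();
  try {
    while (!shouldStop()) {
      const audioFrame = await recorder.read();
      const voiceProbability = cobra.process(audioFrame);
      onResult(voiceProbability, voiceProbability >= threshold);
    }
  } finally {
    // Always stop recording, even if processing throws.
    recorder.stop();
  }
}

// Wiring with the real packages. The class names, constructor arguments and
// release() calls are assumptions; PICOVOICE_ACCESS_KEY is a hypothetical name.
async function main() {
  const { Cobra } = require("@picovoice/cobra-node");
  const { PvRecorder } = require("@picovoice/pvrecorder-node");

  const cobra = new Cobra(process.env.PICOVOICE_ACCESS_KEY);
  const recorder = new PvRecorder(cobra.frameLength);

  let interrupted = false;
  process.on("SIGINT", () => { interrupted = true; }); // Ctrl+C ends the loop

  try {
    await detectVoiceActivity(cobra, recorder, { shouldStop: () => interrupted });
  } finally {
    recorder.release();
    cobra.release();
  }
}

// Uncomment to run against a real microphone:
// main().catch(console.error);
```

Keeping the loop separate from the wiring means the same function can be fed frames from a file or a network stream, as long as they match the frame length and sample rate Cobra expects.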
For a complete working project, take a look at the Cobra Voice Activity Detection Node.js Demo. You can also view the Cobra Voice Activity Detection Node.js API docs for details on the package.
Have you seen our other Node.js tutorials? Don’t forget to check out Real-time Transcription with Node.js, Batch Transcription with Node.js, and Speaker Recognition with Node.js.