Detecting human speech within audio can be an essential part of a speech recognition system. While humans can naturally discern speech from other sounds, machines require assistance to make this distinction. Engines designed for this purpose, known as Voice Activity Detectors (VADs), analyze audio input and determine whether it contains speech.

Picovoice's Cobra Voice Activity Detection engine makes it easy to detect voice activity in audio data. Cobra VAD is lightweight, runs across platforms, and processes audio locally on the device, which helps with privacy requirements such as GDPR and HIPAA.

According to Picovoice, Cobra VAD is the most accurate VAD engine across platforms, surpassing even Google's widely used WebRTC VAD.

In just a few lines of code, you can start detecting voice activity in real time from a microphone using the Cobra Voice Activity Detection Node.js SDK. Let’s get started!

Install Packages

Create a Node.js project and install @picovoice/pvrecorder-node and @picovoice/cobra-node from npm (for example, with npm install @picovoice/pvrecorder-node @picovoice/cobra-node).

  • @picovoice/pvrecorder-node will be used to record microphone audio
  • @picovoice/cobra-node will perform the voice activity detection

Sign up for Picovoice Console

Next, create a Picovoice Console account, and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!

Initialize Cobra

Create an instance of Cobra with your AccessKey:
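A minimal sketch of this step, assuming @picovoice/cobra-node is installed. The helper name createCobra and the environment variable PICOVOICE_ACCESS_KEY are illustrative choices, not part of the SDK; reading the AccessKey from the environment simply keeps it out of source code.

```javascript
// Sketch only: createCobra and PICOVOICE_ACCESS_KEY are hypothetical names.
// The package is loaded inside the function, so this file can be loaded
// even before the dependency has been installed.
function createCobra(accessKey) {
  if (typeof accessKey !== "string" || accessKey.trim() === "") {
    throw new Error("A Picovoice AccessKey is required");
  }
  const { Cobra } = require("@picovoice/cobra-node");
  return new Cobra(accessKey);
}

const accessKey = process.env.PICOVOICE_ACCESS_KEY;
if (accessKey) {
  const cobra = createCobra(accessKey);
  console.log(
    `Cobra ${cobra.version} expects ${cobra.frameLength}-sample frames at ${cobra.sampleRate} Hz`
  );
  cobra.release(); // free native resources once Cobra is no longer needed
}
```

Calling cobra.release() when you are finished frees the resources held by the native library.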

Capture Microphone Audio

Next, we need to pass audio to Cobra to perform voice activity detection. This audio can come from a microphone or from any other source, as long as each audio frame contains exactly cobra.frameLength samples and the audio is recorded at the required sample rate (cobra.sampleRate).

In digital audio, an audio frame refers to a discrete unit of audio data that represents a brief moment in time. An audio frame consists of a number of samples, each of which is a numeric value that represents the amplitude of the sound waveform at a single point in time. The number of samples in each audio frame is referred to as its frame length.
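To make the idea of frames concrete, here is a small sketch that splits a buffer of samples into fixed-length frames and computes how long one frame lasts. The helper names (splitIntoFrames, frameDurationMs) are hypothetical, and the values 512 and 16000 are illustrative; in a real program, read them from cobra.frameLength and cobra.sampleRate.

```javascript
// Split a buffer of 16-bit PCM samples into frames of frameLength samples.
// Trailing samples that do not fill a whole frame are dropped.
function splitIntoFrames(samples, frameLength) {
  if (!Number.isInteger(frameLength) || frameLength <= 0) {
    throw new RangeError("frameLength must be a positive integer");
  }
  const frames = [];
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    // subarray() creates a view on the same memory, so nothing is copied
    frames.push(samples.subarray(start, start + frameLength));
  }
  return frames;
}

// Duration of a single frame in milliseconds.
function frameDurationMs(frameLength, sampleRate) {
  return (frameLength * 1000) / sampleRate;
}

// Illustrative values; use cobra.frameLength and cobra.sampleRate in practice.
const frameLength = 512;
const sampleRate = 16000;

const oneSecondOfSilence = new Int16Array(sampleRate);
const frames = splitIntoFrames(oneSecondOfSilence, frameLength);

console.log(frames.length); // 31 complete frames in one second
console.log(frameDurationMs(frameLength, sampleRate)); // 32
```

With these example values, each frame covers 32 milliseconds of audio, so a voice probability is produced roughly 31 times per second.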

To record audio with the appropriate frame length, we can use PvRecorder - an audio recorder designed for real-time speech audio processing.

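Create an instance of PvRecorder with cobra.frameLength, and call pvRecorder.start() to begin recording.

To stop recording audio, call pvRecorder.stop(). Once the recorder is no longer needed, call pvRecorder.release() to free its resources.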
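The recorder lifecycle could be sketched as follows. This assumes a release of @picovoice/pvrecorder-node in which frameLength is the first constructor argument and the default input device is used when no device is given; check the package docs for the version you have installed. The helper names createRecorder and recordOneFrame are hypothetical.

```javascript
// Sketch only: createRecorder and recordOneFrame are hypothetical helpers.
// Assumes a PvRecorder release whose constructor takes frameLength first.
function createRecorder(frameLength) {
  if (!Number.isInteger(frameLength) || frameLength <= 0) {
    throw new RangeError("frameLength must be a positive integer");
  }
  const { PvRecorder } = require("@picovoice/pvrecorder-node");
  return new PvRecorder(frameLength); // uses the default audio input device
}

// Start recording, read a single frame, then stop and clean up.
async function recordOneFrame(frameLength) {
  const pvRecorder = createRecorder(frameLength);
  pvRecorder.start();
  try {
    return await pvRecorder.read(); // resolves to one frame of audio samples
  } finally {
    pvRecorder.stop();
    pvRecorder.release(); // free native resources
  }
}
```

For example, recordOneFrame(cobra.frameLength) would resolve to a single frame that is ready to hand to Cobra. The try/finally block ensures the microphone is released even if reading fails.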

Detect Voice Activity

Each call to pvRecorder.read() returns a single audioFrame that you can then pass to cobra.process(). Cobra returns a voice probability score for that frame: a floating-point value between 0 and 1, where higher values indicate a greater likelihood that the frame contains speech.
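One way to turn the probability into a yes/no decision is to compare it against a threshold. The sketch below assumes a threshold of 0.5, which is an illustrative starting point rather than a Picovoice recommendation, and the helper names isVoiceActive and detectOnce are hypothetical. The pvRecorder and cobra arguments are expected to be the instances created in the previous steps.

```javascript
// Decide whether a frame counts as speech.
// The default threshold of 0.5 is an assumption; tune it for your use case.
function isVoiceActive(voiceProbability, threshold = 0.5) {
  return voiceProbability >= threshold;
}

// Read one frame from the recorder and score it with Cobra.
// pvRecorder must provide read(); cobra must provide process(frame).
async function detectOnce(pvRecorder, cobra, threshold = 0.5) {
  const audioFrame = await pvRecorder.read();
  const voiceProbability = cobra.process(audioFrame);
  return {
    voiceProbability,
    active: isVoiceActive(voiceProbability, threshold),
  };
}
```

A lower threshold catches more speech at the cost of more false alarms, while a higher one does the opposite.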

Putting It All Together

A complete working example can look something like this:
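The following is a sketch of how the pieces might fit together, not the official demo. It assumes both packages are installed, a PvRecorder release whose constructor takes frameLength first, and an AccessKey supplied through a hypothetical PICOVOICE_ACCESS_KEY environment variable. The function name runVad and the 0.5 threshold are likewise illustrative.

```javascript
// Illustrative threshold; not a Picovoice recommendation.
const VOICE_THRESHOLD = 0.5;

async function runVad(accessKey) {
  // Loaded here so the script fails with a clear message only when it is run.
  const { Cobra } = require("@picovoice/cobra-node");
  const { PvRecorder } = require("@picovoice/pvrecorder-node");

  const cobra = new Cobra(accessKey);
  const pvRecorder = new PvRecorder(cobra.frameLength);

  // Stop the loop cleanly on Ctrl+C so resources can be released.
  let running = true;
  process.once("SIGINT", () => {
    running = false;
  });

  pvRecorder.start();
  console.log("Listening... press Ctrl+C to stop.");

  try {
    while (running) {
      const audioFrame = await pvRecorder.read();
      const voiceProbability = cobra.process(audioFrame);
      if (voiceProbability >= VOICE_THRESHOLD) {
        console.log(`Voice detected (probability ${voiceProbability.toFixed(2)})`);
      }
    }
  } finally {
    pvRecorder.stop();
    pvRecorder.release();
    cobra.release();
  }
}

// PICOVOICE_ACCESS_KEY is a hypothetical environment variable name.
const accessKey = process.env.PICOVOICE_ACCESS_KEY;
if (accessKey) {
  runVad(accessKey).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
} else {
  console.error("Set PICOVOICE_ACCESS_KEY to run this example.");
}
```

Run it with the AccessKey in the environment, speak into the microphone, and a line is printed for each frame whose probability crosses the threshold.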

For a complete working project, take a look at the Cobra Voice Activity Detection Node.js Demo. You can also view the Cobra Voice Activity Detection Node.js API docs for details on the package.

Have you seen our other Node.js tutorials? Don’t forget to check out Real-time Transcription with Node.js, Batch Transcription with Node.js, and Speaker Recognition with Node.js.