
Voice Activity Detection Benchmark

Voice activity detection (VAD) is the recognition of human speech within a stream of audio. It is one of the main building blocks of speech-enabled applications, and its accuracy has a compounding effect on overall system performance because many downstream speech-processing blocks depend on it. VoIP, IVR, telemarketing, and security systems all incorporate voice activity detection.

Cobra Voice Activity Detection is an enterprise-grade, extremely efficient, real-time VAD that achieves best-in-class accuracy across platforms. Below is a series of benchmarks to validate the accuracy claims.

Methodology

Engines

We compare the accuracy of Cobra with two open-source engines: the voice activity detector used in the WebRTC project, developed by Google and widely regarded as one of the best VADs, and Silero VAD, a popular VAD released more recently.

Speech Corpus

We use LibriSpeech (test-clean portion) as the speech corpus. It covers a diverse set of speakers and is gender-balanced.

Noise

The real challenge in building a performant VAD is resilience to noise. To measure the effect of noise, we mix noise into the speech data before feeding it to the VAD engines. The noise comes from the DEMAND dataset, which contains recordings from diverse environments.
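As an illustration of the mixing step, here is a minimal sketch of adding a noise recording to a speech clip at a chosen signal-to-noise ratio. The function names, the `snr_db` parameter, and the use of plain Python lists for audio samples are assumptions made for this example. The benchmark's own code may mix audio differently, and this page does not state the noise levels it used.

```python
import math
from itertools import cycle, islice


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the benchmark code.
    """
    # Loop the noise recording so it covers the whole utterance, then trim it.
    noise = list(islice(cycle(noise), len(speech)))
    # Noise power that yields the requested speech-to-noise power ratio.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]
```

A lower `snr_db` produces a noisier mixture, so sweeping it over several values would show how quickly an engine's accuracy degrades.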

Metric

We use the receiver operating characteristic (ROC) curve to measure accuracy. The ROC curve is a standard tool for inspecting the performance of binary classifiers across decision thresholds. It lets the designer study the trade-off between detection rate and false positive rate.
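To make the metric concrete, the sketch below computes ROC points from per-frame scores and ground-truth labels. It assumes each engine yields one speech score per audio frame and that each frame carries a 0/1 label. The function name and the data layout are hypothetical, not the benchmark's actual implementation.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_positive_rate, detection_rate) pair per threshold.

    scores: per-frame speech scores (higher means more likely speech).
    labels: per-frame ground truth, 1 for speech and 0 for non-speech.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A frame is flagged as speech when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy data: four frames, two of which contain speech.
print(roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], [0.0, 0.5, 1.0]))
# prints [(1.0, 1.0), (0.0, 0.5), (0.0, 0.0)]
```

Each threshold contributes one point. Plotting detection rate against false positive rate for all thresholds traces the curve, and a curve closer to the top-left corner indicates a more accurate detector.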

To measure runtime performance, we use the real-time factor (RTF): the computation time a model needs to process a file relative to the duration of that file. These scores were recorded on an Ubuntu 22.04 machine with an AMD CPU (AMD Ryzen 9 5900X (12) @ 3.70GHz).
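The real-time factor can be sketched as processing time divided by audio duration. In the example below, `process` stands in for any callable that runs a VAD engine over one file. Both that callable and the function name are placeholders for illustration, not part of any engine's API.

```python
import time


def real_time_factor(process, audio_seconds):
    """Time one call to process() and divide by the audio duration in seconds."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Placeholder workload standing in for "run the VAD over a 10-second file".
rtf = real_time_factor(lambda: sum(range(1000)), 10.0)
print(f"RTF: {rtf:.6f}")
```

An RTF below 1 means the engine processes audio faster than real time. For stable numbers, a measurement like this would typically be averaged over many files.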

Results

[Figure: VAD accuracy comparison]
[Figure: VAD runtime comparison]

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are in the following documents:

  • Picovoice Cobra
  • WebRTC VAD
  • Silero VAD

© 2019-2025 Picovoice Inc.