
Speaker Recognition Benchmark

Speaker recognition is the timely identification of a person in an audio stream based on their voiceprint. It determines whether a specific individual is speaking at a given time. This benchmark compares Picovoice Eagle against the well-known open-source speaker recognition engines listed below:

  • pyannote
  • SpeechBrain
  • WeSpeaker

Methodology

This benchmark assumes that the enrollment step takes place offline. The speaker recognition engine is then used to detect the enrolled speaker within a stream of audio frames, each 96 ms long.
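The streaming setup described above can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the 16 kHz sample rate used in the usage note, the per-frame score, the threshold, and the helper names `frame_length`, `split_into_frames`, and `detect_enrolled_speaker` are all assumptions made for the example. Only the 96 ms frame duration comes from the methodology.

```python
from typing import Iterator, List, Sequence

FRAME_DURATION_MS = 96  # frame duration stated in the methodology


def frame_length(sample_rate: int) -> int:
    """Number of samples in one 96 ms frame at the given sample rate."""
    return sample_rate * FRAME_DURATION_MS // 1000


def split_into_frames(samples: Sequence[int], sample_rate: int) -> Iterator[Sequence[int]]:
    """Yield consecutive, non-overlapping frames; a trailing partial frame is dropped."""
    n = frame_length(sample_rate)
    for start in range(0, len(samples) - n + 1, n):
        yield samples[start:start + n]


def detect_enrolled_speaker(frame_scores: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical decision rule: a frame counts as the enrolled speaker
    when the engine's score for that frame reaches the threshold."""
    return [score >= threshold for score in frame_scores]
```

At an assumed sample rate of 16 kHz, `frame_length(16000)` is 1536 samples, so a 4000-sample buffer yields two full frames.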

Speech Corpus

VoxConverse is a well-known dataset used in speaker identification. It contains conversations in many languages, annotated with the times at which each speaker talks.

Metrics

Detection Accuracy

The Detection Accuracy (DA) metric measures the accuracy of the recognition system viewed as a binary classifier. It is computed as:

$$\mathrm{DA} = \frac{TP + TN}{T_{\text{total}}}$$

where $TP$ is the duration of true positives (segments correctly classified as the enrolled speaker), $TN$ is the duration of true negatives (segments correctly identified as non-enrolled speakers), and $T_{\text{total}}$ is the overall duration of the input audio signal.
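As a rough sketch of the definition above, Detection Accuracy can be computed from per-frame labels. The function name `detection_accuracy`, the boolean per-frame representation, and the use of a fixed 96 ms frame as the unit of duration are assumptions for illustration. The benchmark's actual implementation may work on time segments instead.

```python
from typing import Sequence


def detection_accuracy(
    reference: Sequence[bool],
    predicted: Sequence[bool],
    frame_sec: float = 0.096,
) -> float:
    """(true-positive duration + true-negative duration) / total duration.

    reference[i] is True when the enrolled speaker is really talking in frame i;
    predicted[i] is True when the engine says so.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one frame is required")

    tp_frames = sum(1 for r, p in zip(reference, predicted) if r and p)
    tn_frames = sum(1 for r, p in zip(reference, predicted) if not r and not p)

    tp_sec = tp_frames * frame_sec
    tn_sec = tn_frames * frame_sec
    total_sec = len(reference) * frame_sec
    return (tp_sec + tn_sec) / total_sec
```

For example, with `reference = [True, True, False, False]` and `predicted = [True, False, False, True]`, one frame is a true positive and one is a true negative, so the result is 0.5 (up to floating-point rounding).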

Detection Error Rate

The Detection Error Rate (DER) metric assesses the duration of errors relative to the total duration of enrolled speaker segments:

$$\mathrm{DER} = \frac{FA + MD}{T_{\text{enrolled}}}$$

where $FA$ and $MD$ denote the duration of false alarms and missed detections for enrolled speakers, and $T_{\text{enrolled}}$ is the overall duration of enrolled speaker segments in the input audio signal.
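The Detection Error Rate described above can be sketched the same way, again from per-frame labels. The function name `detection_error_rate`, the boolean per-frame representation, and the 96 ms frame as the unit of duration are illustrative assumptions, not the benchmark's own code.

```python
from typing import Sequence


def detection_error_rate(
    reference: Sequence[bool],
    predicted: Sequence[bool],
    frame_sec: float = 0.096,
) -> float:
    """(false-alarm duration + missed-detection duration) / enrolled-speaker duration."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    enrolled_frames = sum(1 for r in reference if r)
    if enrolled_frames == 0:
        raise ValueError("reference contains no enrolled-speaker frames")

    # False alarm: engine reports the enrolled speaker, but they are not talking.
    fa_frames = sum(1 for r, p in zip(reference, predicted) if p and not r)
    # Missed detection: enrolled speaker is talking, but the engine does not report it.
    md_frames = sum(1 for r, p in zip(reference, predicted) if r and not p)

    error_sec = (fa_frames + md_frames) * frame_sec
    enrolled_sec = enrolled_frames * frame_sec
    return error_sec / enrolled_sec
```

Because the denominator is only the enrolled speaker's duration, the value can exceed 1 when false alarms are frequent. With `reference = [True, True, False, False]` and `predicted = [True, False, False, True]`, one false alarm and one miss over two enrolled frames give a rate of 1.0.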

Core-Hour

The Core-Hour metric is used to evaluate the computational efficiency of the speaker recognition engine, indicating the number of hours required to process one hour of audio on a single CPU core.

All measurements were carried out on a machine with an AMD CPU (`AMD Ryzen 7 5700X (16) @ 3.400G`), 64 GB of RAM, and NVMe storage.
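A minimal sketch of how a Core-Hour figure could be derived is shown below. The helper names `core_hours_per_audio_hour` and `measure_cpu_seconds` are hypothetical, and using Python's `time.process_time` to capture CPU time is an assumption about the measurement, not a description of the benchmark's tooling.

```python
import time
from typing import Callable


def core_hours_per_audio_hour(cpu_seconds: float, audio_seconds: float) -> float:
    """CPU time spent per unit of audio; equals the hours of single-core
    compute needed to process one hour of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if cpu_seconds < 0:
        raise ValueError("cpu_seconds must be non-negative")
    return cpu_seconds / audio_seconds


def measure_cpu_seconds(process: Callable[[], None]) -> float:
    """Run `process` and return the CPU time (user + system) used by this
    Python process while it ran; time spent sleeping is not counted."""
    start = time.process_time()
    process()
    return time.process_time() - start
```

For instance, an engine that spends 360 CPU seconds on 3600 seconds of audio would score 0.1 Core-Hour; lower values mean a more efficient engine.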

Results

Accuracy

The figures below compare the engines by their average Detection Accuracy and average Detection Error Rate.

[Figure: Speaker Recognition Detection Accuracy Comparison]
[Figure: Speaker Recognition Detection Error Rate Comparison]

Resource Utilization

[Figure: Core-Hour Comparison]

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents:

  • pyannote
  • WeSpeaker
  • SpeechBrain
  • Picovoice Eagle

© 2019-2025 Picovoice Inc.