Speaker Recognition Benchmark

Speaker recognition identifies a person in an audio stream from their voiceprint, determining whether a specific individual is speaking at a given time. This benchmark compares Picovoice Eagle against the well-known open-source speaker recognition engines listed below:

Methodology

For this benchmark, it is assumed that the enrollment step takes place offline. Subsequently, the speaker recognition engine is used to detect the enrolled speaker within a stream of audio frames. The duration of each audio frame is 96 ms.
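As a rough illustration of the streaming setup, the sketch below splits a mono PCM signal into consecutive 96 ms frames. The 16 kHz sample rate and the helper names `frame_length` and `split_into_frames` are assumptions made for this example; they are not taken from the benchmark code.

```python
from typing import List, Sequence

FRAME_DURATION_MS = 96      # frame duration stated in the methodology
SAMPLE_RATE_HZ = 16_000     # assumed sample rate; not specified by the benchmark


def frame_length(sample_rate_hz: int = SAMPLE_RATE_HZ,
                 frame_duration_ms: int = FRAME_DURATION_MS) -> int:
    """Return the number of samples in one frame."""
    return sample_rate_hz * frame_duration_ms // 1000


def split_into_frames(pcm: Sequence[int],
                      sample_rate_hz: int = SAMPLE_RATE_HZ) -> List[Sequence[int]]:
    """Split a signal into full frames; a trailing partial frame is dropped."""
    n = frame_length(sample_rate_hz)
    return [pcm[i:i + n] for i in range(0, len(pcm) - n + 1, n)]
```

Under the assumed sample rate, each frame holds 1536 samples, and each frame would be passed to the engine in turn to obtain a per-frame decision for the enrolled speaker.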

Speech Corpus

VoxConverse is a well-known dataset used in speaker identification. It contains conversations in many languages, annotated with timestamps that mark when each speaker is talking.

Metrics

Detection Accuracy

The Detection Accuracy (DA) metric treats the recognition system as a binary classifier and measures the fraction of the input audio that it classifies correctly:

$$\mathrm{DA} = \frac{TP + TN}{T}$$

where $TP$ is the duration of true positives (segments correctly classified as the enrolled speaker), $TN$ is the duration of true negatives (segments correctly identified as non-enrolled speakers), and $T$ is the overall duration of the input audio signal.
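A minimal sketch of the Detection Accuracy calculation follows, assuming the true-positive and true-negative durations have already been measured in seconds. The function name `detection_accuracy` and its parameters are hypothetical and chosen only for illustration.

```python
def detection_accuracy(tp_seconds: float,
                       tn_seconds: float,
                       total_seconds: float) -> float:
    """Fraction of the audio duration that was classified correctly."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return (tp_seconds + tn_seconds) / total_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 30 s of correct enrolled-speaker detections
    # and 60 s of correct rejections in a 100 s recording.
    print(detection_accuracy(30.0, 60.0, 100.0))
```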

Detection Error Rate

The Detection Error Rate (DER) metric measures the duration of errors relative to the total duration of enrolled speaker segments:

$$\mathrm{DER} = \frac{FA + MD}{T_{\mathrm{enrolled}}}$$

where $FA$ and $MD$ denote the durations of false alarms and missed detections for the enrolled speaker, and $T_{\mathrm{enrolled}}$ is the overall duration of enrolled speaker segments in the input audio signal.
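The Detection Error Rate can be sketched from per-frame decisions as shown below. This assumes one boolean decision per 96 ms frame and a reference labeling at the same frame resolution; the function name `detection_error_rate` and the frame-level representation are assumptions for this example, not the benchmark's actual implementation.

```python
from typing import Sequence

FRAME_SECONDS = 0.096  # 96 ms frames, as stated in the methodology


def detection_error_rate(predicted: Sequence[bool],
                         reference: Sequence[bool],
                         frame_seconds: float = FRAME_SECONDS) -> float:
    """(false alarm + missed detection) duration / enrolled speaker duration."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    # Frames flagged as the enrolled speaker although the reference says otherwise.
    false_alarm = sum(1 for p, r in pairs if p and not r) * frame_seconds
    # Enrolled speaker frames that the engine failed to flag.
    missed = sum(1 for p, r in pairs if r and not p) * frame_seconds
    enrolled = sum(1 for r in reference if r) * frame_seconds
    if enrolled == 0:
        raise ValueError("reference contains no enrolled speaker frames")
    return (false_alarm + missed) / enrolled
```

Because false alarms count toward the numerator but not the denominator, this ratio can exceed 1 when an engine raises many false alarms.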

Core-Hour

The Core-Hour metric is used to evaluate the computational efficiency of the speaker recognition engine, indicating the number of hours required to process one hour of audio on a single CPU core.
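One way to approximate this measurement is to time the CPU usage of a processing run and divide by the audio duration. The sketch below uses Python's `time.process_time()`, which reports the CPU time of the current process; the function `core_hours_per_audio_hour` and the `process` callable are hypothetical stand-ins for running an engine over a recording, and the benchmark's own measurement procedure may differ.

```python
import time
from typing import Callable


def core_hours_per_audio_hour(process: Callable[[], None],
                              audio_seconds: float) -> float:
    """CPU hours consumed per hour of audio (a unitless ratio)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.process_time()      # CPU time of this process, in seconds
    process()                        # hypothetical: run the engine on the audio
    cpu_seconds = time.process_time() - start
    # Seconds of CPU per second of audio equals hours of CPU per hour of audio.
    return cpu_seconds / audio_seconds
```

A value below 1 would mean the engine processes audio faster than real time on a single core.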

All measurements are carried out on a machine with an AMD CPU (`AMD Ryzen 7 5700X (16) @ 3.400G`), 64 GB of RAM, and NVMe storage.

Results

Accuracy

The figures below report the average Detection Accuracy and Detection Error Rate of each engine.

Speaker Recognition Detection Error Rate Comparison

Resource Utilization

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents:
