Speaker Recognition Benchmark
Speaker recognition identifies a person in an audio stream from their voiceprint, determining whether a specific individual is speaking at a given time. This benchmark compares Picovoice Eagle against the well-known open-source speaker recognition engines listed below:
Methodology
This benchmark assumes that the entire enrollment audio is available during the enrollment step. After enrollment, the speaker recognition engine is used to detect the enrolled speaker within a stream of audio.
Metrics
Equal Error Rate
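The Equal Error Rate (EER) metric treats the recognition system as a binary classifier (the enrolled speaker is either speaking or not). At a given decision threshold, its two error rates are the false acceptance rate (FAR) and the false rejection rate (FRR):

$$\mathrm{FAR} = \frac{FP}{FP + TN}, \qquad \mathrm{FRR} = \frac{FN}{FN + TP}$$

Here $TP$, $TN$, $FP$, and $FN$ are the counts of true positives, true negatives, false positives, and false negatives. The EER is the common value of the two rates at the threshold where they are equal:

$$\mathrm{EER} = \mathrm{FAR} = \mathrm{FRR}$$

where FAR and FRR are equal. A lower EER indicates a more accurate engine.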
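As an illustration of the definition, the sketch below estimates the EER from a list of similarity scores and ground-truth labels. It is not the benchmark's own code: the function names `far_frr` and `equal_error_rate`, the "accept when score >= threshold" rule, and the toy scores are all assumptions made for this example. Because a finite set of scores rarely gives an exact crossing, the sketch picks the threshold where FAR and FRR are closest and reports their mean.

```python
def far_frr(scores, labels, threshold):
    """Return (FAR, FRR) when a score >= threshold counts as an acceptance.

    labels[i] is True when scores[i] comes from the enrolled speaker.
    """
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and not l)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l)
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("need at least one target and one non-target sample")
    return fp / (fp + tn), fn / (fn + tp)


def equal_error_rate(scores, labels):
    """Return (EER estimate, threshold) at the point where FAR and FRR are closest."""
    # Candidate thresholds: every observed score, plus one that rejects everything.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best = None
    for t in candidates:
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy data (hypothetical): higher score means "more likely the enrolled speaker".
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, True, False, False, False]

eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With this toy data, one target score (0.4) falls below the chosen threshold and one non-target score (0.7) sits above it, so both rates come out to one in four.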
Model Size
The size of the model at initialization is used to evaluate the memory consumption of the speaker recognition engine. It indicates the minimum amount of RAM required to use the engine.
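One rough way to observe memory growth during initialization from Python is sketched below using the standard-library `tracemalloc` module. This is an assumption-laden illustration, not the procedure used for this benchmark: `measure_init_memory` and the `factory` argument are hypothetical names, and `tracemalloc` only sees allocations made through Python's allocator, so an engine that loads its model in native code would need a process-level measurement (for example, resident set size) instead.

```python
import tracemalloc


def measure_init_memory(factory):
    """Call factory() and return (engine, bytes newly allocated and still held).

    factory is a hypothetical zero-argument callable that constructs the engine.
    Only allocations visible to Python's allocator are counted.
    """
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        engine = factory()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return engine, after - before


# Stand-in "engine": a 1 MB buffer playing the role of loaded model weights.
engine, size_bytes = measure_init_memory(lambda: bytearray(1_000_000))
print(f"initialization held about {size_bytes / 1e6:.1f} MB")
```

Keeping the returned engine alive matters here: if the object were discarded inside the helper, its memory would be freed before the second reading and the measured size would be close to zero.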
Results
Accuracy
The figure below shows the average Equal Error Rate of each engine.
Resource Utilization
Usage
The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents: