Speaker Diarization Benchmark
Speaker diarization labels segments of audio with speaker identities. It is often used alongside a speech-to-text (STT) engine to produce transcripts with speaker labels. This benchmark compares Picovoice Falcon with the well-known cloud-based STT engines and specialized speaker diarization engines listed below:
Methodology
Speech Corpus
VoxConverse is a widely used speaker diarization dataset containing multi-speaker conversations in multiple languages. Because this benchmark includes cloud-based Speech-to-Text engines with built-in speaker diarization, we use only the English subset of the dataset's test section.
Metrics
Diarization Error Rate (DER)
The Diarization Error Rate (DER) is the most common metric for evaluating speaker diarization systems. It sums the durations of three types of error (speaker confusion, false alarms, and missed detections) and divides that sum by the total duration of speech in the reference annotation.
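As an illustration of the DER definition, here is a minimal Python sketch. It is not the scoring code used in this benchmark. It assumes segments are given as `(start, end, speaker)` tuples in seconds, that the mapping from hypothesis labels to reference labels is already known (real scoring tools search for the optimal mapping), that no forgiveness collar is applied, and that the denominator is the total reference speech duration, as is common practice. The function name and the sample data are hypothetical.

```python
def diarization_error_rate(reference, hypothesis, mapping):
    """Compute DER from (start, end, speaker) segments.

    `mapping` maps hypothesis labels to reference labels. It is assumed
    to be given; real tools search for the optimal mapping themselves.
    """
    # Cut the timeline at every segment boundary so that the set of
    # active speakers is constant inside each slice.
    bounds = sorted({t for s, e, _ in reference + hypothesis for t in (s, e)})
    missed = false_alarm = confusion = total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        dur = hi - lo
        ref = {spk for s, e, spk in reference if s <= lo and hi <= e}
        hyp = {spk for s, e, spk in hypothesis if s <= lo and hi <= e}
        # Hypothesis speakers translated to reference labels where possible.
        mapped = {mapping[h] for h in hyp if h in mapping}
        correct = len(ref & mapped)
        missed += dur * max(0, len(ref) - len(hyp))
        false_alarm += dur * max(0, len(hyp) - len(ref))
        confusion += dur * (min(len(ref), len(hyp)) - correct)
        total += dur * len(ref)
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Hypothetical example: 20 s of reference speech, two speakers.
reference = [(0, 10, "A"), (10, 20, "B")]
hypothesis = [(0, 12, "s1"), (12, 18, "s2")]
der = diarization_error_rate(reference, hypothesis, {"s1": "A", "s2": "B"})
print(der)  # 0.2 (2 s of confusion + 2 s missed, over 20 s)
```

In this example the hypothesis keeps speaker `s1` active for 2 seconds too long (confusion) and stops 2 seconds early (missed detection), giving 4 seconds of error over 20 seconds of reference speech.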
Jaccard Error Rate (JER)
The Jaccard Error Rate (JER) is a speaker diarization metric introduced for the second DIHARD challenge (DIHARD II). It is based on the Jaccard similarity index, which measures the similarity between two sets of segments. In short, JER gives every speaker equal weight, regardless of how long each one speaks. For details, refer to the DIHARD II paper.
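To make the per-speaker weighting concrete, here is a minimal Python sketch of a Jaccard-style error. It is a simplified illustration, not the official DIHARD scoring code. It assumes segments are `(start, end, speaker)` tuples in seconds and that the hypothesis-to-reference speaker mapping is already known. A reference speaker with no paired hypothesis speaker is scored as a 100% error. The function name and sample data are hypothetical.

```python
def jaccard_error_rate(reference, hypothesis, mapping):
    """Average per-speaker (1 - Jaccard index) over reference speakers.

    `mapping` maps hypothesis labels to reference labels and is assumed
    to be given rather than optimized here.
    """
    paired = {ref: hyp for hyp, ref in mapping.items()}
    # Slice the timeline at every boundary so activity is constant per slice.
    bounds = sorted({t for s, e, _ in reference + hypothesis for t in (s, e)})

    def active(segments, label, lo, hi):
        return any(spk == label and s <= lo and hi <= e
                   for s, e, spk in segments)

    errors = []
    for ref_spk in sorted({spk for _, _, spk in reference}):
        hyp_spk = paired.get(ref_spk)
        if hyp_spk is None:
            errors.append(1.0)  # unpaired speaker counts as fully wrong
            continue
        inter = union = 0.0
        for lo, hi in zip(bounds, bounds[1:]):
            r = active(reference, ref_spk, lo, hi)
            h = active(hypothesis, hyp_spk, lo, hi)
            if r and h:
                inter += hi - lo
            if r or h:
                union += hi - lo
        errors.append(1.0 - inter / union if union else 1.0)
    # Plain mean: every speaker counts equally, whatever their talk time.
    return sum(errors) / len(errors)


# Hypothetical example with two reference speakers.
reference = [(0, 10, "A"), (10, 20, "B")]
hypothesis = [(0, 12, "s1"), (12, 18, "s2")]
jer = jaccard_error_rate(reference, hypothesis, {"s1": "A", "s2": "B"})
print(round(jer, 3))
```

Here speaker A scores 1 - 10/12 and speaker B scores 1 - 6/10, and the result is the unweighted mean of the two, roughly 0.283.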
Total Memory Usage
This metric provides insight into the memory consumption of the diarization engine during its processing of audio files. It presents the total memory utilized, measured in gigabytes (GB).
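One way such a figure could be obtained is sketched below. It reads the peak resident set size of the current process after the engine has run. This is an assumption for illustration and may differ from how the benchmark actually measures memory. The `resource` module is only available on Unix-like systems, and the helper name `peak_memory_gb` is hypothetical.

```python
import resource
import sys


def peak_memory_gb():
    """Return the peak resident set size of this process in gigabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1024 ** 3


# Call this after the diarization engine has processed the audio files.
print(f"peak memory: {peak_memory_gb():.3f} GB")
```

Peak resident set size captures the high-water mark of physical memory, so it would include model weights and buffers held by native libraries, which Python-level tools such as `tracemalloc` do not see.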
Core-Hour
The Core-Hour metric is used to evaluate the computational efficiency of the diarization engine. This metric indicates how much audio can be processed in an hour using a single CPU core.
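Read as described above, the figure is a ratio of audio duration to CPU time consumed. The sketch below shows that arithmetic under this assumption. The function name and the sample numbers are hypothetical and are not benchmark results.

```python
def audio_hours_per_core_hour(audio_seconds, cpu_seconds):
    """Hours of audio processed per hour of single-core CPU time.

    `cpu_seconds` is the total CPU time consumed, summed over all cores.
    The hour units cancel, so the ratio can be taken directly in seconds.
    """
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return audio_seconds / cpu_seconds


# Hypothetical numbers: one hour of audio processed in 72 CPU-seconds.
throughput = audio_hours_per_core_hour(audio_seconds=3600.0, cpu_seconds=72.0)
print(throughput)  # 50.0
```

For an engine running inside the Python process, `time.process_time()` sampled before and after processing is one way to obtain the CPU seconds, since it reports the user plus system CPU time of the current process.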
Results
Accuracy
The figures below show the average DER and JER of each engine.
Resource Utilization
Usage
The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents: