TLDR: 100x more efficient, 5x more accurate
Falcon Speaker Diarization is 100x more efficient than pyannote Speaker Diarization and diarizes speakers 5x more accurately than Google Speech-to-Text. Falcon is a transcription-engine-agnostic, language-independent Speaker Diarization engine with no cap on the number of speakers.
What’s Speaker Diarization?
Speaker Diarization identifies “who spoke when” by dividing audio into segments based on voice characteristics and associating the segments with speakers. The industry practice is to embed Speaker Diarization into commercial speech-to-text systems. However, some speech-to-text solutions, such as OpenAI’s Whisper, don’t offer embedded Speaker Diarization.
Why Falcon Speaker Diarization?
Falcon is a highly accurate, efficient, and modular Speaker Diarization engine powered by deep learning. What makes Falcon unique is:
No transcription engine dependency.
Speaker Diarization embedded in a speech-to-text API only works with the transcription engine it is built into. Falcon Speaker Diarization can work with any transcription engine, whether OpenAI Whisper, Google Speech-to-Text, or Amazon Transcribe.
No cap on the number of speakers or need to input the number of speakers.
Most speech-to-text APIs can only perform Speaker Diarization on a limited number of speakers. Some of them require developers to input the number of speakers in advance, which is not just time-consuming but almost impossible when transcribing massive audio archives. Falcon Speaker Diarization identifies and diarizes an uncapped number of speakers without that input.
No need to limit conversations to one language.
Speaker Diarization embedded in speech-to-text APIs runs jointly with the speech-to-text engine, meaning it’s limited to the languages that engine supports. Falcon Speaker Diarization can recognize and diarize speakers even in multilingual settings.
Like all Picovoice engines, Falcon Speaker Diarization
- processes voice data offline,
- runs across platforms,
- is ready to be deployed even with the Forever-Free Plan.
import pvfalcon

f = pvfalcon.create(access_key)
segments = f.process_file(path)
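Because Falcon does not depend on a transcription engine, its segments can be paired with word timestamps produced by any speech-to-text system. The following is a minimal sketch of one way to do that, not Picovoice's implementation. It assumes each segment exposes `start_sec`, `end_sec`, and `speaker_tag`; the `Segment` stand-in, the `Word` type, and the `label_words` helper are hypothetical names introduced for illustration, and the sample timestamps are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Stand-in for a diarization segment (hypothetical; mirrors the assumed fields)."""
    start_sec: float
    end_sec: float
    speaker_tag: int


@dataclass
class Word:
    """A transcribed word with timestamps from any speech-to-text engine (hypothetical)."""
    text: str
    start_sec: float
    end_sec: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words: List[Word], segments: List[Segment]) -> List[Tuple[str, Optional[int]]]:
    """Tag each word with the speaker whose segment overlaps it the most.

    Words that overlap no segment are tagged None.
    """
    labeled = []
    for word in words:
        best_tag = None
        best_overlap = 0.0
        for seg in segments:
            shared = overlap(word.start_sec, word.end_sec, seg.start_sec, seg.end_sec)
            if shared > best_overlap:
                best_overlap = shared
                best_tag = seg.speaker_tag
        labeled.append((word.text, best_tag))
    return labeled


# Illustrative data only: two speaker turns and three words.
segments = [Segment(0.0, 2.0, 1), Segment(2.0, 4.5, 2)]
words = [Word("hello", 0.2, 0.6), Word("there", 1.8, 2.6), Word("bye", 5.0, 5.3)]
labeled = label_words(words, segments)
```

Assigning by largest overlap keeps the logic independent of the transcription engine: only word start and end times are needed. A word that straddles a speaker change goes to whichever speaker covers more of it.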
How to Measure Speaker Diarization Performance?
We have benchmarked the accuracy and computational performance of Falcon Speaker Diarization against the Speaker Diarization capabilities of Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, and pyannote.audio using Diarization Error Rate, Jaccard Error Rate, and Core-Hour.
Check out the Open-source Speaker Diarization Benchmark for details.
Diarization Error Rate (DER)
The Diarization Error Rate (DER) is the traditional and most commonly used metric for evaluating speaker diarization systems. The higher the DER, the lower the accuracy.
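The article does not spell out how DER is computed. As a rough sketch, the commonly used definition sums three kinds of error time (false alarm, missed speech, and speaker confusion) and divides by the total reference speech time. The function name and the example durations below are illustrative assumptions, not benchmark figures.

```python
def diarization_error_rate(
    false_alarm_sec: float,
    missed_sec: float,
    confusion_sec: float,
    reference_speech_sec: float,
) -> float:
    """DER under the commonly used definition.

    false_alarm_sec: non-speech time labeled as speech
    missed_sec: speech time the system did not label
    confusion_sec: speech time attributed to the wrong speaker
    reference_speech_sec: total speech time in the reference annotation
    """
    if reference_speech_sec <= 0:
        raise ValueError("reference_speech_sec must be positive")
    return (false_alarm_sec + missed_sec + confusion_sec) / reference_speech_sec


# Hypothetical numbers: 10 seconds of errors over 100 seconds of reference speech.
example_der = diarization_error_rate(
    false_alarm_sec=3.0,
    missed_sec=5.0,
    confusion_sec=2.0,
    reference_speech_sec=100.0,
)
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 100% for a system that labels a lot of non-speech as speech.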
Falcon Speaker Diarization performs better than the Speaker Diarization embedded in Big Tech speech-to-text APIs, including Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, and Google Speech-to-Text Enhanced.
Jaccard Error Rate
Evaluation metrics also evolve as AI advances. The Jaccard Error Rate (JER) is a more recent metric developed for the DIHARD Diarization Challenge. JER assigns equal weight to each speaker's contribution, regardless of their speech duration. Similar to DER, the lower the JER, the better. Falcon Speaker Diarization outperforms the alternatives, including the Speaker Diarization capabilities of Big Tech speech-to-text engines.
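To make the "equal weight per speaker" idea concrete, here is a simplified sketch of a JER-style computation. It represents each speaker's speech as a set of frame indices and averages one minus the Jaccard index over reference speakers. It assumes the mapping between reference and system speakers is already given, whereas a full implementation would first search for the best mapping. All names and the sample data are hypothetical.

```python
from typing import Dict, Set


def jaccard_error_rate(
    reference: Dict[str, Set[int]],
    system: Dict[str, Set[int]],
    mapping: Dict[str, str],
) -> float:
    """Average per-speaker Jaccard error.

    reference: reference speaker -> frame indices where that speaker talks
    system: system speaker -> frame indices attributed to that speaker
    mapping: reference speaker -> matched system speaker (assumed precomputed)
    """
    if not reference:
        raise ValueError("reference must contain at least one speaker")
    errors = []
    for ref_speaker, ref_frames in reference.items():
        if not ref_frames:
            raise ValueError(f"reference speaker {ref_speaker!r} has no speech")
        matched = mapping.get(ref_speaker)
        # An unmatched reference speaker is compared against an empty set,
        # which yields the maximum error of 1.0 for that speaker.
        sys_frames = system[matched] if matched in system else set()
        intersection = ref_frames & sys_frames
        union = ref_frames | sys_frames
        errors.append(1.0 - len(intersection) / len(union))
    # Every speaker counts once, no matter how long they spoke.
    return sum(errors) / len(errors)


# Hypothetical data: speaker A talks for 80 frames, speaker B for only 20.
reference = {"A": set(range(0, 80)), "B": set(range(80, 100))}
system = {"s1": set(range(0, 90)), "s2": set(range(90, 100))}
mapping = {"A": "s1", "B": "s2"}
jer = jaccard_error_rate(reference, system, mapping)
```

In this example the same 10 misattributed frames cost speaker A about 11% and speaker B 50%, so the short speaker's errors dominate the average. That is the behavior that distinguishes JER from a duration-weighted metric such as DER.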
Core-Hour
The Core-Hour is a metric used to evaluate computational efficiency. It indicates the number of hours required to process a given amount of audio on a single CPU core. The lower the Core-Hour, the more efficient the model.
Falcon Speaker Diarization is 100 times more efficient than pyannote Speaker Diarization. In other words, Falcon Speaker Diarization can process 100 hours of audio using the same resources pyannote Speaker Diarization uses for only 1 hour.
We only compared the computational requirements of pyannote Speaker Diarization and Picovoice Falcon Speaker Diarization, as the Core-Hour metric does not apply to cloud-based APIs.
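As a rough illustration of the arithmetic behind a Core-Hour comparison, the sketch below extrapolates core-hours from a measured run and compares two engines. The helper names, the choice of one hour of audio as the reference amount, and the timings are all assumptions made for this example, not the benchmark's actual code or results.

```python
import time
from typing import Callable


def measure_cpu_seconds(fn: Callable[[], object]) -> float:
    """CPU time (summed over all threads of this process) spent running fn."""
    start = time.process_time()
    fn()
    return time.process_time() - start


def core_hours(
    cpu_seconds: float,
    audio_seconds: float,
    reference_audio_hours: float = 1.0,
) -> float:
    """Core-hours needed for `reference_audio_hours` of audio, extrapolated
    linearly from a run that used `cpu_seconds` to process `audio_seconds`."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    cpu_per_audio_second = cpu_seconds / audio_seconds
    return cpu_per_audio_second * reference_audio_hours


# Hypothetical timings for one hour (3600 s) of audio on two engines.
slow_engine = core_hours(cpu_seconds=360.0, audio_seconds=3600.0)
fast_engine = core_hours(cpu_seconds=3.6, audio_seconds=3600.0)

# How many times more audio the fast engine handles with the same CPU budget.
efficiency_ratio = slow_engine / fast_engine
```

With these made-up timings the ratio comes out to roughly 100, which mirrors the "100 hours versus 1 hour" phrasing: the same CPU budget covers 100 times more audio.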
What’s next?
Your feedback is an essential part of the process. Please create a GitHub issue to report bugs. If you enjoy building with Falcon Speaker Diarization, give it a star to help fellow developers quickly find it.