Falcon: Speaker Diarization for Developers

TLDR: 100x more efficient, 5x more accurate

Falcon Speaker Diarization is 100x more efficient than pyannote Speaker Diarization and 5x more accurate than the Speaker Diarization in Google Speech-to-Text. Falcon is a transcription-engine-agnostic, language-independent Speaker Diarization engine with no cap on the number of speakers.

What’s Speaker Diarization?

Speaker Diarization identifies “who spoke when” by dividing audio into segments based on voice characteristics and associating the segments with speakers. The industry practice is to embed Speaker Diarization into commercial speech-to-text systems. However, some speech-to-text solutions, such as OpenAI’s Whisper, don’t offer embedded Speaker Diarization.

Why Falcon Speaker Diarization?

Falcon is a highly accurate, efficient, and modular Speaker Diarization engine powered by deep learning. What makes Falcon unique is:

No transcription engine dependency.

Speaker Diarization embedded in a speech-to-text API works only with the transcription engine it ships with.

Falcon Speaker Diarization can work with any transcription engine, whether OpenAI Whisper, Google Speech-to-Text, or Amazon Transcribe.
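Because diarization output is just timed speaker segments, it can be combined with word timestamps from any transcription engine. The following is a minimal sketch of one way to do that, not Falcon's own method: the `Segment` and `Word` classes are hypothetical stand-ins for whatever a diarization engine and a transcription engine return, and `assign_speakers` is an illustrative helper that labels each word with the speaker whose segment overlaps it the most.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one diarization segment."""
    start_sec: float
    end_sec: float
    speaker_tag: int


@dataclass
class Word:
    """Hypothetical stand-in for one transcript word with timestamps."""
    text: str
    start_sec: float
    end_sec: float


def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it the most.

    Words that overlap no segment are labeled None.
    """
    labeled = []
    for word in words:
        best_tag, best_overlap = None, 0.0
        for seg in segments:
            # Length of the time interval shared by the word and the segment.
            overlap = min(word.end_sec, seg.end_sec) - max(word.start_sec, seg.start_sec)
            if overlap > best_overlap:
                best_tag, best_overlap = seg.speaker_tag, overlap
        labeled.append((best_tag, word.text))
    return labeled


segments = [Segment(0.0, 2.0, 1), Segment(2.0, 4.5, 2)]
words = [Word("hello", 0.2, 0.6), Word("there", 0.7, 1.1), Word("hi", 2.3, 2.5)]
labeled = assign_speakers(words, segments)
print(labeled)
```

The same helper works regardless of which engine produced the words, as long as it exposes start and end times.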

No cap on the number of speakers or need to input the number of speakers.

Most speech-to-text APIs can only perform Speaker Diarization on a limited number of speakers. Some of them require developers to input the number of speakers in advance. Counting speakers by hand is not just time-consuming; it is almost impossible when transcribing massive audio archives.

Falcon Speaker Diarization identifies and diarizes an uncapped number of speakers without input.

No need to limit conversations to one language.

Speaker Diarization embedded in speech-to-text APIs runs jointly with transcription, so it is limited to the languages those APIs support.

Falcon Speaker Diarization can recognize and diarize speakers even in multilingual settings.

Like all Picovoice engines, Falcon Speaker Diarization:

processes voice data offline,
runs across platforms, and
is ready to be deployed, even on the Free Plan.

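import pvfalcon

# access_key is your AccessKey from Picovoice Console; path points to the audio file.
f = pvfalcon.create(access_key)
segments = f.process_file(path)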
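Once segments are available, a common next step is to summarize them per speaker. The sketch below assumes each segment exposes `start_sec`, `end_sec`, and `speaker_tag` fields; the `Segment` namedtuple is a stand-in used so the example runs without an AccessKey or audio file, and `talk_time_by_speaker` is an illustrative helper rather than part of any SDK.

```python
from collections import defaultdict, namedtuple

# Stand-in for diarization output; the field names are an assumption.
Segment = namedtuple("Segment", ["start_sec", "end_sec", "speaker_tag"])


def talk_time_by_speaker(segments):
    """Return total speaking time in seconds for each speaker tag."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_tag] += seg.end_sec - seg.start_sec
    return dict(totals)


segments = [Segment(0.0, 3.0, 1), Segment(3.0, 5.0, 2), Segment(5.0, 9.0, 1)]
totals = talk_time_by_speaker(segments)

# The number of speakers is read from the output instead of being supplied up front.
print(f"{len(totals)} speakers found")
for tag, seconds in sorted(totals.items()):
    print(f"speaker {tag}: {seconds:.1f} s")
```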

How to Measure Speaker Diarization Performance?

We have benchmarked the accuracy and computational performance of Falcon Speaker Diarization against Speaker Diarization capabilities of Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, and pyannote.audio using Diarization Error Rate, Jaccard Error Rate, and Core-Hour.

Check out the Open-source Speaker Diarization Benchmark for details.

Diarization Error Rate (DER)

The Diarization Error Rate (DER) is the traditional and most commonly used metric for evaluating speaker diarization systems. The higher the DER, the lower the accuracy.
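As a rough illustration of what the number means, DER is conventionally the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. The sketch below only does that final arithmetic; the function name and the durations are made up for illustration, and a real scoring tool also has to align system speakers with reference speakers before these durations can be measured.

```python
def diarization_error_rate(false_alarm_sec, missed_sec, confusion_sec, reference_speech_sec):
    """Combine the three error durations into a single DER value."""
    if reference_speech_sec <= 0:
        raise ValueError("reference_speech_sec must be positive")
    return (false_alarm_sec + missed_sec + confusion_sec) / reference_speech_sec


# Illustrative numbers only: 30 s of total error over 300 s of reference speech.
der = diarization_error_rate(
    false_alarm_sec=6.0,
    missed_sec=9.0,
    confusion_sec=15.0,
    reference_speech_sec=300.0,
)
print(f"DER = {der:.1%}")
```

Because the numerator can exceed the reference speech time, DER is not bounded by 100%.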

Falcon Speaker Diarization performs better than the Speaker Diarization embedded in Big Tech speech-to-text APIs, including Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, and Google Speech-to-Text Enhanced.

Jaccard Error Rate

Evaluation metrics evolve as AI advances. The Jaccard Error Rate (JER) is a more recent metric, developed for the DIHARD Diarization Challenge. JER assigns equal weight to each speaker's contribution, regardless of their speech duration. As with DER, the lower the JER, the better. Falcon Speaker Diarization outperforms the alternatives, including the Speaker Diarization capabilities of Big Tech speech-to-text engines.
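To make the equal weighting concrete, here is a simplified frame-based sketch of JER. It assumes the pairing between reference and system speakers is already known (scoring tools compute an optimal pairing themselves), represents each speaker as a set of frame indices, and assumes every reference speaker has at least one frame. The function and variable names are illustrative.

```python
def jaccard_error_rate(reference, system, mapping):
    """Average, over reference speakers, of 1 - |ref AND sys| / |ref OR sys|.

    reference: dict of reference speaker -> set of frame indices
    system:    dict of system speaker -> set of frame indices
    mapping:   dict of reference speaker -> paired system speaker
    """
    errors = []
    for ref_speaker, ref_frames in reference.items():
        # An unpaired reference speaker is compared with an empty set (error 1.0).
        sys_frames = system.get(mapping.get(ref_speaker), set())
        union = ref_frames | sys_frames
        intersection = ref_frames & sys_frames
        errors.append(1.0 - len(intersection) / len(union))
    return sum(errors) / len(errors)


# Speaker A talks for 90 frames and is matched perfectly.
# Speaker B talks for 10 frames, of which the system finds only 5.
reference = {"A": set(range(0, 90)), "B": set(range(90, 100))}
system = {1: set(range(0, 90)), 2: set(range(95, 100))}
mapping = {"A": 1, "B": 2}

jer = jaccard_error_rate(reference, system, mapping)
print(f"JER = {jer:.2f}")
```

In this toy case only 5 of 100 reference frames are missed, so a duration-weighted metric would barely move, while JER is 0.25 because the short speaker counts as much as the long one.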

Core-Hour

The Core-Hour is a metric for computational efficiency: the number of hours a single CPU core needs to process a given amount of audio. The lower the Core-Hour, the more efficient the model.

Falcon Speaker Diarization is 100 times more efficient than pyannote Speaker Diarization. In other words, Falcon Speaker Diarization can process 100 hours of audio using the same resources pyannote Speaker Diarization uses for only 1 hour.
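The arithmetic behind such a comparison can be sketched as follows. The timings are placeholder values chosen to produce a round ratio, not benchmark measurements, and both helper functions are illustrative.

```python
def core_hours(wall_clock_sec, cores_used):
    """Convert a measured run into core-hours (hours of work on one CPU core)."""
    return wall_clock_sec * cores_used / 3600.0


def efficiency_ratio(baseline_core_hours, candidate_core_hours):
    """How many times fewer core-hours the candidate needs for the same audio."""
    return baseline_core_hours / candidate_core_hours


# Placeholder timings for processing the same audio with two engines.
baseline = core_hours(wall_clock_sec=45_000, cores_used=4)   # 50.0 core-hours
candidate = core_hours(wall_clock_sec=1_800, cores_used=1)   # 0.5 core-hours

ratio = efficiency_ratio(baseline, candidate)
print(f"candidate is {ratio:.0f}x more efficient")
```

Normalizing by core count matters because a multi-core run with a short wall-clock time can still consume more total compute than a slower single-core run.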

We compared the computational requirements of only pyannote Speaker Diarization and Picovoice Falcon Speaker Diarization because the Core-Hour metric does not apply to cloud-based APIs.

What’s next?

Your feedback is an essential part of the process. Please create a GitHub issue to report bugs. If you enjoy building with Falcon Speaker Diarization, give it a star to help fellow developers quickly find it.
