Speaker diarization partitions an audio stream into segments according to who is speaking, answering the question "who spoke when." It is commonly used for transcribing multi-speaker conversations, call center analytics, and audio indexing.

Picovoice's Falcon Speaker Diarization is an accurate and efficient speaker diarization engine powered by deep learning. It works with any transcription engine, is language-independent, and places no cap on the number of speakers it can distinguish.

In just a few lines of code, you can start performing speaker diarization using the Falcon Speaker Diarization Web SDK. Let’s get started!

Install Falcon Speaker Diarization Web SDK

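Install the Falcon Speaker Diarization Web SDK using npm. The package is published as @picovoice/falcon-web, so the command is npm install --save @picovoice/falcon-web.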

Sign up for Picovoice Console

Next, create a Picovoice Console account, and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!

Usage

Falcon Speaker Diarization Model

Add the Falcon Speaker Diarization model to the project in one of two ways:

  • Copy the model file to the project's public directory.

(or)

  • Create a base64 string of the model using the pvbase64 script included in the package.

Create an object containing the Falcon model options:
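As a minimal sketch, the options object could carry either a path to the model file or the base64 string, matching the two approaches above. The field names publicPath and base64, the local FalconModelOptions type, the modelSource helper, and the placeholder values are all assumptions made for illustration; check the Web API docs for the exact shape.

```typescript
// Hypothetical local type: exactly one way of supplying the model.
// Field names are assumed, not confirmed by this article.
type FalconModelOptions = { publicPath: string } | { base64: string };

// Option 1: model file served from the public directory (placeholder path).
const falconModel: FalconModelOptions = { publicPath: "/falcon_params.pv" };

// Option 2: model embedded as a base64 string (placeholder, not a real model).
const falconModelEmbedded: FalconModelOptions = { base64: "PLACEHOLDER_BASE64" };

// Illustrative helper: report which source an options object uses.
function modelSource(model: FalconModelOptions): "file" | "base64" {
  return "publicPath" in model ? "file" : "base64";
}
```

Only one of the two objects is needed in a real project; both are shown here to contrast the options.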

Initialization

Initialize Falcon with the falconModel variable containing the model options:

FalconWorker uses web workers to process audio data. Web workers may be unavailable in some environments (e.g., Firefox private mode). In that case, use Falcon instead, which processes audio data on the main thread.
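The choice between the two classes can be sketched as a small helper that takes two factory functions and calls only one of them. Everything here is hypothetical rather than part of the SDK: the createEngine and workersAvailable names, the EngineFactories type, and the assumption that both classes are created with the same arguments. Checking for a global Worker is only a rough test, so the caller can pass the flag explicitly.

```typescript
// Hypothetical helper, not part of the SDK.
type EngineKind = "worker" | "main-thread";

interface EngineFactories<T> {
  // In an app these would wrap the SDK's create calls for FalconWorker
  // and Falcon respectively (assumed to take the same arguments).
  worker: () => T;
  mainThread: () => T;
}

// Rough feature check: is a global Worker constructor present at all?
function workersAvailable(): boolean {
  return "Worker" in globalThis;
}

// Call exactly one factory, preferring the worker-based engine.
function createEngine<T>(
  factories: EngineFactories<T>,
  useWorker: boolean = workersAvailable()
): { kind: EngineKind; engine: T } {
  return useWorker
    ? { kind: "worker", engine: factories.worker() }
    : { kind: "main-thread", engine: factories.mainThread() };
}
```

Because the factories are passed in, the helper never touches the SDK directly, and an application could also call it a second time with useWorker set to false if worker creation fails at runtime.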

Diarization

Implement getAudioData based on your application. It can read from a microphone via the Web Audio API or from a file. The returned audio must be single-channel, 16-bit linearly encoded PCM at the sample rate the engine requires, which can be read from the engine's sampleRate property.
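The Web Audio API delivers samples as 32-bit floats in the range [-1, 1], often with more than one channel, so getAudioData usually needs a conversion step. The sketch below shows one way to downmix to mono and convert to 16-bit PCM. The function names downmixToMono and floatTo16BitPcm are hypothetical, and resampling to the required sample rate is not covered.

```typescript
// Average all channels into one. Assumes every channel has the same length.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) {
    return new Float32Array(0);
  }
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Convert float samples in [-1, 1] to signed 16-bit integers,
// clamping anything outside that range.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  }
  return pcm;
}
```

In a browser, the per-channel arrays could come from AudioBuffer.getChannelData, and the resulting Int16Array is what getAudioData would return.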

Upon completion, falcon.process() will return an array of segment objects, each with metadata including a speakerTag used to identify unique speakers, as well as the start and end time for each segment.
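To illustrate how the segments might be consumed, the sketch below totals the speaking time for each speaker. The Segment interface is a local stand-in: speakerTag comes from the description above, while the startSec and endSec field names and the speakingTimeBySpeaker helper are assumptions for this example.

```typescript
// Local stand-in for the segment objects; field names for the
// start and end times are assumed.
interface Segment {
  speakerTag: number;
  startSec: number;
  endSec: number;
}

// Sum the duration of every segment, keyed by speaker tag.
function speakingTimeBySpeaker(segments: Segment[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const segment of segments) {
    const duration = segment.endSec - segment.startSec;
    totals.set(segment.speakerTag, (totals.get(segment.speakerTag) ?? 0) + duration);
  }
  return totals;
}
```

The same loop could instead attach speaker tags to words from a transcription engine by matching timestamps against each segment's start and end time.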

Clean up

Clean up allocated resources:

If Falcon was used instead of FalconWorker, clean resources with await falcon.release().

For a complete working project, take a look at the Falcon Speaker Diarization Web Demo. You can also view the Falcon Speaker Diarization Web API docs for details on the package.