Speaker Diarization is a process used in audio processing to partition a given audio stream into segments based on who is speaking, essentially identifying "who spoke when." This technology is commonly employed in tasks like transcribing multi-speaker conversations, call center analytics, and audio indexing.
Picovoice's Falcon Speaker Diarization engine is a highly accurate and efficient Speaker Diarization engine powered by deep learning. It is transcription-engine-agnostic and language-independent, and it can perform speaker diarization on an uncapped number of speakers.
In just a few lines of code, you can start performing speaker diarization using the Falcon Speaker Diarization Web SDK. Let’s get started!
Install Falcon Speaker Diarization Web SDK
Install the Falcon Speaker Diarization Web SDK using npm:
Sign up for Picovoice Console
Next, create a Picovoice Console account and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!
Usage
Falcon Speaker Diarization Model
Add the Falcon Speaker Diarization model to the project in one of two ways:
- Copy the model file to the project's public directory:
(or)
- Create a base64 string of the model using the pvbase64 script included in the package:
Create an object containing the Falcon model options:
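As a rough illustration of the two options above, the sketch below builds such an object in TypeScript. The property names publicPath and base64, the FalconModelOptions type, the makeFalconModel helper, and the file path are all assumptions made for this example and are not confirmed by this article, so check the Web API docs for the exact shape.

```typescript
// Hypothetical shape of the model options; the property names are assumptions.
type FalconModelOptions =
  | { publicPath: string } // model file served from the public directory
  | { base64: string }; // model embedded as a base64 string

// Hypothetical helper: prefer the base64 string when one is given,
// otherwise point at the model file in the public directory.
function makeFalconModel(source: {
  publicPath?: string;
  base64?: string;
}): FalconModelOptions {
  if (source.base64 !== undefined && source.base64.length > 0) {
    return { base64: source.base64 };
  }
  if (source.publicPath !== undefined && source.publicPath.length > 0) {
    return { publicPath: source.publicPath };
  }
  throw new Error("Provide either a publicPath or a base64 model string");
}

// Placeholder path: replace it with the location of your own model file.
const falconModel = makeFalconModel({ publicPath: "/models/falcon_model.pv" });
```

Keeping the two variants in one union type makes it harder to pass an empty options object by accident.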
Initialization
Initialize Falcon with the falconModel variable containing the model options:
FalconWorker uses web workers to process audio data. Web workers might not be supported in every environment (e.g., Firefox private mode). In that case, use Falcon instead, which processes audio data on the main thread.
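One way to decide between the two classes is a simple feature check, sketched below. The chooseEngineKind helper and the EngineKind type are hypothetical names for this example. The check only tests whether a Worker constructor exists; a constructor can be present and still fail to start a worker in restrictive modes, so treat this as a first filter and still handle errors when creating the engine.

```typescript
type EngineKind = "worker" | "main-thread";

// Hypothetical helper: pick an engine kind based on whether the given
// global scope exposes a Worker constructor.
function chooseEngineKind(scope: object): EngineKind {
  const workerCtor = (scope as Record<string, unknown>)["Worker"];
  return typeof workerCtor === "function" ? "worker" : "main-thread";
}

// In a browser, pass globalThis (or window).
const engineKind = chooseEngineKind(globalThis);
console.log(
  engineKind === "worker" ? "Use FalconWorker" : "Use Falcon on the main thread"
);
```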
Diarization
Implement getAudioData based on your application. It can read from a microphone via the Web Audio API or from a file. The returned audio must be 16-bit linearly encoded (linear PCM). The required sample rate can be retrieved from .sampleRate. Furthermore, the engine operates on single-channel audio.
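The Web Audio API hands out samples as 32-bit floats per channel (for example via AudioBuffer.getChannelData()), so a getAudioData implementation usually has to downmix and convert them. The following is a minimal sketch of those two steps; downmixToMono and floatTo16BitPcm are hypothetical helper names, and resampling to the required sample rate is not covered here.

```typescript
// Average all channels into a single mono channel.
// Assumes every channel has the same number of samples.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Convert float samples in [-1, 1] to 16-bit signed linear PCM.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return pcm;
}

// Example with made-up stereo data: two channels of two samples each.
const stereo = [new Float32Array([1, 0]), new Float32Array([0, 1])];
const pcmAudio = floatTo16BitPcm(downmixToMono(stereo));
```

The resulting Int16Array is mono 16-bit audio, which matches the format described above.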
Upon completion, falcon.process() will return an array of segment objects, each with metadata including a speakerTag used to identify unique speakers, as well as the start and end time for each segment.
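To show how the segments might be consumed, here is a small sketch that totals speaking time per speaker. The Segment interface is an assumption: speakerTag comes from the description above, while startSec and endSec are hypothetical names for the start and end times, and the sample data is made up.

```typescript
// Assumed segment shape; startSec and endSec are hypothetical field names.
interface Segment {
  speakerTag: number;
  startSec: number;
  endSec: number;
}

// Sum the duration of all segments for each speaker tag.
function speakingTimeBySpeaker(segments: Segment[]): Record<number, number> {
  const totals: Record<number, number> = {};
  for (const seg of segments) {
    totals[seg.speakerTag] =
      (totals[seg.speakerTag] ?? 0) + (seg.endSec - seg.startSec);
  }
  return totals;
}

// Made-up sample data standing in for real diarization output.
const sampleSegments: Segment[] = [
  { speakerTag: 1, startSec: 0, endSec: 2 },
  { speakerTag: 2, startSec: 2, endSec: 5 },
  { speakerTag: 1, startSec: 5, endSec: 6 },
];

const speakingTotals = speakingTimeBySpeaker(sampleSegments);
for (const tag of Object.keys(speakingTotals)) {
  console.log(`Speaker ${tag}: ${speakingTotals[Number(tag)]} s`);
}
```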
Clean up
Clean up allocated resources:
If Falcon was used instead of FalconWorker, clean resources with await falcon.release().
For a complete working project, take a look at the Falcon Speaker Diarization Web Demo. You can also view the Falcon Speaker Diarization Web API docs for details on the package.