Speaker Diarization for Web using JavaScript

Speaker Diarization is a process used in audio processing to partition a given audio stream into segments based on who is speaking, essentially identifying "who spoke when." This technology is commonly employed in tasks like transcribing multi-speaker conversations, call center analytics, and audio indexing.

Picovoice's Falcon Speaker Diarization is an accurate and efficient engine powered by deep learning. It is transcription-engine-agnostic and language-independent, and it can diarize audio with an uncapped number of speakers.

In just a few lines of code, you can start performing speaker diarization using the Falcon Speaker Diarization Web SDK. Let’s get started!

Install Falcon Speaker Diarization Web SDK

Install the Falcon Speaker Diarization Web SDK using npm:

npm install @picovoice/falcon-web

Next, create a Picovoice Console account, and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!

Usage

Falcon Speaker Diarization Model

Add the Falcon Speaker Diarization model to the project in one of two ways.

Copy the model file to the project's public directory:

cp ${FALCON_PARAMS_PATH} ${PUBLIC_DIRECTORY}/${FALCON_PARAMS}

(or)

Create a base64 string of the model using the pvbase64 script included in the package:

npx pvbase64 -i ${FALCON_PARAMS_PATH} -o ${OUTPUT_DIRECTORY}/${MODEL_NAME}.js

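Create an object containing the Falcon model options. Set either publicPath or base64, matching the approach chosen above:

// Only needed for the base64 approach
import base64model from '${OUTPUT_DIRECTORY}/${MODEL_NAME}.js';

const falconModel = {
  // Option 1: model file served from the public directory
  publicPath: '${PUBLIC_DIRECTORY}/${FALCON_PARAMS}',
  // or
  // Option 2: base64-encoded model string
  base64: base64model,
};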

Initialization

Initialize Falcon with the falconModel variable containing the model options:

import { FalconWorker } from "@picovoice/falcon-web";

const falcon = await FalconWorker.create(
  "${ACCESS_KEY}",
  falconModel,
);

FalconWorker uses web workers to process audio data. Web workers might not be supported in some environments (e.g., Firefox private mode). In that case, use Falcon instead, which processes audio data on the main thread.
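The fallback described above could be wrapped in a small helper. This is only a sketch: createWithFallback is a hypothetical name, and it assumes that FalconWorker.create rejects when workers cannot be used and that Falcon.create accepts the same arguments. Neither assumption is confirmed here, so check the API docs before relying on it.

```typescript
// Hypothetical helper (not part of the SDK): try the worker-based factory
// first and fall back to the main-thread factory if it is unavailable or fails.
async function createWithFallback<T>(
  createOnWorker: () => Promise<T>,
  createOnMainThread: () => Promise<T>,
  // Basic feature detection; callers can override it explicitly.
  workersAvailable: boolean = "Worker" in globalThis,
): Promise<T> {
  if (!workersAvailable) {
    return createOnMainThread();
  }
  try {
    return await createOnWorker();
  } catch (error) {
    // Assumption: a failed worker start surfaces as a rejected promise.
    console.warn("Worker-based engine failed to start; using the main thread.", error);
    return createOnMainThread();
  }
}

// Possible usage with the SDK (assumed signatures):
// const falcon = await createWithFallback(
//   () => FalconWorker.create("${ACCESS_KEY}", falconModel),
//   () => Falcon.create("${ACCESS_KEY}", falconModel),
// );
```

Keep track of which variant was created, since the two are cleaned up differently.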

Diarization

Implement getAudioData to suit your application. It can read from a microphone via the Web Audio API or from a file. The audio must be single-channel, 16-bit linearly encoded PCM at the sample rate reported by the engine's sampleRate property (falcon.sampleRate).
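Web Audio delivers samples as 32-bit floats in the range [-1, 1], often with more than one channel, so some conversion is usually needed before the audio meets these requirements. The sketch below shows one way to do it. The helper names downmixToMono and floatToInt16 are illustrative, not part of the SDK, and resampling is not covered: the input is assumed to already be at the engine's required sample rate.

```typescript
// Average equal-length channels (e.g. from AudioBuffer.getChannelData)
// into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const first = channels[0];
  if (!first) {
    return new Float32Array(0);
  }
  const mono = new Float32Array(first.length);
  for (const channel of channels) {
    for (let i = 0; i < mono.length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Convert float samples in [-1, 1] to 16-bit signed linear PCM,
// clamping any out-of-range values first.
function floatToInt16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
  }
  return pcm;
}

// Possible usage inside getAudioData, given a decoded AudioBuffer `buffer`:
// const channels = [];
// for (let c = 0; c < buffer.numberOfChannels; c++) {
//   channels.push(buffer.getChannelData(c));
// }
// return floatToInt16(downmixToMono(channels));
```

One way to avoid resampling by hand is to create the AudioContext with the required rate, for example new AudioContext({ sampleRate }), so that decoded audio already arrives at that rate.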

function getAudioData(): Int16Array {
  // get audio
  return new Int16Array();
}

const { segments } = await falcon.process(getAudioData());
console.log(segments);
/*
  [{
    speakerTag: 1,
    startSec: 2.7,
    endSec: 8.2
  }, ...]
*/

Upon completion, falcon.process() resolves to an object whose segments field is an array of segment objects. Each segment carries a speakerTag that identifies a unique speaker, along with the start and end times of the segment in seconds.
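As an illustration of working with this output, the sketch below totals the speaking time per speaker. The DiarizedSegment interface simply mirrors the fields shown in the example output above, and talkTimeBySpeaker is a hypothetical helper, not part of the SDK.

```typescript
// Local type mirroring the segment fields shown in the example output.
interface DiarizedSegment {
  speakerTag: number;
  startSec: number;
  endSec: number;
}

// Sum the duration of every segment, keyed by speakerTag.
function talkTimeBySpeaker(segments: DiarizedSegment[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const { speakerTag, startSec, endSec } of segments) {
    const previous = totals.get(speakerTag) ?? 0;
    totals.set(speakerTag, previous + (endSec - startSec));
  }
  return totals;
}

// Possible usage with the result of falcon.process():
// for (const [speaker, seconds] of talkTimeBySpeaker(segments)) {
//   console.log(`Speaker ${speaker}: ${seconds.toFixed(1)} s`);
// }
```

The same loop structure works for other per-speaker statistics, such as segment counts or the time of each speaker's first turn.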

Clean up

Clean up allocated resources:

falcon.terminate()

If Falcon was used instead of FalconWorker, release its resources with await falcon.release().
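To make sure cleanup also runs when processing throws, the create, use, and clean-up steps can be tied together with try/finally. The withEngine helper below is a hypothetical convenience wrapper, not part of the SDK. It takes the clean-up step as a callback so it works with either terminate() or release().

```typescript
// Hypothetical wrapper: create an engine, run some work with it, and
// always run the supplied clean-up step afterwards, even on errors.
async function withEngine<E, R>(
  create: () => Promise<E>,
  dispose: (engine: E) => Promise<void> | void,
  use: (engine: E) => Promise<R>,
): Promise<R> {
  const engine = await create();
  try {
    return await use(engine);
  } finally {
    await dispose(engine);
  }
}

// Possible usage with FalconWorker (signatures as shown earlier):
// const { segments } = await withEngine(
//   () => FalconWorker.create("${ACCESS_KEY}", falconModel),
//   (falcon) => falcon.terminate(),
//   (falcon) => falcon.process(getAudioData()),
// );
```

This pattern suits one-shot processing. An application that diarizes many recordings would more likely keep one engine instance alive and clean it up when the page or component is torn down.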

For a complete working project, take a look at the Falcon Speaker Diarization Web Demo. You can also view the Falcon Speaker Diarization Web API docs for details on the package.
