Speaker diarization is the process of automatically segmenting an audio recording and labeling each segment by speaker, so that you know who spoke when. It is often used alongside transcription and audio analysis in settings such as call centers, meetings, and broadcast media.

Picovoice's Falcon Speaker Diarization performs diarization on device, so audio is processed locally instead of being sent to a cloud service.

The Falcon Speaker Diarization engine is available for Android 5.0 (API level 21) and later.

Falcon Speaker Diarization Android SDK

To integrate the Falcon Speaker Diarization Android SDK into your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, then add the following dependency to your app’s build.gradle file:
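A sketch of the dependencies block, assuming the Maven Central coordinates ai.picovoice:falcon-android for the engine and ai.picovoice:android-voice-processor for audio capture. ${LATEST_VERSION} is a placeholder (single-quoted Groovy strings are not interpolated), so replace it with the current release numbers listed in the Falcon Speaker Diarization Android SDK quick start guide.

```groovy
dependencies {
    // Falcon Speaker Diarization engine (assumed coordinates; replace the version placeholder)
    implementation 'ai.picovoice:falcon-android:${LATEST_VERSION}'
    // Android Voice Processor, used in this example for microphone capture (assumed coordinates)
    implementation 'ai.picovoice:android-voice-processor:${LATEST_VERSION}'
}
```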

This example uses Picovoice's Android Voice Processor library (its VoiceProcessor class) to record audio.

Sign up for Picovoice Console

Sign up for Picovoice Console for free and copy your AccessKey. The AccessKey handles authentication and authorization.

Usage

Permissions

AccessKey validation needs network access and recording needs the device's microphone, so add the following permissions to the app's AndroidManifest.xml file:
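A sketch of the two standard Android permissions that cover these needs, placed inside the manifest element:

```xml
<!-- Network access, used for AccessKey validation -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- Microphone access, used by VoiceProcessor to record audio -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```

On Android 6.0 (API level 23) and later, RECORD_AUDIO is a runtime permission, so the app must also ask the user to grant it (for example with ActivityCompat.requestPermissions) before recording starts.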

Initialization

Create an instance of the engine with the Falcon Builder class, passing in the AccessKey copied from Picovoice Console and the Android app context. Then get the singleton instance of VoiceProcessor:

Recording Audio Frames

Falcon Speaker Diarization processes audio in chunks, also known as audio frames. The .frameLength property gives the number of audio samples per frame that are required by Falcon, while the .sampleRate property gives the audio sample rate that is required. Audio samples must be 16-bit integers.
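To make the relationship between frame length, sample rate, and 16-bit samples concrete, the sketch below computes a frame's duration and the size of a raw recording. FrameMath is a hypothetical helper, not part of the SDK, and the 512-sample frame and 16,000 Hz rate are illustrative assumptions; in an app, use the sample rate reported by Falcon rather than a hard-coded value.

```java
// Hypothetical helper (not part of the Falcon SDK) relating frame length,
// sample rate and 16-bit sample size.
final class FrameMath {
    static final int BYTES_PER_SAMPLE = 2; // 16-bit PCM: 2 bytes per sample

    // Duration of one frame in milliseconds.
    static double frameDurationMs(int frameLength, int sampleRate) {
        return 1000.0 * frameLength / sampleRate;
    }

    // Number of whole frames needed to cover `seconds` of audio (rounded up).
    static long framesForSeconds(double seconds, int frameLength, int sampleRate) {
        long samples = (long) Math.ceil(seconds * sampleRate);
        return (samples + frameLength - 1) / frameLength;
    }

    // Size in bytes of `seconds` of raw 16-bit PCM (one channel assumed).
    static long rawPcmBytes(double seconds, int sampleRate) {
        return (long) Math.ceil(seconds * sampleRate) * BYTES_PER_SAMPLE;
    }

    public static void main(String[] args) {
        int sampleRate = 16000; // ASSUMPTION: illustrative; query Falcon in a real app
        int frameLength = 512;  // ASSUMPTION: illustrative frame length
        System.out.println(frameDurationMs(frameLength, sampleRate));      // 32.0
        System.out.println(framesForSeconds(60, frameLength, sampleRate)); // 1875
        System.out.println(rawPcmBytes(60, sampleRate));                   // 1920000
    }
}
```

At these assumed values each frame covers 32 ms, and one minute of raw 16-bit audio takes about 1.9 MB. Holding the samples as boxed Short objects in an ArrayList costs considerably more than that raw size, which is worth keeping in mind for long recordings.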

Use VoiceProcessor.addFrameListener to register a listener that receives audio frames from VoiceProcessor and collects their samples so they can later be passed to Falcon for processing:
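A sketch of what the listener can do with each frame: append its samples to a growing buffer that is later handed to Falcon. PcmCollector is a hypothetical helper, and the wiring assumes the SDK's frame listener receives each frame as a short[], so it could be registered with something like voiceProcessor.addFrameListener(collector::onFrame).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the SDK) that accumulates the samples of
// every audio frame it receives into one growing buffer.
final class PcmCollector {
    private final List<Short> pcmData = new ArrayList<>();

    // Intended as the body of the frame listener; synchronized because frames
    // may be delivered on a background thread.
    synchronized void onFrame(short[] frame) {
        for (short sample : frame) {
            pcmData.add(sample); // autoboxed to Short
        }
    }

    // Number of samples collected so far.
    synchronized int size() {
        return pcmData.size();
    }

    // Copy of the collected samples, safe to read while recording continues.
    synchronized List<Short> snapshot() {
        return new ArrayList<>(pcmData);
    }

    // Discards collected samples, e.g. before starting a new recording.
    synchronized void clear() {
        pcmData.clear();
    }
}
```

The methods are synchronized because the audio callbacks and the code that later reads the buffer may run on different threads; snapshot() returns a copy so the caller can process it without holding the lock.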

Start recording with VoiceProcessor.start, passing in the desired frame length and Falcon's required audio sample rate as arguments:

VoiceProcessor then begins capturing audio and delivers each frame to the registered listeners.

To stop recording, call VoiceProcessor.stop:

Processing Audio Frames

Falcon's process() method takes a short[], so convert the ArrayList of collected samples into a short[] and pass it to Falcon for processing:
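One way to write that conversion as a small reusable helper. PcmConversion is a hypothetical name, not part of the SDK.

```java
import java.util.List;

// Hypothetical helper (not part of the SDK): copies boxed samples into the
// primitive short[] that Falcon's process() method takes.
final class PcmConversion {
    static short[] toShortArray(List<Short> samples) {
        short[] pcm = new short[samples.size()];
        int i = 0;
        // Iterate rather than index so the copy stays linear for any List type.
        for (Short sample : samples) {
            pcm[i++] = sample; // auto-unboxing Short -> short
        }
        return pcm;
    }
}
```

Assuming the collected samples are held in an ArrayList<Short> named pcmData, falcon is the instance created during initialization, and the segment type is FalconSegment (all assumptions to check against the API reference), the call would look like FalconSegment[] segments = falcon.process(PcmConversion.toShortArray(pcmData));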

The returned segments array contains one entry per detected segment, each holding the segment's timing (start and end times) and information identifying the speaker.
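A common next step is to total each speaker's talk time or print a simple timeline. The sketch below uses DiarizedSegment, a hypothetical stand-in with startSec, endSec and speakerTag fields; the SDK's own segment type should be mapped onto it, and its exact field or accessor names checked in the API reference.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the SDK's segment type: start and end times in
// seconds plus an integer tag identifying the speaker.
final class DiarizedSegment {
    final float startSec;
    final float endSec;
    final int speakerTag;

    DiarizedSegment(float startSec, float endSec, int speakerTag) {
        this.startSec = startSec;
        this.endSec = endSec;
        this.speakerTag = speakerTag;
    }
}

final class SpeakerSummary {
    // Total talk time per speaker tag in seconds, sorted by tag.
    static Map<Integer, Float> talkTimeBySpeaker(DiarizedSegment[] segments) {
        Map<Integer, Float> totals = new TreeMap<>();
        for (DiarizedSegment s : segments) {
            totals.merge(s.speakerTag, s.endSec - s.startSec, Float::sum);
        }
        return totals;
    }

    // One timeline line per segment, e.g. "[0.00s - 2.50s] Speaker 1".
    static String describe(DiarizedSegment s) {
        return String.format(Locale.ROOT, "[%.2fs - %.2fs] Speaker %d",
                s.startSec, s.endSec, s.speakerTag);
    }
}
```

A TreeMap keeps the speakers ordered by tag, which makes the summary stable for display; Locale.ROOT keeps the decimal separator consistent regardless of the device's language settings.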

Clean up

When you are finished, call VoiceProcessor.clearFrameListeners to remove the registered listeners and Falcon.delete to release the resources allocated by the engine:

Working Example

For a complete working project, take a look at the Falcon Speaker Diarization Android Demo.

For more information, check out the Falcon Speaker Diarization product page or refer to the Falcon Speaker Diarization Android SDK quick start guide.
