Bluebird Streaming Speaker Diarization (BETA)

Real-time speaker diarization for live events and meetings

Add accurate speaker labels to Parakeet and other streaming ASRs with <250 ms latency, short-segment precision, and unlimited speaker support. On-device. Lightweight. Enterprise-ready.

Speaker label latency: <250 ms
Speaker support: No cap
Any ASR: Pairs with any streaming transcription engine in any language
What is Bluebird Streaming Speaker Diarization?

The only production-ready on-device streaming speaker diarization SDK

Bluebird is an enterprise-ready on-device streaming speaker diarization engine built for real-time applications deployed at scale. It identifies who is speaking in live audio streams, processes audio data entirely offline across platforms, and is private by architecture.

All major cloud streaming diarization APIs require audio to leave the device on every inference. Beyond the cloud dependency itself, each comes with meaningful constraints. Amazon Transcribe's streaming diarization supports US English only and degrades above five speakers. Azure Speech returns "Unknown" speaker IDs in early results and struggles in practice beyond two or three speakers. Google Chirp 3 is available only in the global, US, and EU multi-regions. For live events, field deployments, and privacy-sensitive applications, none of these is a viable architecture. Unlike cloud alternatives, Bluebird is built for the conditions of real conversations: multiple speakers, brief turns, and audio that looks nothing like a studio recording.

Bluebird Streaming Diarization processes audio entirely on-device, producing speaker labels in under 250 ms. No speaker cap, no language restriction, no network latency, no audio data ever leaving the platform it runs on. It works with any streaming ASR and routes output by speaker — in real time, in the stream.

Capabilities

Why enterprises choose Bluebird Streaming Diarization

Bluebird is an enterprise-ready on-device streaming speaker diarization engine built for real-time applications deployed at scale. It identifies speakers in live audio streams in milliseconds, with no speaker cap and no language or geographic restrictions. Bluebird Streaming Diarization runs entirely offline across platforms and is private by architecture.

01 On-device
Bluebird Streaming Diarization is the only production-ready on-device streaming speaker diarization SDK. Every cloud alternative (AssemblyAI, Speechmatics, Deepgram, Amazon Transcribe, Azure Speech, Google Cloud STT) requires audio to leave the device on every inference. Bluebird Streaming Diarization processes audio entirely on-device. No audio transmitted to any server. No network round-trip.
02 ASR-Agnostic
Bluebird Streaming Diarization works with any streaming transcription engine, including Parakeet, Amazon Transcribe Streaming, Google Cloud Speech-to-Text, and more. Cloud diarization APIs bundle speaker identification into their own ASR pipeline, forcing developers to use their STT or nothing. Bluebird is decoupled from transcription entirely. It runs alongside any streaming ASR and produces speaker labels separately, so developers can combine speaker labels with the transcript wherever it suits their pipeline.
03 <250 ms Latency
Bluebird Streaming Diarization is optimized for continuous speaker label detection, making speaker labels available within 250 ms of a new speaker beginning to talk. Cloud alternatives like Azure and Google return "Unknown" speaker IDs in early intermediate results while they accumulate audio to build voice profiles. Bluebird identifies speakers from the first utterance, making it viable for live captioning, real-time meeting notes, and field applications where early accuracy matters.
04 Short-segment precision
Bluebird Streaming Diarization is specifically optimized for the short turns, one-word responses, and rapid exchanges that make up real conversations. Cloud APIs require several seconds of audio per speaker before they can assign labels reliably; their accuracy drops significantly for segments under one second. Short-segment precision makes Bluebird Streaming Diarization viable, and often the only viable choice, for contact center, legal, and medical use cases where short-utterance accuracy determines the usefulness of the transcript.
05 Unlimited speaker support
Bluebird Streaming Diarization has no fixed speaker limit, scaling to panel discussions, conference sessions, and any multi-party audio where the number of speakers is unpredictable or large. Cloud streaming STT APIs treat diarization as a minor feature with significant limitations, including on speaker count: Amazon Transcribe's streaming diarization accuracy decreases beyond five speakers, Google Cloud STT requires the speaker count to be set in advance, and Azure caps the number of speakers at 35.
06 Cross-Platform
Bluebird Streaming Diarization runs on every platform your product ships on (Android, Chrome, Edge, Firefox, iOS, Linux, macOS, Raspberry Pi, Safari, and Windows) across AMD, Intel, NVIDIA, and Qualcomm hardware.
07 Private by architecture
Bluebird Streaming Diarization processes audio entirely on-device. No audio data is transmitted to any server. GDPR, HIPAA, CCPA, and CJIS compliant by architecture, not policy. Picovoice cannot access end-user audio.
08 Enterprise Ready
Bluebird Streaming Diarization is production-grade and enterprise-ready. Picovoice offers flexible licensing, dedicated engineering support, NDA-protected custom model training, and SLA-backed response times for teams shipping at scale.

Ship it.
On device.

Real-time speaker diarization on-device and in the stream.

FAQ

Common questions about streaming speaker diarization

What is streaming speaker diarization?

Streaming speaker diarization is the task of automatically identifying who is speaking at any moment in a live audio stream. It assigns speaker labels like SPEAKER_1 and SPEAKER_2 in real time as a conversation unfolds, without waiting for the recording to end. Unlike batch diarization, which processes a complete audio file and can revise speaker assignments across the entire recording, streaming diarization must commit to labels immediately with only partial context.

Bluebird Streaming Speaker Diarization processes audio on-device frame by frame and returns a speaker index that can be paired with each transcript segment from your ASR, making it suitable for live captioning, real-time meeting notes, contact center agent assist, and field recording applications.
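
To make the streaming constraint concrete, here is a minimal Python sketch that turns a stream of per-frame speaker indexes into speaker turns. Each turn is emitted the moment the speaker changes and is never revised afterwards. This is an illustration only and does not call the Bluebird SDK: the `Turn` type, the `stream_turns` function, the frame length, and the use of `None` for frames with no detected speaker are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Turn:
    """A finished speaker turn (hypothetical shape, not an SDK type)."""
    speaker: int
    start_ms: int
    end_ms: int


def stream_turns(
    frame_speakers: Iterable[Optional[int]], frame_ms: int
) -> Iterator[Turn]:
    """Yield a Turn as soon as the active speaker changes.

    Every turn is emitted once and never revised. That is the streaming
    constraint; a batch engine could go back and relabel earlier frames.
    """
    current: Optional[int] = None  # speaker of the open turn, None in silence
    start = 0
    index = -1
    for index, speaker in enumerate(frame_speakers):
        if speaker != current:
            if current is not None:
                # Close the open turn at the start of the current frame.
                yield Turn(current, start, index * frame_ms)
            current, start = speaker, index * frame_ms
    if current is not None:
        # The stream ended while a speaker was still talking.
        yield Turn(current, start, (index + 1) * frame_ms)
```

For the frame sequence `[0, 0, 1, 1, 1, None, 0]` with an assumed 100 ms frame length, this yields a turn for speaker 0 from 0 to 200 ms, a turn for speaker 1 from 200 to 500 ms, and a second turn for speaker 0 from 600 to 700 ms.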

How does Bluebird Streaming Speaker Diarization differ from batch speaker diarization?

Batch diarization, such as Falcon Speaker Diarization, processes a complete audio file and assigns speaker labels after all audio has been captured, allowing the engine to revise and optimize labels across the entire recording.

Bluebird Streaming Speaker Diarization processes audio in a live stream and assigns speaker labels in real time as each utterance is spoken. Use Bluebird Streaming Speaker Diarization when speaker identity is needed during the conversation — live captioning, real-time routing, contact center agent assist, or any application that must act on who is speaking before the conversation ends. Use Falcon Speaker Diarization when post-conversation accuracy on a completed recording is the priority.

Is Bluebird Streaming Speaker Diarization the only on-device streaming diarization engine?

Yes, among production-ready engines. All major cloud streaming diarization APIs require audio to be sent to remote servers. NVIDIA Riva, an SDK for building speech AI applications, has introduced streaming diarization as an alpha feature with known limitations, such as support for at most 8 concurrent requests and a GPU requirement, which rules out on-device deployment on mobile, embedded, or cost-sensitive devices.

The open-source library diart supports streaming diarization with pyannote, but it is a research tool with no production SDK, no cross-platform support, and a 500 ms minimum latency.

Bluebird Streaming Speaker Diarization is the only production-ready, cross-platform, on-device streaming speaker diarization SDK.

How does Bluebird Streaming Speaker Diarization compare to Amazon Transcribe Streaming Diarization?

Amazon Transcribe offers streaming speaker diarization, but it is cloud-only and supports only US English (en-US) for real-time streaming diarization. Its streaming diarization accuracy also degrades above five speakers in a stream. Speaker labels appear only for fully transcribed segments, not for interim results.

Bluebird Streaming Speaker Diarization, on the other hand, processes audio entirely on-device in any language your ASR supports, handles an unlimited number of speakers, and returns speaker labels for interim results.

How does Bluebird Streaming Speaker Diarization compare to Azure Speech real-time diarization?

Azure Speech offers real-time diarization (generally available since May 2024) that supports up to 35 speakers. It is cloud-only. Azure's own documentation notes that speaker IDs appear as "Unknown" in early intermediate results while the service accumulates audio to build voice profiles.

Bluebird Streaming Speaker Diarization processes on-device with no cold-start "Unknown" period, no cloud round-trip, and no audio transmitted to Microsoft's servers.

How does Bluebird Streaming Speaker Diarization compare to Google Cloud Speech-to-Text diarization?

Google Cloud Speech-to-Text supports speaker diarization in streaming via the Chirp 3 model. It is cloud-only and available only in the global, US, and EU multi-regions, a geographic restriction that affects latency and data residency compliance. Google's diarization is bundled into its STT pipeline and cannot be used with a different transcription engine.

Bluebird Streaming Speaker Diarization runs on-device with no geographic restriction, no cloud dependency, and is decoupled from any specific transcription engine.

How does Bluebird Streaming Speaker Diarization integrate into a voice pipeline?

Bluebird Streaming Speaker Diarization runs alongside your streaming ASR engine. You feed the same audio frames to both Bluebird and your ASR. Bluebird returns a speaker index per frame while the ASR returns transcript text, and your application combines the two in the pipeline.
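
One way to combine the two outputs is to align them by time. The Python sketch below is an illustration under assumptions, not the Bluebird SDK API: it supposes that the ASR reports word timestamps in milliseconds, that speaker indexes arrive one per fixed-length frame, and that `None` marks frames with no detected speaker. `Word`, `label_words`, and the frame length are hypothetical names and values chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One word from the streaming ASR (hypothetical shape)."""
    text: str
    start_ms: int
    end_ms: int


def label_words(
    words: Sequence[Word],
    frame_speakers: Sequence[Optional[int]],
    frame_ms: int,
) -> List[Tuple[Optional[int], str]]:
    """Attach a speaker index to each word by majority vote over the
    diarization frames that the word overlaps.

    A word gets None when none of its frames has a detected speaker.
    """
    labeled = []
    for word in words:
        first = word.start_ms // frame_ms
        # Ceiling division, so a word ending mid-frame still counts that frame.
        last = max(first + 1, -(-word.end_ms // frame_ms))
        votes = Counter(s for s in frame_speakers[first:last] if s is not None)
        speaker = votes.most_common(1)[0][0] if votes else None
        labeled.append((speaker, word.text))
    return labeled
```

With 100 ms frames labeled `[0, 0, 0, 1, 1, 1, None, 0]`, a word spanning 0 to 250 ms is assigned to speaker 0 and a word spanning 300 to 550 ms is assigned to speaker 1. Majority voting is only one possible policy; an application that cares more about turn boundaries could instead use the speaker at the start of the word.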

How does Bluebird Streaming Speaker Diarization differ from Falcon Speaker Diarization?

Falcon Speaker Diarization is Picovoice's batch diarization engine, designed for processing complete audio files and producing speaker-labeled transcripts with the highest possible accuracy and minimal compute requirements. Because it sees the whole file, Falcon Speaker Diarization can revise labels across the entire recording. Bluebird Streaming Speaker Diarization processes live audio streams and assigns speaker labels in real time as each utterance is spoken.

Use Falcon Speaker Diarization when accuracy on a completed recording is the priority. Use Bluebird Streaming Speaker Diarization when speaker identity is needed during the conversation.

Does Bluebird Streaming Speaker Diarization work offline?

Yes. Bluebird Streaming Speaker Diarization processes all audio on-device with no network connection required. It operates in air-gapped environments, remote field deployments, aircraft, underground facilities, and any environment where cloud APIs cannot reach or where data handling requirements prohibit audio transmission to third-party servers.

How many speakers can Bluebird Streaming Speaker Diarization handle?

Bluebird Streaming Speaker Diarization has no fixed speaker limit, whereas cloud alternatives either degrade beyond five speakers (Amazon) or cap the count at 35 (Azure). This makes Bluebird Streaming Speaker Diarization a great fit for panel discussions, conference sessions, and any multi-party audio where the number of active speakers is unpredictable or large.

Is Bluebird Streaming Speaker Diarization GDPR, HIPAA, and CJIS compliant?

Yes. Bluebird Streaming Speaker Diarization processes audio entirely on-device without transmitting to a server. Picovoice cannot access end-user audio. Bluebird Streaming Speaker Diarization is compliant with GDPR, HIPAA, CCPA, and CJIS by architecture — not policy.

Which platforms does Bluebird Streaming Speaker Diarization support?

Bluebird Streaming Speaker Diarization supports Linux, macOS, Windows, Android, iOS, Raspberry Pi, and web browsers (Chrome, Edge, Firefox, and Safari). Native SDKs are available for << insert SDKs >>. Bluebird can be deployed on-device, on-premise, or in a private or public cloud. The deployment decision is yours, not Picovoice's.

How do I get technical support for Bluebird Streaming Speaker Diarization?

The Picovoice docs, blog, Medium posts, and GitHub are great resources for learning about voice AI, Picovoice technology, and how to get started with streaming speaker diarization. Enterprise customers get dedicated support specific to their applications from Picovoice Product & Engineering teams. Reach out to your Picovoice contact or talk to sales to discuss support options.