Bat Spoken Language Identification

Ultra-fast, high-accuracy language detection for real-time voice AI pipelines

Identify spoken language in 2 seconds and route audio to the right ASR, translation, or multilingual voice AI pipeline. Bat achieves 2× fewer errors than SpeechBrain LangID while using 62× less memory.

93%
Accuracy, 2× fewer errors than SpeechBrain LangID
5 MB
Peak memory - 62× less than SpeechBrain LangID
0.004x
Core-hour ratio - 9× less than SpeechBrain LangID
What is Bat Spoken Language Identification?

The only production-ready on-device spoken language detection

Bat is an enterprise-ready on-device spoken language identification engine built for real-time multilingual voice AI pipelines. It detects the spoken language in live audio streams with high accuracy, runs entirely offline across platforms, and is private by architecture.

With a 4 MB model and 5 MB of peak memory, Bat efficiently identifies the spoken language in real-time audio and runs on any platform. Bat Spoken Language Identification is optimized for streaming audio and detects languages in 2 seconds, and it supports asynchronous processing too.

Bat Spoken Language Identification's modularity enables several applications, including live translation, multilingual voice AI agents and assistants, and speech analysis and archiving, such as categorizing voice notes, conference recordings, and digital media libraries.

Developer Experience

Language detection in a few lines of code

Bat Spoken Language Identification returns the detected language and a confidence score for each inference. Drop it at the start of your audio pipeline to route audio files to the right ASR model, translation engine, or category. Use Bat Spoken Language Identification with its native SDKs for Python, C, iOS, Android, and Web.
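The page does not show the SDK call itself, so the sketch below illustrates the frame-in, (language, confidence)-out shape described above with a stand-in detector class; the class name, method signatures, and frame format are illustrative assumptions, not the actual Bat SDK API.

```python
# Hypothetical sketch of a frame-based language ID loop; FakeLanguageID is a
# stand-in for the real engine and its interface is an assumption.
from dataclasses import dataclass


@dataclass
class Detection:
    language: str    # e.g., "fr", or "unknown" for unsupported languages
    confidence: float


class FakeLanguageID:
    """Stand-in detector that mimics per-frame (language, confidence) output."""

    def __init__(self) -> None:
        self._frames_seen = 0

    def process(self, frame: list[int]) -> Detection:
        self._frames_seen += 1
        # A real engine accumulates evidence over ~2 s of audio before
        # committing; this stub "decides" after 10 frames.
        if self._frames_seen < 10:
            return Detection("unknown", 0.0)
        return Detection("fr", 0.97)


detector = FakeLanguageID()
result = Detection("unknown", 0.0)
for _ in range(12):
    frame = [0] * 512  # one frame of 16-bit PCM samples, stubbed as silence
    result = detector.process(frame)

print(result.language, result.confidence)  # fr 0.97
```

The per-frame result is what downstream routing logic consumes: once confidence is high enough, the pipeline hands the stream to the matching ASR model or translation engine.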

OPEN-SOURCE SPOKEN LANGUAGE IDENTIFICATION BENCHMARK

Spoken Language Identification with proven accuracy and efficiency

Bat Spoken Language Identification achieves 93% accuracy on VoxLingua107 under open-set evaluation: 2× fewer errors than SpeechBrain LID (7% vs. 15% miss rate) while using 62× less memory and 9× less CPU.

Accuracy - higher is better
Bat Spoken LID93%
SpeechBrain LID85%
Peak Memory Usage — lower is better
Bat Spoken LID5 MB
SpeechBrain LID333 MB
CPU Core Hour Ratio - lower is better
Bat Spoken LID0.004x
SpeechBrain LID0.039x
Model Size - lower is better
Bat Spoken LID4 MB
SpeechBrain LID85 MB
Ready to integrate? Check our docs to start building or talk to the sales team about enterprise deployment.
Capabilities

Why enterprises choose Bat Spoken Language Identification

Bat is an enterprise-ready on-device spoken language identification engine built for real-time audio streams, high accuracy, and minimal resource footprint. It identifies spoken languages live as audio is captured, runs entirely offline across platforms, and is private by architecture.

01Production ReadyThe major cloud transcription APIs (Amazon Transcribe, Azure STT, and Google Cloud STT) offer streaming language identification features, but with limitations: candidate language lists are required upfront, geographic and model restrictions apply, and audio must leave the device on every inference. Other transcription APIs, such as Deepgram and Rev, offer language identification only for pre-recorded audio files, not streaming. SpeechBrain can process live audio but uses 62× more RAM and 9× more CPU, and offers no production SDK or enterprise support. Bat is the only engine that identifies spoken languages in real-time audio streams entirely on-device: no cloud round-trip, no candidate list constraints.
02Highly AccurateSome language identification benchmarks test only on languages the model was trained to recognize. Bat Spoken Language Identification is evaluated under open-set conditions, with audio from unsupported languages included in the test set. This reflects what your production system will actually encounter. Bat achieves 93% accuracy, half the error rate of SpeechBrain (7% vs. 15% miss rate).
Accuracy - higher is better
Bat Spoken LID93%
SpeechBrain LID85%
Miss Rate - lower is better
Bat Spoken LID7%
SpeechBrain LID15%
* Benchmarked on VoxLingua107, which SpeechBrain also uses to train its language ID model. Picovoice Bat is trained on proprietary data. Because SpeechBrain is partially tested on its own training data, its accuracy may be higher than it would be on unseen data.
035 MB RAM — runs anywhereLanguage identification is mostly used as a component in a larger pipeline. It needs to leave headroom for the ASR model, the NLU engine, and the application logic. SpeechBrain's 333 MB RAM requirement consumes the entire memory budget of most embedded devices before the pipeline even begins. Bat uses 5 MB at peak, 62× less than SpeechBrain, making it a practical first step in any voice pipeline regardless of deployment environment, including browser tabs, mobile apps, Raspberry Pi, and bare-metal embedded systems.
04Pipeline-ready outputBat Spoken Language Identification returns a language code and confidence score per audio frame, designed to integrate directly into the decision layer of a multilingual voice pipeline. Use Bat Spoken LID to select the correct language-specific ASR model, trigger translation, choose the right TTS voice, or adapt a voice assistant's language in real time.
05Private by architectureAudio is processed entirely on-device. No audio data is transmitted to any server. Bat is GDPR, HIPAA, CCPA, and CJIS compliant by architecture, not policy, and Picovoice cannot access end-user audio. In healthcare, law enforcement, and public safety applications such as body cameras, interview recording systems, dispatch, and in-field translation pipelines, on-device language identification means audio captured in the field never reaches a cloud server, satisfying the data handling requirements that rule out cloud-dependent alternatives entirely.
06Cross-PlatformBat Spoken Language Identification runs on every platform your product ships — Android, Chrome, Edge, Firefox, iOS, Linux, macOS, Raspberry Pi, Safari, and Windows — across AMD, Intel, NVIDIA, and Qualcomm hardware.
07Enterprise ReadyBat Spoken Language Identification is production-grade and enterprise-ready. Picovoice offers flexible licensing, dedicated engineering support, NDA-protected custom model training, and SLA-backed response times for teams shipping at scale.

Ship it.
On device.

Fast, accurate, and lightweight real-time language identification

FAQ

Common questions about spoken language identification

+
What is spoken language identification?

Spoken language identification is the task of automatically detecting which language is being spoken in an audio stream or file. It is a foundational component in multilingual voice AI systems — used to route audio to the correct language-specific speech recognizer, adapt voice assistants to a user's language in real time, enable automatic transcription of multilingual audio, and support translation pipelines in public safety, healthcare, and customer service applications. Bat Spoken Language Identification processes audio frames on-device and returns a language code and confidence score in real time, making it a practical first stage for any multilingual voice pipeline.

+
Does Bat Spoken Language Identification support real-time streaming?

Yes. Bat Spoken Language Identification is the only production-ready on-device engine that does. Amazon Transcribe, Azure, and Google Cloud Speech-to-Text all offer streaming language identification, but all three are cloud-only; audio must leave the device on every inference. Others, like Deepgram and Rev AI, support only batch (asynchronous) processing of pre-recorded audio. SpeechBrain can process live audio but uses 62× more RAM and 9× more CPU than Bat, with no production SDK or enterprise support backed by a committed SLA. Bat Spoken Language Identification identifies spoken languages in real-time audio streams entirely on-device — no cloud round-trip, no audio transmitted to any server, and with intentional open-set handling that returns "unknown" for unsupported languages rather than forcing a match.

+
How does Bat Spoken Language Identification compare to Amazon Transcribe Language Identification?

Amazon Transcribe supports streaming language identification and requires a minimum of one second of speech before identifying. Like Bat Spoken Language Identification, it can return results for live audio — but it is cloud-only, meaning every audio frame is transmitted to Amazon's servers for processing. Amazon Transcribe requires developers to provide a candidate language list upfront, and does not allow them to combine LID with custom language models or redaction. It selects the closest candidate match when the spoken language is not in your list, rather than returning "unknown." Bat Spoken Language Identification identifies spoken languages on-device with no audio transmitted to the cloud.

+
How does Bat Spoken Language Identification compare to Azure Speech language identification?

Azure Speech offers both at-start and continuous language identification for live audio streams. At-start language detection takes up to 5 seconds, as acknowledged in Microsoft's own documentation. Continuous language identification is cloud-only and still in preview for on-premises containers. Azure can return a NoMatch or Unknown result based on confidence thresholds and audio conditions. Bat Spoken Language Identification processes audio on-device with no cloud dependency, works across all supported SDKs, and handles unsupported languages through an intentional open-set protocol that was benchmarked and evaluated, not an edge-case fallback.

+
How does Bat Spoken Language Identification compare to Google Cloud Speech-to-Text language identification?

Google Cloud Speech-to-Text supports language identification in streaming audio, but requires a primary language code to be set (language detection is not fully automatic) and performs candidate selection across up to four languages total. It is only available in the global region and US/EU multi-regions, only works with specific models (long, short, and telephony), and is cloud-only with no on-device option. Like other cloud alternatives, Google selects the best match from the provided candidate list and is not designed with open-set handling for unsupported languages. Bat Spoken Language Identification runs entirely on-device, supports any audio source without geographic or model restrictions, and handles unsupported languages deliberately rather than returning a best-guess candidate match.

+
How does Bat Spoken Language Identification compare to SpeechBrain Language ID?

In Picovoice's Open-source Spoken Language Identification Benchmark on VoxLingua107 under open-set evaluation, Bat Spoken Language Identification achieves 93% accuracy versus SpeechBrain LID's 85% — more than 2× fewer identification errors (7% vs. 15% miss rate). On efficiency, Bat Spoken Language Identification uses 5 MB peak memory versus SpeechBrain LID's 333 MB (62× less), a 4 MB model versus SpeechBrain LID's 85 MB (21× smaller), and requires 0.004 core-hours per hour of audio versus SpeechBrain LID's 0.039 (9× less compute). SpeechBrain LID can process live audio, but its computational requirements limit where it can run. In resource-constrained environments, SpeechBrain LID leaves no headroom for the rest of the voice pipeline.

SpeechBrain LID has no production SDK, no native mobile or embedded deployment support, and no enterprise backing. Bat Spoken Language Identification is also evaluated on genuinely unseen data, whereas SpeechBrain LID is partially tested on its own training dataset, VoxLingua107, the most popular language identification dataset. Thus, SpeechBrain LID's reported accuracy may overstate its real-world performance.

+
How does on-device spoken language identification differ from language detection in cloud STT APIs?

Cloud speech-to-text APIs, like Amazon Transcribe, Azure Speech, and Google Cloud Speech-to-Text, offer language detection as a feature within their transcription pipeline. Developers must pass a parameter when submitting audio, and the STT engine then attempts to detect the language alongside transcription. This approach has three core limitations: it is optimized for transcription rather than the language identification task specifically, it requires audio to be sent to a cloud server, and it relies on candidate language lists rather than open-set detection. Bat Spoken Language Identification is a dedicated standalone engine: it runs on live audio streams before transcription begins, processes audio entirely on-device with no cloud dependency, and is built specifically for language identification with intentional open-set handling for unsupported languages.

+
What is open-set evaluation, and why does it matter for language identification?

Closed-set evaluation tests a language identification engine only on the languages it was trained to recognize. Open-set evaluation includes audio from languages outside the supported set and requires the engine to return "unknown" for those inputs rather than misclassifying them.

Picovoice Open-source Spoken Language Identification Benchmark uses open-set evaluation: correctly identifying unsupported audio as unknown counts as correct, and misclassifying it as a supported language counts as an error. This reflects production conditions, where users will inevitably speak languages outside the model's training set. Cloud alternatives like Amazon Transcribe and Google Cloud Speech-to-Text select the closest candidate match rather than returning "unknown" — meaning they can silently route audio to the wrong handler when the spoken language is not in the candidate list.
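The open-set scoring rule described above can be made concrete with a few lines of Python. This is an illustrative sketch, not benchmark code: the supported-language set matches the languages listed on this page, and the sample labels are made up.

```python
# Illustrative open-set scoring: "unknown" is the correct label for audio in
# unsupported languages; misclassifying such audio as a supported language
# counts as an error.
SUPPORTED = {"en", "fr", "es", "it", "de", "pt", "ja", "ko"}


def open_set_label(true_language: str) -> str:
    """Ground-truth label under open-set evaluation."""
    return true_language if true_language in SUPPORTED else "unknown"


def open_set_accuracy(pairs: list[tuple[str, str]]) -> float:
    """pairs: (true_language, predicted_label) for each test utterance."""
    correct = sum(1 for truth, pred in pairs if open_set_label(truth) == pred)
    return correct / len(pairs)


samples = [
    ("fr", "fr"),       # supported, correct
    ("tr", "unknown"),  # unsupported, correctly rejected -> counts as correct
    ("sw", "es"),       # unsupported, misclassified -> counts as an error
    ("ko", "ko"),       # supported, correct
]
print(open_set_accuracy(samples))  # 0.75
```

A closed-set benchmark would simply drop the "tr" and "sw" utterances, hiding exactly the failure mode that matters in production.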

+
What is a CPU core-hour ratio, and why does it matter for spoken language identification?

The CPU core-hour ratio measures how many CPU core-hours are required to process one hour of audio. A ratio above 1.0 means the engine cannot keep up with real-time audio on a single core. A ratio below 1.0 means the engine processes faster than real time using less than one full CPU core. In practical terms, the core-hour ratio determines the difference between a viable pipeline component and a bottleneck.

Measured on an AMD Ryzen 9 5900X (12 cores @ 3.70 GHz), Bat Spoken Language Identification's core-hour ratio is 0.004 versus SpeechBrain LID's 0.039. Bat uses 0.44% of a single CPU core in real time, 9× less than SpeechBrain's 3.9%, leaving the rest of the processing capacity free for downstream tasks like transcription, NLU, and application logic. This 9× difference becomes a critical factor for applications running in resource-constrained environments, such as mobile, embedded, and Raspberry Pi devices.

+
Can Bat Spoken Language Identification run on embedded and resource-constrained devices?

Yes. Bat Spoken Language Identification requires only 5 MB of peak memory during processing, making it suitable for any deployment environment and applications with tight memory budgets, including embedded systems, low-end mobile devices, Raspberry Pi, and web browsers.

For context, SpeechBrain Language ID requires 333 MB of peak memory — 62× more than Bat, which exceeds the total available memory headroom on many embedded and low-power devices before any other application logic runs. Language identification is typically one component in a larger voice pipeline alongside ASR, NLU, and application logic. Using 5 MB of memory at peak, Bat Spoken Language Identification leaves the rest of your memory budget intact for the pipeline components that need it. Cloud-based language identification APIs have no equivalent embedded deployment option — audio must always be transmitted to remote servers for processing.

Note: Memory availability is not the same as total device RAM. Background services (SSH, networking, logging) consume memory before any application starts. As a practical guideline, a LID engine used in a real-time voice AI application should be treated as part of the total app memory budget, which on mobile typically should not exceed 150–200 MB on low-end devices to avoid out-of-memory (OOM) termination risk. Both Android and iOS use low-memory killers that terminate processes when free memory falls below a threshold, and apps consuming more memory are killed first.

+
How does Bat Spoken Language Identification integrate into a multilingual voice pipeline?

Bat Spoken Language Identification returns a language code and confidence score per audio frame. In a typical multilingual voice pipeline, Bat runs first — before transcription — and its output is used to select the appropriate language-specific ASR model, translation engine, or voice handler. For example, a French detection routes audio to Cheetah Streaming Speech-to-Text's French model; Spanish routes to the Spanish model. Combined with Zebra Translate, Bat Spoken Language Identification enables translation applications. Combined with picoLLM On-device LLM, and/or Rhino Speech-to-Intent, Bat Spoken Language Identification allows developers to create multilingual voice AI agents and assistants.
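The routing step described above is a small decision function. The sketch below is a minimal illustration: the model names are hypothetical placeholders, and the confidence threshold is an assumed tunable, not a documented Bat parameter.

```python
# Decision layer: map a (language, confidence) detection to a handler,
# falling back to a default when the language is unsupported or confidence
# is low. Handler names are illustrative placeholders.
ASR_MODELS = {
    "en": "asr-en",
    "fr": "asr-fr",
    "es": "asr-es",
}


def route(language: str, confidence: float, threshold: float = 0.8) -> str:
    # "unknown" or low confidence -> default handler instead of a wrong model
    if language == "unknown" or confidence < threshold:
        return "asr-default"
    return ASR_MODELS.get(language, "asr-default")


print(route("fr", 0.95))       # asr-fr
print(route("es", 0.42))       # asr-default (low confidence)
print(route("unknown", 0.99))  # asr-default (open-set rejection)
```

Because Bat returns "unknown" rather than forcing a match, the fallback branch is a deliberate design point rather than error handling.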

+
Does Bat Spoken Language Identification work offline?

Yes. Bat Spoken Language Identification processes all audio on-device with no network connection required. It operates in remote deployments and on bandwidth-constrained hardware: factory floors, aircraft, underground facilities, and law enforcement field deployments where cloud APIs cannot reach or where data handling requirements prohibit audio transmission to third-party servers.

+
Which languages does Bat Spoken Language Identification support?

Bat Spoken Language Identification currently supports English, French, Spanish, Italian, German, Portuguese, Japanese, and Korean.

For languages outside the supported set, Bat Spoken Language Identification returns "unknown" rather than forcing a match, ensuring your pipeline receives a reliable signal rather than a confident wrong answer.

+
Is Bat Spoken Language Identification GDPR, HIPAA, and CJIS compliant?

Yes. Audio is processed entirely on-device and never transmitted to any server. Bat Spoken Language Identification is compliant with GDPR, HIPAA, CCPA, CJIS, and other data residency regulations by architecture, not policy. No data processing agreements are required, and there is no risk of audio data breach through Picovoice's systems. Picovoice cannot access end-user audio.

+
Which platforms does Bat Spoken Language Identification support?

Bat Spoken Language Identification supports embedded, mobile, web, desktop, and server across Linux, macOS, Windows, Android, iOS, and Raspberry Pi. Native SDKs are available for Python, C, iOS, Android, and Web.

Bat Spoken Language Identification can be deployed on-device, on-premise, or in a private or public cloud. The deployment decision and user data are yours, not Picovoice's.

+
How do I get technical support for Bat Spoken Language Identification?

Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to start building language detection products. Enterprise customers get dedicated support specific to their applications from Picovoice Product & Engineering teams. Reach out to your Picovoice contact or talk to sales to discuss support options.

+
How can I get informed about updates and upgrades to Bat Spoken Language Identification?

Version changes are announced on LinkedIn. Subscribing to the GitHub repository is the best way to get notified of patch releases. If you enjoy building with Bat Spoken Language Identification, show it by giving a GitHub star!