Screen calls in real time using streaming speech recognition, natural language understanding, and AI voice synthesis, running entirely on the device across mobile, embedded, and desktop platforms.
On-device AI call screening automatically answers incoming calls, transcribes what the caller says in real time, understands their intent, and presents the phone owner with action options, with no audio ever sent to a cloud service. Picovoice's Cheetah Streaming Speech-to-Text, Rhino Speech-to-Intent, and Orca Text-to-Speech compose into a self-contained pipeline in which all three engines run locally on the device.
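The screening flow can be sketched as a small state machine: greet the caller, listen until they stop talking, then show the owner options based on the inferred intent. This is a minimal sketch, not the recipe's code. The stage names, the intent names in `ACTION_OPTIONS`, and the option labels are all hypothetical, and the engines are left out so only the control flow shows.

```python
from enum import Enum, auto
from typing import List, Optional


class Stage(Enum):
    GREETING = auto()    # TTS (Orca) asks the caller who they are and why they are calling
    LISTENING = auto()   # streaming STT (Cheetah) and Speech-to-Intent (Rhino) process the reply
    PRESENTING = auto()  # the owner sees the transcript and the action options
    DONE = auto()


# Hypothetical intent names and owner-facing options, for illustration only.
ACTION_OPTIONS = {
    "sales": ["Block", "Send to voicemail"],
    "delivery": ["Accept call", "Reply by text"],
    "urgent": ["Accept call"],
}
DEFAULT_OPTIONS = ["Accept call", "Decline"]


def next_stage(stage: Stage, endpoint_detected: bool = False) -> Stage:
    """Advance the screening flow by one step."""
    if stage is Stage.GREETING:
        return Stage.LISTENING
    if stage is Stage.LISTENING:
        # Keep listening until the caller stops talking.
        return Stage.PRESENTING if endpoint_detected else Stage.LISTENING
    return Stage.DONE


def options_for(intent: Optional[str]) -> List[str]:
    """Action options for an inferred intent; unknown or missing intents get defaults."""
    return ACTION_OPTIONS.get(intent, DEFAULT_OPTIONS)
```

Keeping the option table separate from the flow means a product team can change what the owner sees per intent without touching the audio loop.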
Why Cheetah Streaming Speech-to-Text?
Lowest latency. Lowest compute. No accuracy tradeoff.
15% fewer word errors than Google Streaming STT in English (10.1% vs 11.9% WER)
40x less compute than Moonshine Medium while being more accurate (0.08 vs 3.36 core-hours)
1.6x faster than Amazon Transcribe Streaming (590 ms vs 920 ms word emission latency)
Cheetah Streaming Speech-to-Text beats Google Cloud STT in word error rate and word emission latency across all tested languages, and outperforms Azure STT in several benchmarks, according to an open-source real-time transcription benchmark, even before any customization for the use case. Cheetah also requires less compute than any other local engine tested.
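The headline ratios follow directly from the benchmark numbers quoted above. A quick check in Python (the helper names are illustrative):

```python
def relative_reduction(ours: float, theirs: float) -> float:
    """Fraction by which `ours` is lower than `theirs` (0.15 means 15% lower)."""
    return (theirs - ours) / theirs


def times(theirs: float, ours: float) -> float:
    """How many times larger `theirs` is than `ours` (1.6 means 1.6x)."""
    return theirs / ours


wer_reduction = relative_reduction(10.1, 11.9)  # Cheetah vs Google Streaming WER, %
compute_ratio = times(3.36, 0.08)               # Moonshine Medium vs Cheetah, core-hours
latency_ratio = times(920, 590)                 # Amazon vs Cheetah word emission latency, ms

print(f"{wer_reduction:.0%} fewer word errors")  # 15% fewer word errors
print(f"{compute_ratio:.0f}x less compute")      # 42x less compute
print(f"{latency_ratio:.1f}x faster")            # 1.6x faster
```

Since 3.36 / 0.08 is 42, the "40x" figure is a rounded-down value.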
English Word Error Rate (lower is better)
Amazon Streaming: 5.6%
Azure Real-time: 8.2%
Cheetah Streaming: 10.1%
Moonshine Streaming Medium: 10.6%
Vosk Streaming Large: 11.5%
Google Streaming: 11.9%
Whisper.cpp Streaming Base: 19.8%
English Punctuation Error Rate (lower is better)
Cheetah Streaming: 16.1%
Azure Real-time: 16.4%
Amazon Streaming: 24.4%
Google Streaming: 36%
Moonshine Streaming Medium: 45.1%
Whisper.cpp Streaming Base: 54.1%
Why Rhino Speech-to-Intent?
End-to-end intent. No transcript. No hallucinations.
97.3% average command acceptance accuracy (vs. 84.3% for Amazon Lex and 77.3% for Dialogflow)
6x higher accuracy than the Big Tech average
5.5x fewer errors than Dialogflow in high noise (94% vs 67% accuracy at 6 dB SNR)
Most voice command systems run a two-step pipeline: speech-to-text converts audio to a transcript, then a separate NLU model parses that transcript for intent. Every step accumulates error and compounds latency. Rhino Speech-to-Intent is an end-to-end engine with a single model that maps spoken audio directly to a structured intent with typed slot values. Higher accuracy even in noisy environments. No hallucinations. No intermediate transcript.
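Why a cascade loses accuracy can be shown with a small worked example. Assuming stage errors are independent, a two-step pipeline is right only when both stages are right, so its end-to-end accuracy is the product of the stage accuracies. The 0.90 and 0.95 stage accuracies below are hypothetical, not measured values; the second part recomputes the "5.5x fewer errors" figure from the 94% and 67% accuracies quoted above.

```python
def cascade_accuracy(*stage_accuracies: float) -> float:
    """End-to-end accuracy when every stage must be correct.

    Assumes stage errors are independent, so the accuracies multiply.
    """
    total = 1.0
    for accuracy in stage_accuracies:
        total *= accuracy
    return total


# Hypothetical stage accuracies, for illustration only (not benchmark results).
stt_accuracy = 0.90  # transcript is correct enough for the NLU to parse
nlu_accuracy = 0.95  # NLU maps a correct transcript to the right intent

two_step = cascade_accuracy(stt_accuracy, nlu_accuracy)
print(f"Two-step pipeline: {two_step:.1%}")  # Two-step pipeline: 85.5%

# Error ratio behind "5.5x fewer errors" (94% vs 67% accuracy at 6 dB SNR).
rhino_errors = 1 - 0.94
dialogflow_errors = 1 - 0.67
print(f"Error ratio: {dialogflow_errors / rhino_errors:.1f}x")  # Error ratio: 5.5x
```

Neither stage in the example is bad on its own, yet the cascade ends up below both; an end-to-end model has only one error budget to spend.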
Voice Command Acceptance Accuracy (higher is better)
Rhino: 97.3%
Amazon Lex: 84.3%
Google Dialogflow: 77.3%
Voice Command Acceptance Accuracy at 21 dB SNR (higher is better)
Rhino: 99%
Amazon Lex: 87%
Google Dialogflow: 83%
Why Orca Text-to-Speech?
Natural-sounding TTS at 29 MB peak memory.
2.6x faster than ElevenLabs Streaming (128 ms vs 335 ms first-token-to-speech latency)
11x less memory than the lightest neural on-device alternative (29 MB vs 320 MB for Kitten TTS Nano)
2.3x less CPU than the most compute-efficient neural on-device TTS (0.16x vs 0.37x for Pocket TTS)
Most high-quality TTS engines require hundreds of megabytes of RAM. Orca uses 29 MB peak memory, 10–50× less than other on-device alternatives; only ESpeak, a non-neural formant synthesizer, uses comparably little. This makes Orca the only natural-sounding TTS that can be deployed in any environment, including browser tabs, mobile apps with strict out-of-memory limits, and embedded devices.
TTS Latency (lower is better)
Orca TTS Streaming: 128 ms
ElevenLabs TTS Streaming: 335 ms
ESpeak TTS: 1,430 ms
ElevenLabs TTS: 1,470 ms
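Low first-token-to-speech latency comes from streaming synthesis: text is fed to the synthesizer token by token (for example, as an LLM generates it), and playback starts as soon as the first audio chunk is ready rather than after the whole sentence. The sketch below assumes a streaming interface with `synthesize(text)` that returns audio samples, or `None` while it is still buffering, plus a `flush()` for the remaining text. Orca's Python SDK exposes a stream object along these lines, but check its documentation for the exact names. The function `stream_tts` and the `play` callback are hypothetical placeholders.

```python
import time
from typing import Callable, Iterable, Optional, Sequence

Pcm = Sequence[int]


def stream_tts(
    text_tokens: Iterable[str],
    synthesize: Callable[[str], Optional[Pcm]],
    flush: Callable[[], Optional[Pcm]],
    play: Callable[[Pcm], None],
) -> Optional[float]:
    """Feed text tokens to a streaming synthesizer as they arrive.

    Returns seconds from the first token to the first audio chunk,
    or None if no audio was produced.
    """
    start = None
    first_audio = None
    for token in text_tokens:
        if start is None:
            start = time.perf_counter()
        pcm = synthesize(token)  # None while the synthesizer is still buffering text
        if pcm:
            if first_audio is None:
                first_audio = time.perf_counter()
            play(pcm)            # playback begins before the full text has arrived
    tail = flush()               # synthesize whatever text is still buffered
    if tail:
        if first_audio is None:
            first_audio = time.perf_counter()
        play(tail)
    if start is None or first_audio is None:
        return None
    return first_audio - start
```

Passing the synthesizer as callables keeps the sketch independent of any one SDK version and lets the timing logic be exercised with a stand-in.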
Audio quality comparison, grouped by peak memory usage. Under 30 MB: ESpeak and Orca.
Built for enterprise applications
From mobile OEMs to embedded hardware
Mobile OEMs and device makers
Ship beyond Pixel and iPhone
Mobile OEMs and device manufacturers can ship call screening that matches or exceeds Google Pixel Call Screen and Apple's native call screening. Picovoice SDKs run on the end-user device: no backend to operate, no privacy agreements to sign. Just a competitive edge over Big Tech.
Telcos and carriers
Beyond STIR/SHAKEN
On-device AI call screening goes beyond STIR/SHAKEN, which only validates the caller ID: it understands what the caller actually says and classifies their intent in real time. Deploy it as a differentiated feature in your dialer app, or as an upgrade to legacy IVR trees that still cost you DTMF licensing fees.
Healthcare, legal, FSI
Compliance without the friction
Healthcare, legal, and FSI teams can improve employee productivity without extra security steps. Keeping caller audio on-device eliminates an entire category of compliance obligation: no BAA for voice data, no processing agreement with a cloud provider, no breach surface through Picovoice systems.
Embedded hardware
Smart intercoms and access control
Smart intercoms and access control panels can screen visitor voice queries directly on embedded devices such as Raspberry Pi. No latency from network hops, no dependency on external uptime, no breach risks that contradict the premise of a physical security application. Full call screening capability with no cloud footprint.
Get started
On-device AI call screen code example
A complete working recipe in Python. Open-source on GitHub. Runs 100% on-device.
Recipe: on-device-ai-call-screening
Difficulty: Beginner
Runtime: 100% on-device
Language: Python
Platforms supported: Android, iOS, Linux, macOS, Windows, Chrome, Edge, Firefox, Safari, Raspberry Pi
These instructions assume your current working directory is recipes/call-screen/python.
1. Create a virtual environment
Isolate the recipe's dependencies from your system Python:
python3 -m venv .venv
(On Windows, use python instead of python3.)
2. Activate the virtual environment
Activation makes pip install into .venv instead of system Python.
Linux, macOS, or Raspberry Pi: source .venv/bin/activate
Windows: .venv\Scripts\activate
3. Install dependencies
Pulls in the Cheetah, Rhino, and Orca Python SDKs along with audio I/O.
4. Train the Speech-to-Intent model
Open the Picovoice Console, go to Rhino Speech-to-Intent, create an empty context, and import the Rhino context YAML for this recipe. Download the generated .rhn file for your target platform.
5. Run the AI Call Screening demo
Pass your AccessKey and the path to the .rhn file you just downloaded.
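At the core of a screening loop, the same microphone frames are fed to Cheetah (for the transcript) and Rhino (for the intent). Below is a minimal sketch of that loop, assuming the Cheetah and Rhino Python SDK behavior listed in the docstring. It is not the recipe's actual code: the function name `screen_utterance`, the `call_screen.rhn` filename, and the `ACCESS_KEY` placeholder are hypothetical. The engines are passed in so the logic can be exercised with stand-ins.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def screen_utterance(
    frames: Iterable[Sequence[int]],
    cheetah,
    rhino,
) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Transcribe one caller utterance and infer its intent from the same audio.

    Assumes (per the Picovoice Python SDKs) that:
      - cheetah.process(frame) returns (partial_transcript, is_endpoint)
      - cheetah.flush() returns any transcript still buffered
      - rhino.process(frame) returns True once an inference is finalized
      - rhino.get_inference() has .is_understood, .intent and .slots
    Both engines are assumed to share one frame length and sample rate.
    """
    parts: List[str] = []
    intent: Optional[str] = None
    slots: Dict[str, str] = {}
    rhino_finalized = False

    for frame in frames:
        # The same 16-bit PCM frame goes to both engines.
        partial, is_endpoint = cheetah.process(frame)
        parts.append(partial)

        if not rhino_finalized and rhino.process(frame):
            rhino_finalized = True
            inference = rhino.get_inference()
            if inference.is_understood:
                intent, slots = inference.intent, dict(inference.slots)

        # Cheetah signals that the caller paused; treat it as the end of the utterance.
        if is_endpoint:
            break

    parts.append(cheetah.flush())
    return "".join(parts), intent, slots


# With the real SDKs (not executed here), the engines would be created roughly as:
#   cheetah = pvcheetah.create(access_key=ACCESS_KEY)
#   rhino = pvrhino.create(access_key=ACCESS_KEY, context_path="call_screen.rhn")
# where ACCESS_KEY and the .rhn path (hypothetical name) come from steps 4 and 5.
```

Stopping at Cheetah's endpoint keeps each utterance bounded; if the caller says nothing Rhino understands, the owner still gets the transcript with no intent attached.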
What is on-device AI call screening?
AI call screening automatically answers an incoming call, transcribes what the caller says, understands their intent, and presents the recipient with action options, so the recipient never has to pick up. On-device AI call screening runs entirely on the device with no audio sent to the cloud, so it works offline and no caller data is ever transmitted.
How does call screening work on non-Pixel Android phones?
Google's Call Screen is exclusive to Pixel devices and depends on Google's cloud. Picovoice provides Android SDKs that any manufacturer or developer can embed. No dependency on Google services. No cloud round-trip.
Can I add call screening to an iOS app?
Yes. Picovoice SDKs support iOS natively. VoIP and business phone apps such as Grasshopper, OpenPhone, or Dialpad, as well as private healthcare communication apps, can use Cheetah Streaming Speech-to-Text, Rhino Speech-to-Intent, and Orca Text-to-Speech to add an on-device call screen, even though Apple doesn't share its call screening infrastructure with them.
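Does on-device call screening work without an internet connection?
Yes. Cheetah, Rhino, and Orca process all audio locally, with nothing sent to the cloud, so call screening works offline.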
How is the on-device AI call screen app different from Google Pixel Call Screen?
Pixel Call Screen is proprietary, cloud-dependent, and Pixel-only. Picovoice's pipeline runs entirely on-device using licensable SDKs that work on Android, iOS, Linux, Raspberry Pi, and embedded hardware, with no Google dependency.
Can this on-device AI call screen app replace a traditional IVR system?
Yes. Traditional IVR relies on DTMF tones and rigid menu trees. Rhino Speech-to-Intent understands natural spoken phrases and maps them to structured intents without requiring exact phrasing, making it a conversational IVR replacement that runs on-device with no telephony cloud backend.
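The difference from a DTMF tree can be sketched as routing on a structured intent instead of a pressed digit. The intent names, the `severity` slot, and the queue names below are hypothetical; in practice they would come from the Rhino context you design in the Picovoice Console.

```python
from typing import Dict, Optional

# Hypothetical intents from a Rhino context, mapped to destination queues.
# A context can list many phrasings ("I want to pay my bill", "can I settle my
# invoice?") for one intent, so callers never have to match a menu's digit.
INTENT_ROUTES: Dict[str, str] = {
    "payBill": "billing",
    "reportOutage": "support",
    "talkToHuman": "operator",
}


def route(intent: Optional[str], slots: Dict[str, str]) -> str:
    """Choose a destination queue from an inferred intent and its slot values."""
    if intent is None:
        # Not understood: fall back to a person instead of replaying a menu.
        return "operator"
    queue = INTENT_ROUTES.get(intent, "operator")
    # Slots refine the route; a DTMF tree would need another menu level for this.
    if queue == "support" and slots.get("severity") == "urgent":
        return "support-priority"
    return queue
```

For example, `route("reportOutage", {"severity": "urgent"})` goes straight to `"support-priority"` from a single spoken sentence, where a DTMF tree would need two menu levels.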
How is this on-device AI call screen app different from STIR/SHAKEN?
STIR/SHAKEN authenticates caller ID at the carrier level to reduce spoofing. On-device AI call screening is a complementary application-layer capability that operates after the call connects: it understands what the caller says and infers their intent. The two approaches address different parts of the spam call problem and can be deployed together.
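One way the two layers could be combined on the handset is a simple decision policy over a network-level caller ID verification result (however your platform exposes it) and the intent inferred on-device. The policy, the intent names, and the decision labels below are hypothetical and only illustrate why neither signal alone is enough.

```python
from typing import Optional

# Hypothetical intent names treated as unwanted, for illustration only.
SPAM_INTENTS = {"sales", "survey", "robocall"}


def screening_decision(caller_id_verified: bool, intent: Optional[str]) -> str:
    """Combine a carrier-level caller ID check with the on-device intent."""
    if intent in SPAM_INTENTS:
        # A verified number can still be an unwanted sales call.
        return "suggest_block"
    if not caller_id_verified and intent is None:
        # Possibly spoofed number and no recognizable purpose.
        return "send_to_voicemail"
    # Either the number checks out, or the caller stated a legitimate purpose.
    return "ring_owner"
```

A verified telemarketer is caught by the intent check, while an unverified caller with a clear, legitimate purpose still gets through.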
Does the on-device AI call screen app store or transmit audio anywhere?
No. Caller and phone owner audio is processed in memory on the device and discarded. It is never transmitted to Picovoice or any third-party cloud. Picovoice has no data controller relationship with your end users, which removes cloud voice data compliance obligations, including BAAs under HIPAA.