Rhino Speech-to-Intent

Add custom voice commands to anything

Unlimited voice interactions beating cloud NLU accuracy

Start Building for free

Design Context-Aware Voice User Interfaces

Design, test and train custom voice commands on Picovoice Console. Build models supporting intent classification and entity resolution with multiple slots. Beat cloud NLU accuracy by a wide margin by tuning models to your domain of interest. Instantly download trained models for edge inference.

Start Building
Picovoice Console User Interface to build thousands of voice command expressions with intents, slots, and YAML files rapidly

Build with Picovoice SDKs in a Few Lines of Code

Add custom voice commands with your favourite SDK, including Android, iOS, Python, Flutter, and React. Add only a few lines of code, and let the SDK handle audio capture and inference.

rhino = pvrhino.create(
    access_key,
    context_path)
while not rhino.process(audio_frame()):
    pass
inference = rhino.get_inference()
Build with Python
let rhino = new Rhino(
    accessKey,
    contextPath);
while (!rhino.process(audioFrame())) { }
let inference = rhino.getInference();
Build with NodeJS
RhinoManager rhinoManager = new RhinoManager.Builder()
    .setAccessKey(accessKey)
    .setContextPath(contextPath)
    .build(
        appContext,
        new RhinoManagerCallback() {
            @Override
            public void invoke(RhinoInference inference) {
                // Inference callback
            }
        }
    );
rhinoManager.start();
Build with Android
let rhinoManager = RhinoManager(
    accessKey: accessKey,
    contextPath: contextPath,
    onInferenceCallback: { inference in
        // Inference callback
    })
try rhinoManager.start()
Build with iOS
const {
    contextInfo,
    isLoaded,
    isListening,
    isError,
    isTalking,
    errorMessage,
    pushToTalk,
    start,
    pause,
    stop,
} = useRhino(
    RhinoWorkerFactory,
    {
        accessKey: accessKey,
        context: context,
        start: true
    },
    (rhinoInference) => {
        // Inference callback
    }
);
Build with React
RhinoManager rhinoManager = await RhinoManager.create(
    accessKey,
    contextPath,
    (inference) {
        // Inference callback
    });
await rhinoManager.process();
Build with Flutter
let rhinoManager = await RhinoManager.create(
    accessKey,
    contextPath,
    (inference) => {
        // Inference callback
    });
await rhinoManager.process();
Build with React Native
RhinoManager rhinoManager = RhinoManager.Create(
    accessKey,
    contextPath,
    (inference) => {
        // Inference callback
    });
rhinoManager.Start();
Build with Unity
constructor(private rhinoService: RhinoService) {
    this.inferenceDetection = rhinoService.inference$.subscribe(
        inference => {
            // Inference callback
        }
    )
}
async ngOnInit() {
    await this.rhinoService.init(
        RhinoWorkerFactory,
        {
            accessKey: accessKey,
            context: context
        }
    )
}
Build with Angular
<Rhino
    ref="rhino"
    v-bind:rhinoFactoryArgs="{
        accessKey: accessKey,
        context: context
    }"
    v-bind:rhinoFactory="factory"
    v-on:rhn-inference="rhnInferenceFn"
/>
methods: {
    rhnInferenceFn: function (inference) {
        // Inference callback
    }
}
Build with Vue
Rhino rhino = Rhino.Create(
    accessKey,
    contextPath);
// Keep feeding frames until Process() reports that inference is finalized
while (!rhino.Process(AudioFrame())) { }
Inference inference = rhino.GetInference();
Build with .NET
Rhino rhino = new Rhino.Builder()
    .setAccessKey(accessKey)
    .setContextPath(contextPath)
    .build();
while (!rhino.process(audioFrame())) { }
RhinoInference inference = rhino.getInference();
Build with Java
rhino := NewRhino(
    accessKey,
    contextPath)
err := rhino.Init()
for {
    isFinalized, err := rhino.Process(AudioFrame())
    if isFinalized {
        break
    }
}
inference, err := rhino.GetInference()
Build with Go
let rhino: Rhino =
    RhinoBuilder::new(
        access_key,
        context_path
    )
    .init()
    .expect("");
loop {
    if let Ok(is_finalized) = rhino.process(&audio_frame()) {
        if is_finalized {
            if let Ok(inference) = rhino.get_inference() {
                // Inference callback
            }
        }
    }
}
Build with Rust
pv_rhino_init(
    access_key,
    model_path,
    context_path,
    sensitivity,
    require_endpoint,
    &rhino);
while (true) {
    pv_rhino_process(
        rhino,
        audio_frame(),
        &is_finalized);
    if (is_finalized) {
        pv_rhino_get_intent(
            rhino,
            &intent,
            &num_slots,
            &slots,
            &values);
    }
}
Build with C
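
Every snippet above ends with an inference in hand. As a minimal sketch of what an application might do next, the Python below routes an inference to a handler by intent name. The `Inference` dataclass is a hypothetical stand-in, and its fields (`is_understood`, `intent`, `slots`) are assumed to mirror what the SDK's inference object exposes. The `orderBeverage` intent, its `size` and `beverage` slots, and the handler names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Inference:
    """Hypothetical stand-in for the object returned by get_inference()."""
    is_understood: bool
    intent: str = ""
    slots: Dict[str, str] = field(default_factory=dict)


def order_beverage(slots: Dict[str, str]) -> str:
    # Slot names are assumptions; a real context defines its own.
    size = slots.get("size", "regular")
    beverage = slots.get("beverage", "coffee")
    return f"Brewing a {size} {beverage}"


# Map each intent name in the (hypothetical) context to a handler.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "orderBeverage": order_beverage,
}


def handle(inference: Inference) -> str:
    """Route a finalized inference to the handler registered for its intent."""
    if not inference.is_understood:
        return "Sorry, I didn't catch that."
    handler = HANDLERS.get(inference.intent)
    if handler is None:
        return f"No handler registered for intent '{inference.intent}'."
    return handler(inference.slots)


if __name__ == "__main__":
    print(handle(Inference(True, "orderBeverage", {"size": "large", "beverage": "latte"})))
    print(handle(Inference(False)))
```

Keeping the intent-to-handler mapping in one table means that adding a command to the context only requires registering one more function.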

Deploy Unified Voice User Interfaces across Platforms

Offer seamless user experiences across all platforms. Deploy domain-specific custom voice AI models across all platforms, including embedded, mobile, web, on-premise, and cloud.

Start Building

Grow your Company instead of Cloud Providers’

Grow product and user engagement with unlimited voice interactions without worrying about the cost. Usage-based API pricing gets out of hand as user engagement grows, while Picovoice’s pricing remains constant!

Start with the Free Tier

Why Rhino Speech-to-Intent?

Highly accurate — backed by data, not fancy slides

Choose the best engine based on data! Accuracy depends on various factors. In a market where every vendor claims to have “the best engine”, we published an open-source benchmark. Compare Rhino against the most popular conversational AI engines, such as Amazon Lex, Google Dialogflow, IBM Watson, and Microsoft LUIS, or against any other Natural Language Understanding (NLU) engine. Rhino outperforms them across various accents and in the presence of noise and reverberation.

NLU accuracy comparison shows Rhino is more accurate than Amazon Lex, Google Dialogflow, IBM Watson & Microsoft LUIS.

Real-time — no network delay, no downtime and zero latency

Build truly real-time experiences with Rhino. Rhino’s edge-first architecture infers intents from utterances directly on the device, with no network latency. Relying on cloud APIs hinders user experience due to fluctuating latency and network performance. Milliseconds matter in many applications, such as automotive, smart TV, or metaverse.

Example voice command processed by Rhino. It infers intents and slots from utterances locally on-device in real time.

Private — intrinsically compliant with GDPR, HIPAA and more!

Ensure user privacy and stay compliant! Rhino processes voice commands locally on-device, without recording data or sending it to the cloud. Put Rhino in meeting rooms, warehouses or examination rooms, knowing that no one will ever have access to the conversations.

Multilingual - supports polyglot experiences

Create polyglot experiences with Rhino Speech-to-Intent! Grow globally and train voice AI models in English, French, German, Italian, Japanese, Korean, Portuguese, Spanish, and more on the Picovoice Console. Every user still has access to unlimited voice interactions in all languages.

English

German
Deutsch

Spanish
Español

French
Français

Italian
Italiano

Japanese
日本語

Korean
한국어

Portuguese
Português

Mandarin
普通话

Dutch
Nederlands

Russian
Русский

Hindi
हिन्दी

Polish
Język polski

Vietnamese
Tiếng Việt

Swedish
Svenska

Arabic
اَلْعَرَبِيَّةُ

Use Cases

Search By Voice

Add voice for truly hands-free search experiences on websites, mobile applications, and devices.

Voice Search

Add voice search to mobile applications, websites, and devices. Find keywords and phrases in audio, video, and streams.

Speech Analytics

Transform customer and employee experiences with speech analytics and intelligence tools powered by the only end-to-end Voice AI platform.

Voice Command

Add voice commands to devices, mobile or web applications to elevate user experience.


Learn more about Rhino Speech-to-Intent Engine

  • Is Rhino a Natural Language Understanding (NLU) Engine?

    NLU engines infer intents and slots (entities) from speech transcribed by a speech-to-text engine. Rhino Speech-to-Intent understands the intention directly from the spoken utterance. We coined the term Speech-to-Intent when developing Rhino to indicate the end-to-end nature of its inference.

  • How does Rhino Speech-to-Intent achieve such high accuracy with small model sizes compared to other edge and cloud-based solutions in the market?

    The standard approach to intent inference (i.e. understanding voice commands) is to break it down into two tasks. First, a speech-to-text engine converts the spoken utterance into text. Then the transcription is processed by a natural language understanding (NLU) engine. The NLU engine is responsible for inferring the topic, intent, and slots. However, if the speech-to-text engine transcribes poorly, the output of the NLU engine will be poor, too. Therefore, some solutions tune speech-to-text engines for the domain of interest to improve overall performance. This approach requires significant resources such as computing power, memory, and storage. When implemented as a cloud solution, this is not an issue. However, the cloud is not always the best option. Also, not every use case requires open-domain understanding across millions of variants of spoken commands. One does not need to discuss the meaning of life with a coffee machine or a surgical robot. Most use cases have a confined domain (context) that covers thousands of spoken commands.

    Picovoice’s Speech-to-Intent engine suits these use cases because it fuses automatic speech recognition and NLU into a single engine tuned for the specific domain of interest. This end-to-end approach results in small, efficient models with high accuracy.
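
To make the two-stage failure mode described above concrete, here is a toy Python sketch. It is not how Rhino or any production NLU engine works: `toy_nlu` is a hypothetical keyword matcher, the vocabulary is an invented coffee-maker domain, and the "misheard" transcript is a made-up example of a speech-to-text error. It only shows that an NLU stage working from text cannot recover a slot the transcription stage has already lost.

```python
import re

# Hypothetical domain vocabulary for a coffee-maker context.
SIZES = {"small", "medium", "large"}
BEVERAGES = {"espresso", "latte", "americano"}


def toy_nlu(transcript):
    """Toy keyword NLU: return intent and slots, or None if no beverage is found."""
    words = re.findall(r"[a-z]+", transcript.lower())
    size = next((w for w in words if w in SIZES), None)
    beverage = next((w for w in words if w in BEVERAGES), None)
    if beverage is None:
        return None
    return {"intent": "orderBeverage", "slots": {"size": size, "beverage": beverage}}


# Two transcripts a speech-to-text stage might emit for the same utterance.
correct = "make me a large latte"
misheard = "make me a large lot a"  # hypothetical transcription error

if __name__ == "__main__":
    print(toy_nlu(correct))
    print(toy_nlu(misheard))
```

With the correct transcript the matcher finds both slots; with the misheard one it returns `None`, even though the speaker said the same thing.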

  • How do I learn more about the terminology used for Natural Language Understanding (NLU) Engines?

    Intents, expressions, and slots are commonly used in conversational AI and across various engines such as Amazon Lex, IBM Watson, Google Dialogflow or Rasa NLU. They’re used to build voice assistants or bots. Check out the Picovoice Glossary to learn more, or the Rhino Syntax Cheat Sheet to start building contexts with intents, slots, macros and expressions.

  • How can I add custom commands to voice-control mobile or web applications?

    The Picovoice docs are a great resource for learning how to add custom voice commands to Android and iOS applications and modern web browsers.

  • Does Rhino Speech-to-Intent process voice data locally on the device?

    Rhino processes voice data locally on the device. If you haven’t yet, try the voice-activated coffee maker demo offline. After allowing microphone access, turn off your internet connection before running the demo. Rhino Speech-to-Intent directly infers intents from your utterances within your web browser.

  • Which platforms does Rhino Speech-to-Intent support?

    1. Microcontrollers: Arm Cortex-M, STM32, PSoC, Arduino, and i.MX RT
    2. Single Board Computers: Raspberry Pi, NVIDIA Jetson, and BeagleBone
    3. Mobile Applications: Android and iOS
    4. Web Browsers: Chrome, Safari, Firefox, and Edge
    5. Desktop and Servers: Linux, macOS, and Windows
  • What can I build with Rhino Speech-to-Intent?

    Picovoice customers use Speech-to-Intent in:
    • Voice Assistants
    • IVRs
    • Customer Service Applications
    • Medical Devices
    • Enterprise Applications
    • Voice-activated Smart Thermostats
    • Smart TVs
    Almost any vertical can benefit from conversational AI. Rhino is even getting ready for space exploration. Don’t forget to check out the use case pages, including Search by Voice and Voice Command and Control. If you’re looking for inspiration, check Picovoice's YouTube and Medium pages.
  • How do I get technical support for Rhino Speech-to-Intent?

    Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice recognition, Picovoice engines, and how to start adding voice control to anything. Picovoice also offers GitHub community support to all Free Tier users.

  • What should I do if I need support for other languages?

    Reach out to Picovoice Sales by providing details about the opportunity, including use case, requirements and project details.

  • How can I stay informed about updates and upgrades?

    Version changes appear in the Picovoice Newsletter, LinkedIn, and Twitter. Subscribing to the GitHub repository is the best way to get notified of patch releases. If you enjoy building with Rhino, don’t forget to give it a star when you’re on GitHub!

Lead the Voice Revolution!

Build private, fast, cost-effective, cross-platform voice products
Talk to a Voice AI Expert