Choose the best engine based on data! Accuracy depends on various factors. In a market where every vendor claims to offer "the best engine," we published an open-source benchmark. Compare Rhino against the most popular conversational AI engines: Amazon Lex, Google Dialogflow, IBM Watson, Microsoft LUIS, or any other Natural Language Understanding (NLU) engine. Rhino outperforms them across various accents and in the presence of noise and reverberation.
Build truly real-time experiences with Rhino. Rhino's edge-first architecture infers intents directly from utterances with zero latency. Relying on cloud APIs hinders the user experience due to fluctuating latency and network performance. Milliseconds matter in many applications, such as automotive, smart TVs, or the metaverse.
Ensure user privacy and stay compliant! Rhino processes voice commands locally on-device, without recording voice data or sending it to the cloud. Put Rhino in meeting rooms, warehouses, or examination rooms, knowing that no one will ever have access to the conversations.
Create polyglot experiences with Rhino Speech-to-Intent! Grow globally and train voice AI models in English, French, German, Italian, Japanese, Korean, Portuguese, Spanish, and more on the Picovoice Console. Every user still has access to unlimited voice interactions in all languages.
NLU engines infer intents and slots (entities) from speech transcribed by a speech-to-text engine. Rhino Speech-to-Intent understands the intention directly from the spoken utterance. We coined the term Speech-to-Intent when developing Rhino to indicate the end-to-end nature of its inference.
The standard approach to intent inference (i.e., understanding voice commands) is to break it down into two tasks. First, a speech-to-text engine converts the spoken utterance into text. Then a natural language understanding (NLU) engine processes the transcription, inferring the topic, intent, and slots. However, if the speech-to-text engine is inaccurate, the NLU output will be poor, too. Therefore, some solutions tune speech-to-text engines for the domain of interest to improve overall performance. This approach requires significant resources, such as computing power, memory, and storage. When implemented as a cloud solution, this is not an issue. However, the cloud is not always the best option. Moreover, not every use case requires an open domain with millions of variants of spoken commands. One does not need to discuss the meaning of life with a coffee machine or a surgical robot. Most use cases have a confined domain (context) that covers thousands of spoken commands.
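To make the two-step pipeline concrete, here is a minimal, hypothetical sketch of the NLU half: a toy matcher that maps an already-transcribed command in a confined domain to an intent and its slots. All names and patterns here are illustrative, not part of any Picovoice API; real NLU engines use trained models rather than regular expressions.

```python
import re

# Toy, domain-confined NLU: maps a transcript to (intent, slots).
# The intent names and slot values are made up for illustration.
PATTERNS = {
    "orderBeverage": re.compile(
        r"(?:make|brew) me a (?P<size>small|medium|large) "
        r"(?P<beverage>espresso|latte|coffee)"
    ),
}

def infer_intent(transcript: str):
    """Return (intent, slots) or (None, {}) if the command is not understood."""
    text = transcript.lower().strip()
    for intent, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return None, {}
```

For example, `infer_intent("Make me a large latte")` yields the intent `orderBeverage` with slots `{"size": "large", "beverage": "latte"}`, while an out-of-domain phrase yields `(None, {})`. The weakness the text describes is visible here: the matcher is only as good as the transcript it receives from the speech-to-text stage.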
Picovoice's Speech-to-Intent engine is perfect for these use cases: it fuses automatic speech recognition and NLU into a single engine tuned for the specific domain of interest. This end-to-end approach results in small, efficient models with high accuracy.
Intents, expressions, and slots are commonly used in conversational AI and across various engines, such as Amazon Lex, IBM Watson, Google Dialogflow, or Rasa NLU. They are used to build voice assistants and bots. Check out the Picovoice Glossary to learn more, or the Rhino Syntax Cheat Sheet to start building contexts with intents, slots, macros, and expressions.
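As a rough illustration of how intents, expressions, and slots fit together in a context, consider the sketch below. It follows the general conventions of Rhino's context syntax (square brackets for alternative words, `$slot:variable` for slot references), but the file itself is made up for this example; consult the Rhino Syntax Cheat Sheet for the authoritative syntax.

```yaml
# Illustrative coffee-maker context (not an official Picovoice file).
context:
  expressions:
    orderBeverage:
      - "[make, brew] me a $size:size $beverage:beverage"
  slots:
    beverage:
      - espresso
      - latte
      - coffee
    size:
      - small
      - medium
      - large
```

Here `orderBeverage` is an intent, each line under it is an expression, and `size` and `beverage` are slots whose spoken values are captured into variables of the same name.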
The Picovoice docs are a great resource for learning how to add custom voice commands to Android and iOS applications and modern web browsers.
Rhino processes voice data locally on the device. If you haven't already, try the voice-activated coffee maker demo offline: after granting microphone access, turn off your internet connection before running the demo. Rhino Speech-to-Intent infers intents directly from your utterances within your web browser.
Reach out to Picovoice Sales with details about the opportunity, including the use case, requirements, and project scope.