Rhino Speech-to-Intent

Add custom voice commands to any software with zero latency.

Intent detection engine fuses Natural Language Understanding with Speech-to-Text, outperforming cloud APIs

What is Rhino Speech-to-Intent?

Rhino Speech-to-Intent infers user intents from utterances, allowing users to interact with applications via voice.

Rhino Speech-to-Intent understands complex voice commands, such as “find the maintenance checklist for Boeing 707” or “call 987 655 4433”.

Build useful voice assistants that run anywhere

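import pvrhino

# access_key, context_path, and audio() are placeholders you supply
o = pvrhino.create(
    access_key,
    context_path)
# Feed audio frames until Rhino finalizes an inference
while not o.process(audio()):
    pass
inference = o.get_inference()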
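
Once get_inference() returns, the result still has to be turned into an action. The following is a minimal sketch of that step, assuming the inference object exposes is_understood, intent, and slots as in the pvrhino Python SDK. The "call" intent, the "phoneNumber" slot, and the handle_inference helper are hypothetical names chosen for illustration, and a stand-in tuple replaces a real engine so the snippet runs without an access key.

```python
from collections import namedtuple

# Stand-in for the object returned by get_inference(); the field names
# mirror the pvrhino Python SDK, but this tuple is only an illustration.
Inference = namedtuple("Inference", ["is_understood", "intent", "slots"])


def handle_inference(inference):
    """Turn an inference into a short action description (hypothetical helper)."""
    if not inference.is_understood:
        return "Sorry, I did not understand that command."
    if inference.intent == "call":  # hypothetical intent name
        number = inference.slots.get("phoneNumber", "unknown number")
        return f"Calling {number}"
    return f"No handler for intent '{inference.intent}'"


understood = Inference(True, "call", {"phoneNumber": "987 655 4433"})
rejected = Inference(False, None, {})

print(handle_inference(understood))
print(handle_inference(rejected))
```

In a real application the intent and slot names come from the context file passed to pvrhino.create, so the branches above would follow whatever that context defines.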

Why Rhino Speech-to-Intent?

Conventional cloud-dependent methods first translate voice to text using generic automatic speech recognition (ASR), then detect user intent by analyzing that text with natural language understanding (NLU). Processing voice data in two phases decreases accuracy, because transcription errors carry over into intent detection, and increases latency.

Rhino Speech-to-Intent fuses the ASR and NLU engines and does not rely on a text representation to infer user intent, achieving six times higher accuracy than Big Tech NLU APIs and enabling elevated user experiences.
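
A rough way to see why a two-phase pipeline can lose accuracy and add latency is to treat the stages as independent steps that run one after the other. The sketch below rests on that simplifying assumption. The cascade function and all numbers are hypothetical and are not measurements of any product.

```python
def cascade(asr_accuracy, nlu_accuracy, asr_latency_ms, nlu_latency_ms):
    """Estimate end-to-end accuracy and latency of an ASR -> NLU pipeline.

    Assumes the two stages fail independently and run one after the other.
    """
    accuracy = asr_accuracy * nlu_accuracy
    latency_ms = asr_latency_ms + nlu_latency_ms
    return accuracy, latency_ms


# Illustrative numbers only; they are not measurements of any product.
accuracy, latency_ms = cascade(0.90, 0.90, 300, 200)
print(f"end-to-end accuracy ~ {accuracy:.2f}, latency ~ {latency_ms} ms")
```

Under these assumptions, two stages that are each right 90% of the time are right together only about 81% of the time, and their latencies add up.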

Use-case-specific voice commands in real-time with high accuracy

Improve productivity with custom voice commands that actually work

Cloud ASR & NLU APIs

  • 👍
    84% accuracy on average
  • 🐢
    Unpredictable response time
  • 👂
    3rd party data sharing
  • ☁️
    Cloud-dependent

Rhino Speech-to-Intent

  • 🚀
    97%+ accuracy
  • Guaranteed response time
  • 🔒
    Private by design
  • 🤸
    Platform-agnostic
97%+ accuracy

Six times more accurate than cloud providers

Choose the best solution based on data. The open-source natural language understanding benchmark shows that Rhino Speech-to-Intent outperforms cloud conversational AI engines across various accents and in the presence of noise and reverberation.
Guaranteed response time

Real-time - no network delay, no downtime

Build “real” real-time experiences with Rhino Speech-to-Intent. Processing voice commands in the cloud degrades the user experience because latency fluctuates with network performance. Rhino Speech-to-Intent processes voice commands directly on-device and never sends them to a third-party cloud.
Privacy by design

Private — CCPA, GDPR, and HIPAA-compliant voice commands

Ensure user privacy and stay compliant! Rhino Speech-to-Intent processes voice commands locally on the device, without recording data or sending it to the cloud. Enterprises can confidently put Rhino Speech-to-Intent in meeting rooms, warehouses, examination rooms, or call centers.
Platform-agnostic

Cross-platform - unified experiences anywhere!

Process voice data on all platforms and offer seamless user experiences. Rhino Speech-to-Intent runs across platforms, including microcontrollers, embedded, mobile, web, on-premise, and cloud.
Get started with Rhino Speech-to-Intent

The best way to see how Rhino Speech-to-Intent differs from other natural language understanding solutions is to try it!

Start Now
Forever Free Plan
  • Custom Voice Commands
  • Platform-optimized model training
  • Intuitive SDKs
  • Unlimited interactions per user
  • Arabic, Dutch, English, Farsi, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, Swedish, and Vietnamese
Learn more about Rhino Speech-to-Intent

What is Natural Language Understanding (NLU)?

Natural language understanding deals with meaning, i.e., comprehending users’ intent. Researchers initially focused on understanding user intent from text. Although spoken language understanding is the more specific term for understanding user intent from speech, many people in both industry and research still use natural language understanding for text and speech data alike. This is mainly due to the conventional approach of running speech-to-text and natural language understanding engines sequentially.

What is intent detection?

Intent detection is a subtask of natural language processing and a critical component of any task-oriented system. Natural language understanding solutions match a user’s utterance with one of a set of predefined classes by understanding the user’s goal (i.e., intention). After matching an utterance with an intent, the software can initiate a task to achieve the user’s goal. For example, users who want to turn the lights off may say “Turn the lights off.”, “Switch off the lights.”, or “Can you please turn the lights off?”. Intent detection captures the shared goal, “change the state of the lights from on to off”, despite the different ways of expressing it.
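
The lights example can be sketched in a few lines. This is a deliberately simple, text-based keyword matcher meant only to illustrate the idea of mapping different phrasings to one intent. It is not how Rhino Speech-to-Intent works, since Rhino infers intent from speech without a text step. The intent names, keyword sets, and detect_intent helper are hypothetical.

```python
import re

# Hypothetical intents, each defined by keyword sets that must all be present.
INTENT_KEYWORDS = {
    "lightsOff": [{"lights", "off"}],
    "lightsOn": [{"lights", "on"}],
}


def detect_intent(utterance):
    """Return the first intent whose keywords all occur in the utterance, else None."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for intent, keyword_sets in INTENT_KEYWORDS.items():
        if any(keywords <= words for keywords in keyword_sets):
            return intent
    return None


for phrase in (
    "Turn the lights off.",
    "Switch off the lights.",
    "Can you please turn the lights off?",
):
    print(phrase, "->", detect_intent(phrase))
```

All three phrasings resolve to the same hypothetical lightsOff intent, while an unrelated utterance returns None.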

Can I use Rhino Speech-to-Intent to overcome the limitations of Amazon Lex and Google Dialogflow?

Rhino Speech-to-Intent is a more accurate, resource-efficient, and faster alternative to Amazon Lex, Google Dialogflow, or other NLU engines for use-case-specific intent detection. Picovoice offers a Free Plan so you can experiment and see how it handles your use case. If you’re still not sure how to overcome the limitations of Amazon Lex, Google Dialogflow, and other NLU engines with Rhino Speech-to-Intent, or you need help with migration, leverage Picovoice’s Consulting Services!

How does Rhino Speech-to-Intent differ from Natural Language Understanding (NLU) solutions such as Amazon Lex, Google Dialogflow, IBM Watson Natural Language Understanding, or Microsoft LUIS?

Rhino Speech-to-Intent, as the name suggests, converts speech directly into intent, eliminating the need for a text representation. It uses a modern end-to-end approach to infer intents and intent details directly from spoken commands. This enables developers to train jointly optimized automatic speech recognition (ASR) and natural language understanding (NLU) engines tailored to their specific domain, achieving higher accuracy.

Rhino Speech-to-Intent excels in use-case-specific applications, such as voice-enabled coffee machines or surgical robots, which involve a limited number of commands, offering high accuracy with minimal resources. In contrast, open-domain applications like voice-enabled ChatGPT handle a wide range of topics and variations. Thus, we recommend Cheetah Streaming Speech-to-Text and picoLLM for such applications.