Choose the best engine based on data! Accuracy depends on various factors. In a market where every vendor claims to offer "the best engine," we published an open-source benchmark. Compare Rhino against the most popular conversational AI engines: Amazon Lex, Google Dialogflow, IBM Watson, Microsoft LUIS, or any other Natural Language Understanding (NLU) engine. Rhino outperforms them across various accents and in the presence of noise and reverberation.
Build truly real-time experiences with Rhino. Rhino's edge-first architecture infers intents directly from utterances with zero latency. Relying on cloud APIs hinders the user experience due to fluctuating latency and network performance. Milliseconds matter in many applications, such as automotive, smart TVs, or the metaverse.
Ensure user privacy and stay compliant! Rhino processes voice commands locally on-device, without recording voice data or sending it to the cloud. Put Rhino in meeting rooms, warehouses, or examination rooms, knowing that no one will ever have access to the conversations.
Create polyglot experiences with Rhino Speech-to-Intent! Grow globally and train voice AI models in English, French, German, Italian, Japanese, Korean, Portuguese, Spanish, and more on the Picovoice Console. Every user still has access to unlimited voice interactions in all languages.
NLU engines infer intents and slots (entities) from speech transcribed by a speech-to-text engine. Rhino Speech-to-Intent understands the intention directly from the spoken utterance. We coined the term Speech-to-Intent when developing Rhino to indicate the end-to-end nature of its inference.
The standard approach to intent inference (i.e., understanding voice commands) is to break it down into two tasks. First, a speech-to-text engine converts the spoken utterance into text. Then a natural language understanding (NLU) engine processes the transcription, inferring the topic, intent, and slots. However, if the speech-to-text engine is inaccurate, the NLU output will be poor, too. Therefore, some solutions tune speech-to-text engines for the domain of interest to improve overall performance. This approach requires significant resources, such as computing power, memory, and storage. When implemented as a cloud solution, this is not an issue. However, the cloud is not always the best option. Moreover, not every use case requires an open domain with millions of variants of spoken commands. One does not need to discuss the meaning of life with a coffee machine or a surgical robot. Most use cases have a confined domain (context) that covers thousands of spoken commands.
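To make the two-step pipeline concrete, here is a minimal, hypothetical sketch of the NLU half: a toy matcher that maps an already-transcribed command in a confined domain to an intent and its slots. All names and patterns here are illustrative, not part of any Picovoice API; real NLU engines use trained models rather than regular expressions.

```python
import re

# Toy, domain-confined NLU: maps a transcript to (intent, slots).
# The intent names and slot values are made up for illustration.
PATTERNS = {
    "orderBeverage": re.compile(
        r"(?:make|brew) me a (?P<size>small|medium|large) "
        r"(?P<beverage>espresso|latte|coffee)"
    ),
}

def infer_intent(transcript: str):
    """Return (intent, slots) or (None, {}) if the command is not understood."""
    text = transcript.lower().strip()
    for intent, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return None, {}
```

For example, `infer_intent("Make me a large latte")` yields the intent `orderBeverage` with slots `{"size": "large", "beverage": "latte"}`, while an out-of-domain phrase yields `(None, {})`. The weakness the text describes is visible here: the matcher is only as good as the transcript it receives from the speech-to-text stage.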
Picovoice's Speech-to-Intent engine is perfect for these use cases: it fuses automatic speech recognition and NLU into a single engine tuned for the specific domain of interest. This end-to-end approach results in small, efficient models with high accuracy.
Intents, expressions, and slots are commonly used in conversational AI and across various engines, such as Amazon Lex, IBM Watson, Google Dialogflow, or Rasa NLU. They are used to build voice assistants and bots. Check out the Picovoice Glossary to learn more, or the Rhino Syntax Cheat Sheet to start building contexts with intents, slots, macros, and expressions.
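As a rough illustration of how intents, expressions, and slots fit together in a context, consider the sketch below. It follows the general conventions of Rhino's context syntax (square brackets for alternative words, `$slot:variable` for slot references), but the file itself is made up for this example; consult the Rhino Syntax Cheat Sheet for the authoritative syntax.

```yaml
# Illustrative coffee-maker context (not an official Picovoice file).
context:
  expressions:
    orderBeverage:
      - "[make, brew] me a $size:size $beverage:beverage"
  slots:
    beverage:
      - espresso
      - latte
      - coffee
    size:
      - small
      - medium
      - large
```

Here `orderBeverage` is an intent, each line under it is an expression, and `size` and `beverage` are slots whose spoken values are captured into variables of the same name.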
The Picovoice docs are a great resource for learning how to add custom voice commands to Android and iOS applications and modern web browsers.
Rhino processes voice data locally on the device. If you haven't already, try the voice-activated coffee maker demo offline: after granting microphone access, turn off your internet connection before running the demo. Rhino Speech-to-Intent infers intents directly from your utterances within your web browser.
Reach out to Picovoice Sales with details about the opportunity, including the use case, requirements, and project scope.