Rhino Speech-to-Intent Engine FAQ
Intent recognition, also known as intent classification or intent detection, is the process of analyzing a user's written or spoken input to determine their underlying goal or purpose. By identifying the intent behind utterances, systems like AI agents can respond effectively and interact with humans. Intent recognition is crucial in applications such as customer service, sales automation, menu navigation, or smart devices, enabling streamlined interactions and enhancing user experience.
Rhino Speech-to-Intent infers intents and slots directly from spoken utterances. These can then be used to trigger an action.
Below is an example output of Rhino Speech-to-Intent. Developers can use this information to pour a “large skim milk americano with one pump of sugar” if they're building a smart coffee machine, or to send the order to a barista if they're building phone ordering software.
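As a rough illustration of how such a result might be consumed, the Python sketch below uses a hypothetical `Inference` container holding an intent name and a dictionary of slots. The class, the intent name `orderBeverage`, and the slot names are assumptions made for this example; they are not taken from an actual Rhino context or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Inference:
    """Hypothetical container for one inference result (not an SDK class)."""
    is_understood: bool
    intent: str = ""
    slots: dict = field(default_factory=dict)


def handle(inference: Inference) -> str:
    """Turn an inference into a human-readable action description."""
    if not inference.is_understood:
        return "Sorry, I didn't understand that."
    if inference.intent == "orderBeverage":
        slots = inference.slots
        # Only the beverage itself is treated as mandatory in this sketch.
        parts = [slots.get("size"), slots.get("milk"), slots["beverage"]]
        order = " ".join(part for part in parts if part)
        if "sugarAmount" in slots:
            order += f" with {slots['sugarAmount']} of sugar"
        return f"Preparing: {order}"
    return f"No handler for intent '{inference.intent}'"


# Assumed intent and slot names, chosen to mirror the coffee example above.
example = Inference(
    is_understood=True,
    intent="orderBeverage",
    slots={
        "size": "large",
        "milk": "skim milk",
        "beverage": "americano",
        "sugarAmount": "one pump",
    },
)
print(handle(example))
```

A coffee machine would replace the returned string with hardware commands, while a phone ordering system would forward the same intent and slots to the barista's order queue.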
You can learn more about Picovoice's approach to End-to-End Intent Inference from Speech.
There is no technical limit on the number of commands (expressions) or slot values Rhino can understand. However, on platforms with limited memory (MCUs or DSPs), the total number of commands and the size of the vocabulary are limited by the amount of available flash memory. You can talk to Picovoice Engineering to discuss your use case requirements and hardware limitations.
Picovoice has rigorously benchmarked its Rhino Speech-to-Intent engine and has open-sourced and published the results of a Natural Language Understanding Benchmark comparing Amazon Lex, Google Dialogflow, Microsoft LUIS, IBM Watson, and Picovoice Rhino to help enterprises choose the best natural language understanding engine. The audio data, code, and models used for benchmarking have been made publicly available under Apache 2.0 to facilitate reproducibility.
The Rhino Speech-to-Intent engine can extract intents from spoken utterances with higher than 99% accuracy in clean (no noise) environments, and with 97% accuracy in noisy environments with a signal-to-noise ratio of 9 dB at the microphone level.
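For readers unfamiliar with the unit, the signal-to-noise ratio in decibels is 10 * log10(P_signal / P_noise), where P is the mean power of each signal. The Python sketch below applies that definition to scale a noise track so that it sits 9 dB below a speech track. The sine tone and white noise are stand-ins for real recordings, and this is not necessarily how the published benchmark mixes its audio.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(signal, noise, target_snr_db):
    """Scale the noise so the signal/noise pair has the target SNR."""
    target_ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(signal) / (power(noise) * target_ratio))
    return [gain * n for n in noise]


rng = random.Random(0)
# Stand-ins for real audio: one second of a 440 Hz tone and of white noise,
# both at 16 kHz.
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

scaled_noise = scale_noise_to_snr(speech, noise, target_snr_db=9.0)
noisy_speech = [s + n for s, n in zip(speech, scaled_noise)]
print(f"SNR: {snr_db(speech, scaled_noise):.2f} dB")
```

At 9 dB the speech carries roughly eight times the power of the noise, which gives a sense of how demanding the noisy test condition is.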
Yes, Rhino can accurately understand numbers, alphanumerics, and similar challenging parameters. Watch this demo of a phone dialing interaction running on an ARM Cortex-M4 microcontroller simulating a hearable application.
The overall performance depends on various factors such as speaker distance, level/type of noise, room acoustics, quality of the microphone, and audio frontend algorithms used (if any). It is usually best to try out our technology in your target environment using freely available sample models. Additionally, we have published an open-source benchmark of our Speech-to-Intent software in a noisy environment, which can be used as a reference.
Yes, Rhino is resilient to noise, reverberation, and other acoustic artifacts. We have done rigorous performance benchmarking on Rhino and published the results publicly. The audio data and the code used for benchmarking have also been made publicly available under the Apache 2.0 license so that the results can be reproduced.
Yes, Rhino is a powerful tool for building IVR applications. However, please note that Picovoice software only works well on 16 kHz audio and does not perform optimally in telephony applications that use 8 kHz audio.
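One practical precaution is to check the format of recorded audio before passing it to the engine. The Python sketch below uses the standard-library `wave` module to flag files that are not sampled at 16 kHz. The additional checks for mono, 16-bit PCM are assumptions of this example, and `check_wav` and `make_silent_wav` are hypothetical helper names.

```python
import io
import wave

REQUIRED_SAMPLE_RATE = 16000  # Hz, per the note above


def check_wav(file_obj):
    """Return a list of format problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(file_obj, "rb") as wav:
        rate = wav.getframerate()
        if rate != REQUIRED_SAMPLE_RATE:
            problems.append(
                f"sample rate is {rate} Hz, expected {REQUIRED_SAMPLE_RATE} Hz"
            )
        # Assumption for this sketch: single-channel, 16-bit PCM input.
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems


def make_silent_wav(sample_rate, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    buffer.seek(0)
    return buffer


print(check_wav(make_silent_wav(16000)))  # wideband audio
print(check_wav(make_silent_wav(8000)))   # telephony-style audio
```

Note that upsampling 8 kHz telephony audio to 16 kHz would satisfy such a check without restoring the frequency content lost above 4 kHz, so it should not be expected to recover full accuracy.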
Yes, Rhino performs endpointing automatically. You can also set the endpoint duration manually. Check out the API of your choice to learn how to do it.
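To make the idea of an endpoint duration concrete, the Python sketch below treats an utterance as finished once a given amount of continuous silence follows speech. It is a generic illustration under stated assumptions, not Rhino's internal algorithm: the per-frame voiced/silent flags are taken as given, and the function name, parameter names, and 32 ms frame duration are hypothetical.

```python
import math


def find_endpoint(frame_is_voiced, frame_duration_sec, endpoint_duration_sec):
    """Return the index of the frame at which the utterance is considered
    finished, or None if the required trailing silence never occurs.
    """
    # Number of consecutive silent frames needed to cover the endpoint duration.
    required_silent_frames = math.ceil(endpoint_duration_sec / frame_duration_sec)
    heard_speech = False
    silent_frames = 0
    for index, voiced in enumerate(frame_is_voiced):
        if voiced:
            heard_speech = True
            silent_frames = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_frames += 1
            if silent_frames >= required_silent_frames:
                return index
    return None


# 5 silent frames, 20 frames of speech, then 40 silent frames (32 ms each).
frames = [False] * 5 + [True] * 20 + [False] * 40
for duration in (0.5, 1.0, 2.0):
    print(duration, find_endpoint(frames, 0.032, duration))
```

A shorter endpoint duration makes the system respond sooner but risks cutting off speakers who pause mid-command, while a longer one tolerates pauses at the cost of added latency.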
Sensitivity measures how well a detector identifies true positives. A higher sensitivity value gives a lower miss rate at the expense of a higher false alarm rate. You should pick a sensitivity value that suits your application's requirements.
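The trade-off can be seen with a toy score-threshold detector, sketched below in Python. The scores are made up, and mapping sensitivity to a threshold of `1 - sensitivity` is an assumption chosen for illustration. It is not a description of how Rhino uses the parameter internally.

```python
def rates(scores_positive, scores_negative, sensitivity):
    """Miss rate and false alarm rate of a toy score-threshold detector."""
    # Assumed mapping: higher sensitivity means a lower acceptance threshold.
    threshold = 1.0 - sensitivity
    misses = sum(1 for score in scores_positive if score < threshold)
    false_alarms = sum(1 for score in scores_negative if score >= threshold)
    return misses / len(scores_positive), false_alarms / len(scores_negative)


# Made-up detector scores for utterances that should be accepted (positives)
# and for audio that should be rejected (negatives).
positives = [0.35, 0.55, 0.65, 0.8, 0.9]
negatives = [0.05, 0.1, 0.2, 0.45, 0.6]

for sensitivity in (0.25, 0.5, 0.75):
    miss_rate, false_alarm_rate = rates(positives, negatives, sensitivity)
    print(f"sensitivity={sensitivity}: miss={miss_rate}, false alarm={false_alarm_rate}")
```

On this data, raising the sensitivity from 0.25 to 0.75 drops the miss rate from 0.6 to 0.0 while the false alarm rate rises from 0.0 to 0.4, which is the pattern to weigh against your application's tolerance for each kind of error.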
Besides using Porcupine Wake Word, you can implement a physical or digital button, e.g., touch-to-talk or a push-to-talk switch, and use Cobra Voice Activity Detection, depending on your requirements.
Without tuning, using a generic speech-to-text engine with NLU usually results in suboptimal accuracy. Introduction to Spoken Language Understanding discusses these two approaches in detail. We have also benchmarked natural language understanding engines to compare the performance of Picovoice Rhino with Google Dialogflow, Amazon Lex, IBM Watson, and Microsoft LUIS.