Rhino Speech-to-Intent Engine FAQ
How many commands (expressions) can the Rhino Speech-to-Intent engine understand?
There is no technical limit on the number of commands (expressions) or slot values Rhino can understand. However, on platforms with limited memory (MCUs or DSPs), the total number of commands and the size of the vocabulary are dictated by the amount of available memory (FLASH). As a rough guide, allocate around 50 KB of additional memory for every 100 commands and unique words.
What is the accuracy of Rhino?
Picovoice has done rigorous performance benchmarking on its Rhino Speech-to-Intent engine and published the results here. The audio data, code, and models used for benchmarking are also publicly available under the Apache 2.0 license to facilitate reproducibility.
The Rhino Speech-to-Intent engine can extract intents from spoken utterances with higher than 97% accuracy in clean (no noise) environments, and 95% accuracy in noisy environments with a signal-to-noise ratio of 9 dB at microphone level.
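For reference, signal-to-noise ratio in decibels is 10 * log10(P_speech / P_noise), so 9 dB means the speech power is roughly 8 times the noise power. The following is a minimal sketch, not the code used in the published benchmark, of how you might scale a noise recording to reach a target SNR when preparing your own test audio. The function names are hypothetical, and the sketch assumes both signals are plain lists of samples at the same sample rate.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def noise_gain_for_snr(speech, noise, target_snr_db):
    """Amplitude gain to apply to `noise` so that the speech-to-noise
    ratio equals `target_snr_db`.

    Scaling amplitude by g scales power by g**2, which lowers the SNR
    by 20 * log10(g) dB.
    """
    current = snr_db(speech, noise)
    return 10.0 ** ((current - target_snr_db) / 20.0)
```

Multiplying every noise sample by the returned gain and adding the result to the speech gives a mixture at the requested SNR, under the assumptions above.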
Can Rhino understand phone numbers, time of day, dates, alphanumerics, etc.?
Yes, Rhino can accurately understand numbers, alphanumerics, and similar challenging parameters. Here is a demo of a phone dialing interaction running on an ARM Cortex-M4 microcontroller, simulating a hearable application.
Which platforms does Rhino support?
- ARM Cortex-M
- ARM Cortex-A
- Raspberry Pi (all variants)
- Linux (x86_64)
- macOS (x86_64)
- Windows (x86_64)
- Modern Web Browsers
Does Picovoice Speech-to-Intent software work in my target environment and noise conditions?
The overall performance depends on various factors such as speaker distance, level/type of noise, room acoustics, quality of the microphone, and audio frontend algorithms used (if any). It is usually best to try out our technology in your target environment using freely available sample models. Additionally, we have published an open-source benchmark of our Speech-to-Intent software in a noisy environment here, which can be used as a reference.
Does Picovoice Speech-to-Intent software work in the presence of noise and reverberation?
Yes, Rhino is resilient to noise, reverberation, and other acoustic artifacts. We have done rigorous performance benchmarking on Rhino and published the results here. The audio data and the code used for benchmarking are also publicly available under the Apache 2.0 license so that the results can be reproduced.
Is there a limit on the number of slot values?
There is no technical limit on the number of slot values. However, on platforms with limited memory (particularly MCUs), the total number is dictated by the amount of available memory. As a rough guide, allocate around 50 KB of additional memory for every 100 unique words/phrases.
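The rule of thumb above can be turned into a quick budget check. This is only a sketch: it assumes the 50 KB per 100 unique words/phrases figure extrapolates linearly, which the FAQ gives only as a rough estimate, and the function name is hypothetical.

```python
KB_PER_100_ENTRIES = 50  # rough figure from the FAQ, not a guarantee


def estimate_extra_memory_kb(unique_entries):
    """Rough additional memory (in KB) for a number of unique
    words/phrases, assuming linear scaling of the FAQ's estimate."""
    if unique_entries < 0:
        raise ValueError("unique_entries must be non-negative")
    return unique_entries * KB_PER_100_ENTRIES / 100
```

For example, a context with 250 unique words/phrases would come out at about 125 KB under this assumption. Treat the result as a starting point and measure the real model size on your target.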
I need to use Speech-to-Intent software in an Interactive Voice Response (IVR) application. Is that possible?
Yes, Rhino is a powerful tool for building IVR applications. However, please note that Picovoice software only works well on 16 kHz audio and does not perform optimally in telephony applications that use 8 kHz audio.
Does the Picovoice Speech-to-Intent engine perform endpointing?
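Before wiring recorded audio into an IVR pipeline, it can help to confirm its sample rate. The sketch below uses Python's standard `wave` module to read the rate from a WAV file; the helper names are hypothetical, and the only requirement encoded here is the 16 kHz figure stated above.

```python
import wave

REQUIRED_SAMPLE_RATE_HZ = 16000  # 16 kHz, as stated in the FAQ


def wav_sample_rate(source):
    """Return the sample rate (Hz) of a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def is_supported_rate(sample_rate_hz):
    """True if the audio matches the 16 kHz rate the FAQ calls for."""
    return sample_rate_hz == REQUIRED_SAMPLE_RATE_HZ
```

A file recorded from an 8 kHz telephony line would fail this check. Upsampling such audio to 16 kHz changes the container rate but does not restore the frequency content above 4 kHz that was never captured.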
Yes, it performs endpointing automatically.
Does my application need to listen to a wake word before processing the audio with Rhino?
Speech-to-Intent software requires a method of initiation so that it starts listening when the user is about to speak. This can be implemented either with a push-to-talk switch or with the Picovoice Porcupine wake word detection engine, depending on the customer's requirements.
What’s the advantage of using Picovoice Speech-to-Intent software instead of using speech-to-text and feeding the transcribed text into a natural language understanding (NLU) engine to extract intents?
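The two initiation methods can be pictured as a small state machine that forwards audio to the intent engine only after a trigger. This is an illustrative sketch, not the Picovoice SDK: `wake_detector` and `intent_engine` are hypothetical callables standing in for a wake word engine and a Speech-to-Intent engine, and the sketch assumes the intent engine returns `None` until it has a final result.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"      # idle until a trigger fires
    LISTENING = "listening"  # frames are sent to the intent engine


class VoiceSession:
    """Routes audio frames to an intent engine only after a trigger.

    wake_detector(frame) -> bool          (hypothetical stand-in)
    intent_engine(frame) -> result/None   (hypothetical stand-in)
    """

    def __init__(self, wake_detector, intent_engine):
        self.wake_detector = wake_detector
        self.intent_engine = intent_engine
        self.state = State.WAITING

    def trigger(self):
        """Push-to-talk path: start listening immediately."""
        self.state = State.LISTENING

    def process(self, frame):
        """Feed one audio frame; return a result once one is final."""
        if self.state is State.WAITING:
            # Wake word path: only the detector sees the audio.
            if self.wake_detector(frame):
                self.state = State.LISTENING
            return None
        result = self.intent_engine(frame)
        if result is not None:
            # Inference is complete, so go back to waiting.
            self.state = State.WAITING
        return result
```

With a push-to-talk switch, the application would call `trigger()` from the button handler; with a wake word, it would simply keep calling `process()` on every frame.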
Using a generic speech-to-text engine with NLU usually results in suboptimal accuracy without any tuning. We have benchmarked the performance of Picovoice Rhino against several alternatives including Google Dialogflow, Amazon Lex, IBM Watson, and Microsoft LUIS here.