Rhino Speech-to-Intent Engine FAQ

See also the General FAQ

How many commands (expressions) can Picovoice speech-to-intent software understand?

There is no technical limit on the number of commands (expressions) or slot values Picovoice speech-to-intent software can understand. However, on platforms with limited memory (MCUs or DSPs), the total number of commands and vocabulary is dictated by the amount of available memory. Roughly speaking, for each 100 commands and unique words, you should allocate around 50 KB of additional memory.
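As a quick sanity check, the rule of thumb above can be turned into a back-of-the-envelope budget. The sketch below is illustrative only; the 50 KB-per-100-words figure is the approximation stated in this FAQ, not an exact measurement, and actual model size varies by context:

```python
def estimate_context_memory_kb(unique_words: int, kb_per_100_words: float = 50.0) -> float:
    """Approximate additional memory (in KB) for a context, per the rule of thumb above."""
    return unique_words / 100.0 * kb_per_100_words

# e.g. a context with 250 unique words/commands:
budget = estimate_context_memory_kb(250)
print(f"Allocate roughly {budget:.0f} KB")  # → Allocate roughly 125 KB
```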

Which natural languages does Rhino speech-to-intent support?

At the moment, we only support the English language. For significant commercial opportunities, we may be able to prioritize support for new natural languages, partially reinvesting commercial license fees to do so.

What is Rhino speech-to-intent detection accuracy?

Picovoice has done rigorous performance benchmarking on its Rhino speech-to-intent engine and published the results publicly here. In addition, the audio data and the code used for benchmarking have been made publicly available under the Apache 2.0 license to allow the results to be reproduced.

The Rhino speech-to-intent engine can extract intents from spoken utterances with higher than 97% accuracy in clean (noise-free) environments, and 95% accuracy in noisy environments with a signal-to-noise ratio of 9 dB at the microphone level.

Can Rhino understand phone numbers, time of day, dates, alphanumerics, etc?

Yes, Rhino can accurately understand numbers, alphanumerics, and similar challenging parameters. Here is a demo of a phone-dialing interaction running on an ARM Cortex-M4 processor, simulating a hearable application.

Which platforms does Rhino speech-to-intent engine support?

Rhino speech-to-intent is supported on Raspberry Pi (all models), BeagleBone, Android, iOS, Linux, macOS, Windows, and modern web browsers (WebAssembly). Additionally, we support various ARM Cortex-A and ARM Cortex-M (M4/M7) MCUs from NXP and STMicroelectronics.

As part of our professional services, we can port our software to other proprietary platforms such as DSP cores or neural network accelerators, depending on the size of the commercial opportunity. Such engagements typically warrant non-recurring engineering fees in addition to prepaid commercial royalties.

Does Picovoice speech-to-intent software work in my target environment and noise conditions?

The overall performance depends on various factors such as speaker distance, level/type of noise, room acoustics, quality of microphone, and audio frontend algorithms used (if any). It is usually best to try out our technology in your target environment using freely available sample models. Additionally, we have published an open-source benchmark of our speech-to-intent software in a noisy environment here, which can be used as a reference.

Does Picovoice speech-to-intent software work in the presence of noise and reverberation?

Yes, the Picovoice speech-to-intent engine is resilient to noise, reverberation, and other acoustic artifacts. We have done rigorous performance benchmarking on the Rhino speech-to-intent engine and published the results publicly here. In addition, the audio data and the code used for benchmarking have been made publicly available under the Apache 2.0 license so the results can be reproduced. The results show 92% accuracy in a noisy environment with a signal-to-noise ratio of 9 dB at the microphone level.

Is there a limit on the number of slot values?

There is no technical limit on the number of slot values Picovoice speech-to-intent software can understand. However, on platforms with limited memory (particularly MCUs), the total number is dictated by the amount of available memory. Roughly speaking, for each 100 unique words/phrases, you should allocate around 50 KB of additional memory.

Are there any best practices for designing speech-to-intent context (Interaction model)?

The design process for the Picovoice speech-to-intent interaction model (or context) is similar to designing Alexa skills. In general, you have to make sure your context follows the common patterns for situational design:

  • Adaptability: Let users speak in their own words.
  • Personalization: Individualize your entire interaction.
  • Availability: Collapse your menus; make all options top-level.
  • Relatability: Talk with them, not at them.

I need to use speech-to-intent software in an Interactive Voice Response (IVR) application. Is that possible?

Yes. Picovoice speech-to-intent software is a powerful tool for building IVR applications. However, note that Picovoice software operates on 16 kHz audio and does not perform optimally in telephony applications that use 8 kHz audio.

Does Picovoice speech-to-intent software perform endpointing?

Yes, Picovoice speech-to-intent software performs endpointing automatically.

Does my application need to listen to a wake word before processing the audio with speech-to-intent software?

Speech-to-intent software requires a method of initiation to start listening when the user is about to speak. This can be implemented either with a push-to-talk switch or with the Picovoice wake word detection engine, depending on customer requirements.
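The initiation pattern above can be sketched as a small gatekeeper that routes audio frames to the intent engine only after a wake word (or push-to-talk press) arrives. The `wake_word_engine` and `intent_engine` objects here are hypothetical stand-ins, not the actual Picovoice APIs; in a real application they would wrap the Porcupine and Rhino engines respectively:

```python
class Gatekeeper:
    """Routes audio frames to the intent engine only after initiation."""

    def __init__(self, wake_word_engine, intent_engine):
        self._wake = wake_word_engine
        self._intent = intent_engine
        self._listening = False  # becomes True on wake word or push-to-talk

    def push_to_talk(self):
        """Manual initiation, e.g. from a hardware button."""
        self._listening = True

    def process(self, frame):
        """Feed one audio frame; returns an inference once one is finalized."""
        if not self._listening:
            if self._wake.process(frame):  # wake word detected
                self._listening = True
            return None
        if self._intent.process(frame):    # inference finalized
            self._listening = False        # return to idle until re-initiated
            return self._intent.get_inference()
        return None
```

The key design point is that the gatekeeper returns to the idle state after each finalized inference, so the device never streams audio to the intent engine unattended.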

How do I develop a speech-to-intent context model (.rhn) file?

Use the Picovoice Console to build a context and then train a model file. Capture all anticipated expressions users might say to convey the intents your context handles. Expressions are written using the Rhino syntax. The Rhino grammar provides a set of features that allow you to express many combinations of phrases, such as (optional phrases) and [choices, of, phrases].

For example, in a smart lighting application, the user might say:

  • "[set, change, switch, make, turn] (the) $room:room1 (light) (to) $color:color1"
  • "[set, change, switch, make, turn] (the) color in $room:room1 (to) $color:color1"

See this Picovoice Console tutorial for how to build a context.
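To build intuition for how the bracket/parenthesis syntax multiplies out into concrete phrases, here is a simplified, hypothetical expansion in Python. It handles only flat [choices] and (optionals), not slots like $color:color1 or the rest of the real Rhino grammar:

```python
import itertools
import re

def expand(expression: str) -> list:
    """Enumerate the concrete phrases a simplified Rhino-style expression matches."""
    # Tokenize into [choices], (optionals), and literal words.
    tokens = re.findall(r"\[[^\]]*\]|\([^)]*\)|\S+", expression)
    alternatives = []
    for tok in tokens:
        if tok.startswith("["):   # choice: exactly one of the alternatives
            alternatives.append([t.strip() for t in tok[1:-1].split(",")])
        elif tok.startswith("("): # optional: present or absent
            alternatives.append([tok[1:-1], ""])
        else:                     # literal word
            alternatives.append([tok])
    # Cartesian product of all alternatives, dropping empty optionals.
    return [" ".join(w for w in combo if w)
            for combo in itertools.product(*alternatives)]

print(expand("[turn, switch] (the) light off"))
# → ['turn the light off', 'turn light off', 'switch the light off', 'switch light off']
```

This also illustrates why vocabulary grows quickly: each choice multiplies and each optional doubles the number of phrases the context must cover.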

What’s the advantage of using Picovoice speech-to-intent software over using Speech-to-Text and feeding the transcribed text into an NLU engine to extract intents?

Using a generic speech-to-text engine with NLU usually results in suboptimal accuracy without tuning. We have benchmarked the performance of Picovoice Rhino against several alternatives, including Google Dialogflow, Amazon Lex, IBM Watson, and Microsoft LUIS here.