Rhino Speech-to-Intent

Add custom voice commands to any software with zero latency.

Natural Language Understanding engine fused with speech-to-text, beating cloud API accuracy

What is Rhino Speech-to-Intent?

Rhino Speech-to-Intent infers user intents from utterances, allowing users to interact with applications via voice.

Rhino Speech-to-Intent understands complex voice commands, such as “find the maintenance checklist for Boeing 707” or “call 987 655 4433”.
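
As a rough illustration of what "inferring an intent" yields, the sketch below models a structured result for the second example command. The `Inference` class, the `callContact` intent name, and the `phoneNumber` slot are hypothetical names chosen for this example; they are not output copied from the SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Inference:
    """Hypothetical container for a speech-to-intent result (illustrative only)."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def describe(inference: Inference) -> str:
    """Render an inference as a one-line, human-readable summary."""
    if not inference.is_understood:
        return "command not understood"
    details = ", ".join(f"{name}={value}" for name, value in sorted(inference.slots.items()))
    return f"{inference.intent}({details})"


# Assumed result for the utterance "call 987 655 4433".
call = Inference(True, "callContact", {"phoneNumber": "987 655 4433"})
print(describe(call))
```

The application acts on the intent name and slot values rather than on a raw transcript, so no text parsing is needed downstream.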

Build useful voice assistants that run anywhere

o = pvrhino.create(
    access_key,
    context_path)
while not o.process(audio()):
    pass
inference = o.get_inference()
Build with Python
let o = new Rhino(
  accessKey,
  contextPath);
while (!o.process(audio())) { }
let inference = o.getInference();
Build with NodeJS
RhinoManagerCallback callback =
    new RhinoManagerCallback() {
      @Override
      public void invoke(RhinoInference inference) {
        // Inference callback
      }
    };
RhinoManager o =
    new RhinoManager.Builder()
        .setAccessKey(accessKey)
        .setContextPath(contextPath)
        .build(appContext, callback);
o.start();
Build with Android
let o = RhinoManager(
    accessKey: accessKey,
    contextPath: contextPath,
    onInferenceCallback: { inference in
        // Inference callback
    }
)
try o.start()
Build with iOS
const {
  inference,
  contextInfo,
  isLoaded,
  isListening,
  error,
  init,
  process,
  release,
} = useRhino();
useEffect(() => {
  // Inference callback
}, [inference]);
await init(
  accessKey,
  context,
  model
);
await process();
Build with React
RhinoManager o =
    await RhinoManager.create(
        accessKey,
        contextPath,
        (inference) {
          // Inference callback
        });
await o.process();
Build with Flutter
let o =
  await RhinoManager.create(
    accessKey,
    contextPath,
    (inference) => {
      // Inference callback
    });
await o.process();
Build with React Native
RhinoManager o =
    RhinoManager.Create(
        accessKey,
        contextPath,
        (inference) => {
            // Inference callback
        });
o.Start();
Build with Unity
constructor(
  private o: RhinoService) {
  this.inferenceDetection =
    o.inference$.subscribe(
      inference => {
        // Inference callback
      }
    )
}
async ngOnInit() {
  await this.o.init(
    accessKey,
    context,
    model
  )
}
Build with Angular
{
  data() {
    const {
      state,
      init,
      process,
      release
    } = useRhino();
    init(
      accessKey,
      context,
      model,
    );
    return {
      s: state,
      process,
      release
    }
  },
  watch: {
    "s.inference":
      function(o) {
        if (o !== null) {
          // Inference callback
        }
      }
  }
}
Build with Vue
Rhino o = Rhino.Create(
    accessKey,
    contextPath);
// Keep feeding audio frames until Process reports that inference is finalized
while (!o.Process(AudioFrame())) {
}
Inference inference =
    o.GetInference();
Build with .NET
Rhino o = new Rhino.Builder()
    .setAccessKey(accessKey)
    .setContextPath(contextPath)
    .build();
while (!o.process(audioFrame())) {
}
RhinoInference inference =
    o.getInference();
Build with Java
o := NewRhino(
    accessKey,
    contextPath)
err := o.Init()
for {
    isFinalized, err :=
        o.Process(AudioFrame())
    if isFinalized {
        break
    }
}
inference, err :=
    o.GetInference()
Build with Go
let o: Rhino =
    RhinoBuilder::new(
        access_key,
        context_path
    )
    .init()
    .expect("");
loop {
    if let Ok(is_finalized) =
        o.process(&audio_frame()) {
        if is_finalized {
            if let Ok(inference) =
                o.get_inference() {
                // Inference callback
            }
        }
    }
}
Build with Rust
pv_rhino_init(
    access_key,
    model_path,
    context_path,
    sensitivity,
    require_endpoint,
    &rhino);
while (true) {
    pv_rhino_process(
        rhino,
        audio_frame(),
        &is_finalized);
    if (is_finalized) {
        pv_rhino_get_intent(
            rhino,
            &intent,
            &num_slots,
            &slots,
            &values);
    }
}
Build with C

Why Rhino Speech-to-Intent?

Cloud-dependent conventional methods use generic automatic speech recognition (ASR) and natural language understanding (NLU) engines, resulting in subpar accuracy and unreliable response time.

Rhino Speech-to-Intent fuses the ASR and NLU engines and is six times more accurate than Big Tech NLU APIs, improving both user experience and productivity.

Use-case-specific voice commands in real-time with high accuracy

Improve productivity with custom voice commands that actually work

Rhino Speech-to-Intent

  • 🚀 97%+ accuracy
  • Guaranteed response time
  • 🔒 Private by design
  • 🤸 Platform-agnostic

Cloud ASR & NLU APIs

  • 👍 84% accuracy on average
  • 🐢 Unpredictable response time
  • 👂 3rd party data sharing
  • ☁️ Cloud-dependent

97%+ accuracy

Six times more accurate than cloud providers

Choose the best solution based on data. The open-source natural language understanding benchmark shows that Rhino Speech-to-Intent outperforms cloud conversational AI engines across various accents and in the presence of noise and reverberation.

Guaranteed response time

Real-time - no network delay, no downtime

Build “real” real-time experiences with Rhino Speech-to-Intent. Processing voice commands in the cloud degrades the user experience because latency fluctuates with network conditions. Rhino Speech-to-Intent processes voice commands directly on-device and never sends them to a third-party cloud.

Privacy by design

Private — CCPA, GDPR, and HIPAA-compliant voice commands

Ensure user privacy and stay compliant! Rhino Speech-to-Intent processes voice commands locally on the device, without recording data or sending it to the cloud. Enterprises can confidently put Rhino Speech-to-Intent in meeting rooms, warehouses, examination rooms, or call centers.

Platform-agnostic

Cross-platform - unified experiences anywhere!

Process voice data on all platforms and offer seamless user experiences. Rhino Speech-to-Intent runs across platforms, including microcontrollers, embedded, mobile, web, on-premise, and cloud.

Get started with Rhino Speech-to-Intent

The best way to see how Rhino Speech-to-Intent differs from other natural language understanding solutions is to try it!

Forever Free Plan
  • Custom Voice Commands
  • Platform-optimized model training
  • Intuitive SDKs
  • Unlimited interactions per user
  • Arabic, Dutch, English, Farsi, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, Swedish, and Vietnamese
Learn more about Rhino Speech-to-Intent

What’s a Natural Language Understanding (NLU) Engine?

Natural language understanding engines determine users’ intent. Research began in the 1960s and focused on understanding text; understanding speech is a relatively new field. Although spoken language understanding is the more specific term, many people in industry and research still use natural language understanding to describe capturing intents from utterances, mainly because the conventional approach runs speech-to-text and natural language understanding engines one after the other.

How does cloud-based natural language understanding process voice commands?

The conventional cloud-based approach breaks spoken command understanding into two tasks. First, Automatic Speech Recognition (ASR) converts the voice command into text. Then, Natural Language Understanding (NLU) processes the transcribed text to detect the intent and its details. Accuracy therefore depends on two independently trained modules, and NLU cannot produce correct output from an erroneous transcript. For this reason, some voice AI vendors fine-tune ASR for the domain of interest to improve accuracy. This approach requires significant computing, memory, and storage resources. That is not a show-stopper for a cloud solution. However, the cloud is not always the best option.
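
The way a transcription error propagates through such a two-stage cascade can be sketched with a toy example. Everything here is a stand-in: `SIMULATED_ASR` replaces a real speech recognizer with a lookup table, the regular expression replaces a real NLU model, and the `changeLightState` intent is a hypothetical name. The point is only that the second stage never sees the audio, so it cannot recover from a single misheard word.

```python
import re
from typing import Dict, Optional, Tuple

# Stand-in for a generic ASR engine: maps a clip id to the transcript it would emit.
# The second entry simulates a misrecognition ("lights" heard as "likes").
SIMULATED_ASR: Dict[str, str] = {
    "clip_clean": "turn on the lights in the kitchen",
    "clip_noisy": "turn on the likes in the kitchen",
}

# Stand-in for an NLU engine: a single rule that only sees text.
NLU_PATTERN = re.compile(
    r"^turn (?P<state>on|off) the lights in the (?P<location>\w+)$"
)


def transcribe(clip_id: str) -> str:
    """Stage 1: speech to text (simulated)."""
    return SIMULATED_ASR[clip_id]


def understand(transcript: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Stage 2: text to intent; returns None when the text does not match."""
    match = NLU_PATTERN.match(transcript)
    if match is None:
        return None
    return "changeLightState", match.groupdict()


def cascaded_pipeline(clip_id: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Run ASR, then NLU on whatever text ASR produced."""
    return understand(transcribe(clip_id))


print(cascaded_pipeline("clip_clean"))
print(cascaded_pipeline("clip_noisy"))
```

In this sketch the clean clip yields an intent with its `state` and `location` details, while the noisy clip yields nothing, even though only one word was misrecognized.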

How does Rhino Speech-to-Intent differ from Natural Language Understanding (NLU) solutions?

Rhino Speech-to-Intent, as the name suggests, converts speech into intent directly without relying on text, obviating the need for a separate automatic speech recognition step. It uses the modern end-to-end approach to infer intents and intent details directly from spoken commands, enabling developers to train jointly optimized automatic speech recognition (ASR) and natural language understanding (NLU) engines for their domain of interest. Rhino Speech-to-Intent specializes in use-case-specific applications, not open-domain applications with billions of spoken command variations. For example, one does not need to discuss the meaning of life with a coffee machine or a surgical robot. Most use cases have a confined domain (context) that covers hundreds or thousands of spoken commands. With use-case-specific and platform-optimized voice AI models, Rhino Speech-to-Intent offers high accuracy with minimal resources.
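
To give a feel for how small a confined domain is, the sketch below enumerates every command a hypothetical coffee-machine context could accept. The carrier phrases and slot values are invented for illustration, and the code only counts combinations; it is not how Rhino contexts are defined or trained.

```python
from itertools import product
from typing import Dict, List

# Hypothetical coffee-machine domain: a few carrier phrases and three slots.
CARRIER_PHRASES: List[str] = ["make me a", "I would like a", "can I get a", "brew a"]
SLOTS: Dict[str, List[str]] = {
    "size": ["small", "medium", "large"],
    "shots": ["single shot", "double shot", "triple shot"],
    "beverage": ["espresso", "americano", "latte", "cappuccino", "mocha"],
}


def enumerate_commands() -> List[str]:
    """List every phrase/slot combination the toy domain covers."""
    combinations = product(
        CARRIER_PHRASES, SLOTS["size"], SLOTS["shots"], SLOTS["beverage"]
    )
    return [" ".join(parts) for parts in combinations]


commands = enumerate_commands()
print(len(commands))  # 4 * 3 * 3 * 5 = 180
print(commands[0])
```

Even this small domain spans 180 distinct commands, yet the whole set is finite and known in advance, which is what makes a compact, domain-specific model feasible.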