Many developers reach for automatic speech recognition (ASR), also known as speech-to-text (STT), when what they actually need is Speech-to-Intent, a better fit for custom voice control in domain-specific applications. Speech-to-Intent (also called intent recognition or command recognition) extracts structured meaning from voice commands. Instead of transcribing 'Turn on the bedroom lights' to text, it returns the intent behind the command together with its parameters, such as which lights to change and to which state.
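
As an illustration, the structured result for that command might look like the object below. The fields isUnderstood, intent, and slots follow the inference object of the Rhino Node.js SDK; the intent name changeLightState and the slot names are hypothetical, since they depend on the context you design.

```javascript
// Hypothetical inference for "Turn on the bedroom lights".
// The intent and slot names are assumptions; they come from your own context model.
const inference = {
  isUnderstood: true,
  intent: "changeLightState",
  slots: {
    state: "on",
    location: "bedroom",
  },
};

// Application code can branch on the intent and read slot values directly,
// with no text parsing step in between.
if (inference.isUnderstood && inference.intent === "changeLightState") {
  console.log(`Turning ${inference.slots.state} the ${inference.slots.location} lights`);
}
```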

Unlike STT engines (like Google Speech API or Whisper) that transcribe everything a user says, or natural language understanding engines (like Google Dialogflow and Amazon Lex) initially built for text-based chatbots, Rhino Speech-to-Intent directly maps spoken voice commands to actions without the overhead of full transcription.

Rhino Speech-to-Intent also processes voice commands locally, without sending audio off the device. According to Picovoice, this approach delivers better privacy and reliability, and 6x higher accuracy than cloud alternatives.

Problem: Traditional STT sends audio to the cloud, transcribes full sentences, then parses with NLP—adding latency, privacy concerns, and complexity.

Solution: Rhino Speech-to-Intent processes commands directly on-device, extracting structured intents without cloud dependencies.

When to Use Speech-to-Intent vs. Full Transcription:

  • Use Rhino when: You have predefined commands, need offline operation, require low latency, or prioritize privacy
  • Use STT when: You need open-ended transcription, chatbot integration, or dictation features

If you need speech-to-text in your Node.js application instead of Speech-to-Intent, refer to our guide Real-time Transcription in Node.js.

This tutorial shows you how to build a voice-activated trigger system in Node.js that runs on Windows, macOS, Linux, and Raspberry Pi. It suits smart home automation, IoT device control, industrial equipment, accessibility tools, automotive interfaces, and voice-controlled enterprise applications.

Step-by-Step: Voice Control in Node.js

Prerequisites

  1. Download Node.js (v18 or newer)
  2. Sign up for a Picovoice Console account and copy your AccessKey
  3. Train a custom context model on the Picovoice Console and download the model file (.rhn)
  4. Check that you have a working microphone or audio input

For additional guidance on how to train a custom model, check out Creating a Custom Context with Rhino or watch Picovoice Console Tutorial: Rhino Speech-to-Intent on YouTube.

1. Install Packages

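Install the Rhino Speech-to-Intent Node.js SDK (@picovoice/rhino-node) and the PvRecorder Node.js SDK (@picovoice/pvrecorder-node) with npm, for example by running npm install @picovoice/rhino-node @picovoice/pvrecorder-node in your project directory.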

2. Initialize the Voice Command Engine

Create an instance of Rhino, passing in your AccessKey and custom context model.
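
A minimal sketch of this step might look like the following. The environment variable names, the default file name smart_lighting.rhn, and the createRhino helper are assumptions made for illustration; only the Rhino constructor call itself comes from the SDK.

```javascript
const fs = require("node:fs");

// Placeholders (assumed names): supply the AccessKey from Picovoice Console
// and the path to the .rhn context file you downloaded.
const ACCESS_KEY = process.env.PICOVOICE_ACCESS_KEY ?? "";
const CONTEXT_PATH = process.env.RHINO_CONTEXT_PATH ?? "./smart_lighting.rhn";

// Hypothetical helper: checks its inputs, then creates the Rhino engine.
function createRhino(accessKey, contextPath) {
  if (!accessKey) {
    throw new Error("Missing Picovoice AccessKey");
  }
  if (!fs.existsSync(contextPath)) {
    throw new Error(`Context file not found: ${contextPath}`);
  }
  // Required here so the checks above run before the native SDK is loaded.
  const { Rhino } = require("@picovoice/rhino-node");
  return new Rhino(accessKey, contextPath);
}
```

Calling createRhino(ACCESS_KEY, CONTEXT_PATH) then returns an engine whose frameLength and sampleRate properties describe the audio it expects.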

3. Set Up Audio Capture for Intent Detection

Begin capturing audio with PvRecorder to prepare for intent detection:
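
One way to sketch this step is shown below. It assumes PvRecorder 1.2 or newer, where the frame length is the first constructor argument and a device index of -1 selects the default input device; startRecorder is a hypothetical helper name.

```javascript
// Hypothetical helper: opens the microphone and starts recording frames of
// `frameLength` samples. deviceIndex -1 selects the default input device.
function startRecorder(frameLength, deviceIndex = -1) {
  if (!Number.isInteger(frameLength) || frameLength <= 0) {
    throw new RangeError("frameLength must be a positive integer");
  }
  // Assumes the PvRecorder 1.2+ argument order (frameLength, deviceIndex).
  const { PvRecorder } = require("@picovoice/pvrecorder-node");
  const recorder = new PvRecorder(frameLength, deviceIndex);
  recorder.start();
  return recorder;
}
```

In practice the frame length should come from the engine, as in startRecorder(rhino.frameLength), so that every frame read from the recorder has the size Rhino expects.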

4. Detect & Map Intents to Actions

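Stream audio frames to Rhino and handle the inferences it recognizes. Rhino's process method returns true once it has finalized an inference for the current command; getInference then returns a RhinoInference object containing isUnderstood, intent, and slots.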
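
The loop and the intent-to-action mapping could be sketched as follows. The actions table, its changeLightState entry, and the helper names are hypothetical; rhino and recorder are assumed to be a Rhino instance and a started PvRecorder instance.

```javascript
// Hypothetical mapping from intent names to handlers. Replace the entries
// with the intents defined in your own context.
const actions = {
  changeLightState: (slots) =>
    `lights ${slots.state} in ${slots.location ?? "all rooms"}`,
};

// Runs the handler for an understood inference; returns null otherwise.
function handleInference(inference, handlers = actions) {
  if (!inference.isUnderstood) return null;
  if (!Object.hasOwn(handlers, inference.intent)) return null;
  return handlers[inference.intent](inference.slots ?? {});
}

// Feeds frames to Rhino until it finalizes one inference, then returns it.
async function waitForInference(rhino, recorder) {
  while (true) {
    const frame = await recorder.read();
    if (rhino.process(frame)) {
      return rhino.getInference();
    }
  }
}
```

Keeping the mapping in a plain object means adding a command only requires a new entry, and handleInference can be unit-tested without a microphone.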

5. Clean Up Resources

When done, stop the recorder and release resources to free memory:
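
A small sketch of the cleanup, assuming recorder is a PvRecorder instance and rhino is a Rhino instance (shutdown is a hypothetical helper name):

```javascript
// Stops audio capture first, then frees the native resources of both objects.
function shutdown(recorder, rhino) {
  try {
    recorder.stop();
    recorder.release();
  } finally {
    // Release the engine even if stopping the recorder throws.
    rhino.release();
  }
}
```

Calling this from a finally block or a SIGINT handler helps ensure the resources are released when the script exits early.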

Complete Demo: Voice Commands in Node.js

The following complete example combines all previous steps into a functional Node.js script that continuously listens for commands and logs detections to the console.
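
The script below is one possible version of such a demo, not necessarily the original listing. It assumes the AccessKey and context path are supplied through the environment variables PICOVOICE_ACCESS_KEY and RHINO_CONTEXT_PATH (names chosen for this sketch) and that PvRecorder 1.2 or newer is installed.

```javascript
// Assumed configuration: both variable names are chosen for this sketch.
const ACCESS_KEY = process.env.PICOVOICE_ACCESS_KEY;
const CONTEXT_PATH = process.env.RHINO_CONTEXT_PATH;

async function main(accessKey, contextPath) {
  const { Rhino } = require("@picovoice/rhino-node");
  const { PvRecorder } = require("@picovoice/pvrecorder-node");

  const rhino = new Rhino(accessKey, contextPath);
  let recorder;
  let running = true;
  // Ctrl+C ends the loop so the finally block can release resources.
  process.on("SIGINT", () => {
    running = false;
  });

  try {
    recorder = new PvRecorder(rhino.frameLength);
    recorder.start();
    console.log("Listening for commands (Ctrl+C to exit)...");

    while (running) {
      const frame = await recorder.read();
      // process() returns true once an inference has been finalized.
      if (rhino.process(frame)) {
        const inference = rhino.getInference();
        if (inference.isUnderstood) {
          console.log(`Intent: ${inference.intent}`, inference.slots);
        } else {
          console.log("Command not understood");
        }
      }
    }
  } finally {
    if (recorder) {
      recorder.stop();
      recorder.release();
    }
    rhino.release();
  }
}

if (ACCESS_KEY && CONTEXT_PATH) {
  main(ACCESS_KEY, CONTEXT_PATH).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
} else {
  console.log("Set PICOVOICE_ACCESS_KEY and RHINO_CONTEXT_PATH to run the demo.");
}
```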

This demo uses the @picovoice/rhino-node and @picovoice/pvrecorder-node packages.

For a more detailed guide, refer to the Rhino Speech-to-Intent Node.js documentation.

For a complete demo application, check out the Rhino Speech-to-Intent Node.js Demo on GitHub.

Troubleshooting: Common Issues

Microphone Not Detected or Audio Input Fails

  • Check device permissions: Ensure your app has access to the system microphone.
  • Verify sampling rate: Rhino Speech-to-Intent expects 16 kHz, 16-bit mono PCM input; mismatched formats will cause errors.
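
Both checks can be sketched in code. The helpers below are hypothetical: listInputDevices assumes PvRecorder 1.2 or newer, where getAvailableDevices is a static method, and describeFormatProblem is a pre-check you might run on audio from another source (such as a file) before passing it to Rhino.

```javascript
// Lists the audio input devices PvRecorder can see (assumes PvRecorder 1.2+).
function listInputDevices() {
  const { PvRecorder } = require("@picovoice/pvrecorder-node");
  return PvRecorder.getAvailableDevices();
}

// Returns a description of the first format mismatch, or null if the frame
// matches what the engine expects. `engine` needs frameLength and sampleRate,
// which a Rhino instance exposes as properties.
function describeFormatProblem(frame, sourceSampleRate, engine) {
  if (!(frame instanceof Int16Array)) {
    return "frame must be an Int16Array of 16-bit PCM samples";
  }
  if (sourceSampleRate !== engine.sampleRate) {
    return `expected ${engine.sampleRate} Hz audio, got ${sourceSampleRate} Hz`;
  }
  if (frame.length !== engine.frameLength) {
    return `expected ${engine.frameLength} samples per frame, got ${frame.length}`;
  }
  return null;
}
```

Audio captured through PvRecorder with rhino.frameLength should already satisfy these checks, so the second helper is mainly useful when the audio comes from elsewhere.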

No Intent Detected

  • Make sure your .rhn context file matches the phrases being spoken.
  • Test with clear pronunciation and limit background noise.
