Speech-to-Intent Engine - Raspberry Pi Quick Start

  • Speech to Intent
  • NLU
  • Voice Recognition
  • Speech Recognition
  • Voice Commands
  • Raspberry Pi
  • Python
  • C

Requirements

  • Raspberry Pi (4, 3, 2, or Zero) running Raspbian.
  • USB microphone.

Microphone Setup

Identify The Microphone's Name

Connect the microphone to the Raspberry Pi and get the list of available audio devices

arecord -L

The output will be similar to the following

null
    Discard all samples (playback) or generate zero samples (capture)
default
mic
sysdefault:CARD=Device
    USB PnP Sound Device, USB Audio
    Default Audio Device
front:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    Front speakers
surround21:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    2.1 Surround output to Front and Subwoofer speakers
surround40:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    4.0 Surround output to Front and Rear speakers
surround41:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    4.1 Surround output to Front, Rear and Subwoofer speakers
surround50:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    5.0 Surround output to Front, Center and Rear speakers
surround51:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    5.1 Surround output to Front, Center, Rear and Subwoofer speakers
surround71:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    7.1 Surround output to Front, Center, Side, Rear and Woofer speakers
iec958:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    IEC958 (S/PDIF) Digital Audio Output
dmix:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    Direct sample mixing device
dsnoop:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    Direct sample snooping device
hw:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    Direct hardware device without any conversions
plughw:CARD=Device,DEV=0
    USB PnP Sound Device, USB Audio
    Hardware device with all software conversions

In this case we pick plughw:CARD=Device,DEV=0. Note that this device provides software conversions, which is handy for resampling. In what follows we refer to this value as ${INPUT_AUDIO_DEVICE}.
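If you want to locate the device programmatically, the plughw entries can be filtered out of the `arecord -L` output. The sketch below is illustrative (the function name and parsing logic are not part of the repository); it relies on the fact that device names start at column 0 while their descriptions are indented:

```python
def find_plughw_devices(arecord_output):
    """Return the plughw device names from `arecord -L` output.

    Device names start at column 0; description lines are indented,
    so a simple prefix check is enough to pick out plughw entries.
    """
    return [
        line.strip()
        for line in arecord_output.splitlines()
        if line.startswith("plughw:")
    ]

# Example with a fragment of the listing shown above:
listing = """\
hw:CARD=Device,DEV=0
    Direct hardware device without any conversions
plughw:CARD=Device,DEV=0
    Hardware device with all software conversions
"""
print(find_plughw_devices(listing))  # ['plughw:CARD=Device,DEV=0']
```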

Set The Default Microphone

Create ~/.asoundrc with the following content

pcm.!default {
    type asym
    capture.pcm "mic"
}
pcm.mic {
    type plug
    slave {
        pcm ${INPUT_AUDIO_DEVICE}
    }
}

If you have a speaker, add a section for it to ~/.asoundrc as well.
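For example, assuming ${OUTPUT_AUDIO_DEVICE} stands for the playback device name reported by `aplay -L` (the "speaker" section below is an illustrative sketch, not required by the engine), the file could look like this:

```
pcm.!default {
    type asym
    capture.pcm "mic"
    playback.pcm "speaker"
}
pcm.mic {
    type plug
    slave {
        pcm ${INPUT_AUDIO_DEVICE}
    }
}
pcm.speaker {
    type plug
    slave {
        pcm ${OUTPUT_AUDIO_DEVICE}
    }
}
```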

Test the Microphone

Check that the microphone works properly by recording audio into a file

arecord --format=S16_LE --duration=5 --rate=16000 --file-type=wav ~/test.wav

If the command above executes without any errors, the microphone is functioning as expected. In practice, it is recommended to inspect the recorded file for side effects such as clipping.

Installation

The core of the Speech-to-Intent engine is shipped as a pre-compiled ANSI C library. Hence, it can be used within a C/C++ application directly or in a high-level language such as Python via its bindings.

Python

Clone the repository using

git clone --recursive https://github.com/Picovoice/rhino.git

Change the current directory to the root of the repository and install the Python dependencies

pip3 install -r requirements.txt

Install PyAudio using

sudo apt-get install python3-pyaudio

Test the validity of the installation by running the Python binding's unit tests

python3 binding/python/test_rhino.py

Finally, run the microphone demo application. It opens an input audio stream, monitors it using the Picovoice wake word detection engine, and, when the wake phrase ("Picovoice") is detected, extracts the intent from the follow-up spoken command using the Speech-to-Intent engine.

python3 demo/python/rhino_demo_mic.py --rhino_context_file_path \
./resources/contexts/raspberry-pi/smart_lighting_raspberry-pi.rhn

Now you can say something like "Picovoice, turn on the lights in the kitchen" and it prints the result of inference to the terminal

detected wake phrase
intent: turnLight
---
state: on
location: kitchen
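The demo's two-stage control flow (wake word detection first, then intent inference on the follow-up command) can be sketched with stub engines. The names below are illustrative stand-ins, not the actual binding API; the real engines operate on short PCM frames rather than words:

```python
def run_pipeline(frames, is_wake_word, infer_intent):
    """Two-stage loop: scan frames for the wake word, then hand the
    follow-up frames to the intent engine until inference finalizes."""
    awake = False
    command = []
    for frame in frames:
        if not awake:
            if is_wake_word(frame):
                awake = True          # wake phrase detected
        else:
            command.append(frame)
            result = infer_intent(command)
            if result is not None:    # inference finalized
                return result
    return None

# Toy stand-ins for the two engines:
frames = ["noise", "picovoice", "turn", "on", "kitchen", "lights"]
wake = lambda f: f == "picovoice"
infer = lambda cmd: {"intent": "turnLight"} if "lights" in cmd else None
print(run_pipeline(frames, wake, infer))  # {'intent': 'turnLight'}
```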

C

Install the ALSA development library

sudo apt-get install libasound-dev

Clone the repository using

git clone --recursive https://github.com/Picovoice/rhino.git

Change the current directory to the root of the repository and compile the C demo application.

gcc -O3 -o demo/c/rhino_demo_mic -I include -I resources/porcupine/include/ demo/c/rhino_demo_mic.c \
-ldl -lasound -std=c99

Then run the demo. It opens an input audio stream, monitors it using the Picovoice wake word detection engine, and, when the wake phrase ("Picovoice") is detected, extracts the intent from the follow-up spoken command using the Speech-to-Intent engine. Replace ${CPU} in the command below based on the trim of the Raspberry Pi (cortex-a72 for Raspberry Pi 4, cortex-a53 for Raspberry Pi 3, cortex-a7 for Raspberry Pi 2, and arm11 for the rest) and run the demo

demo/c/rhino_demo_mic \
lib/raspberry-pi/${CPU}/libpv_rhino.so \
lib/common/rhino_params.pv \
resources/contexts/raspberry-pi/smart_lighting_raspberry-pi.rhn \
resources/porcupine/lib/raspberry-pi/${CPU}/libpv_porcupine.so \
resources/porcupine/lib/common/porcupine_params.pv \
resources/porcupine/resources/keyword_files/raspberry-pi/picovoice_raspberry-pi.ppn \
${INPUT_AUDIO_DEVICE}
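The ${CPU} selection can be expressed as a small lookup. The function name below is illustrative; the trim values are the ones listed above:

```python
def cpu_trim(pi_model):
    """Map a Raspberry Pi model number to the CPU directory name
    used under lib/raspberry-pi/ in the repository."""
    trims = {4: "cortex-a72", 3: "cortex-a53", 2: "cortex-a7"}
    return trims.get(pi_model, "arm11")  # Zero and Pi 1 use arm11

print(cpu_trim(4))  # cortex-a72
print(cpu_trim(0))  # arm11 (e.g. Raspberry Pi Zero)
```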

Now you can say something like "Picovoice, turn on the lights in the kitchen" and it prints the result of inference to the terminal

detected wake phrase
intent: turnLight
---
state: on
location: kitchen

Creating Custom Models

Enterprises that are commercially engaged with Picovoice can create custom NLU models using Picovoice Console.