A Wake Word Engine is a tiny algorithm that detects utterances of a given Wake Phrase within a stream of audio. There are good articles that focus on how to build a Wake Word Model using TensorFlow or PyTorch. These are invaluable for educational purposes. But training a production-ready Wake Word Model requires significant effort for data curation and expertise to simulate real-world environments during training.
The Picovoice Porcupine Wake Word Engine uses Transfer Learning to eliminate the need for data collection per model. Porcupine enables you to train custom wake words instantly without gathering any data.
Below, we learn how to use the Porcupine Python SDK for Wake Word Detection and how to train production-ready Custom Wake Words within seconds using Picovoice Console.
Install Porcupine Python SDK
Install the SDK (PyPI package pvporcupine) using pip from a terminal:
Sign up for Picovoice Console
Sign up for Picovoice Console for free and copy your AccessKey. The AccessKey handles authentication and authorization.
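Because the AccessKey is a credential, it is usually better kept out of source code. The sketch below reads it from an environment variable; the variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are hypothetical choices for illustration, not something the SDK reads on its own.

```python
import os


def load_access_key(env_var: str = "PICOVOICE_ACCESS_KEY") -> str:
    """Return the Picovoice AccessKey stored in an environment variable.

    PICOVOICE_ACCESS_KEY is a hypothetical name chosen for this sketch;
    Porcupine does not read it by itself. The caller passes the returned
    string to the SDK explicitly.
    """
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        raise RuntimeError(
            f"Set {env_var} to the AccessKey copied from Picovoice Console"
        )
    return access_key
```

Failing early with a clear message is easier to debug than an authentication error raised later by the SDK.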
Usage
The Porcupine SDK ships with a few built-in Wake Word Models such as Alexa, Hey Google, OK Google, Hey Siri, and Jarvis. Check the list of built-in models:
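A minimal sketch for listing the bundled models, assuming the SDK is installed: pvporcupine exposes the built-in names as the set pvporcupine.KEYWORDS. The wrapper function builtin_keywords is hypothetical, and the import is placed inside it only so the snippet loads where the SDK is absent.

```python
def builtin_keywords() -> list:
    """Return the names of the keyword models bundled with the SDK, sorted."""
    # Imported inside the function so this snippet loads even where the
    # SDK is not installed; calling it requires `pip3 install pvporcupine`.
    import pvporcupine

    # KEYWORDS is a set of names such as "alexa", "jarvis", and "hey siri".
    return sorted(pvporcupine.KEYWORDS)
```

Calling print(builtin_keywords()) prints the names you can pass to the keywords argument at initialization.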
Builtin Keyword Models
Initialization
When initializing Porcupine, you can use one of the built-in Wake Word Models:
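A sketch of initialization with built-in models, assuming pvporcupine is installed and you already have an AccessKey. pvporcupine.create takes the built-in model names through its keywords argument; the wrapper name and the default keyword choice are illustrative only.

```python
def create_with_builtin_keywords(access_key: str, keywords=("jarvis", "alexa")):
    """Create a Porcupine instance that listens for built-in keywords."""
    import pvporcupine  # requires `pip3 install pvporcupine`

    # Built-in models are selected by name; each name must appear in
    # pvporcupine.KEYWORDS.
    return pvporcupine.create(access_key=access_key, keywords=list(keywords))
```

The order of the names matters: the index that Porcupine reports on a detection refers to this list.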
Or you can provide Custom Keyword Models (more on this below):
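For custom models, pvporcupine.create accepts file paths through keyword_paths, optionally with one sensitivity value in [0, 1] per model (a higher value misses fewer utterances at the cost of more false alarms). The wrapper below is a hypothetical sketch that assumes pvporcupine is installed.

```python
def create_with_custom_keywords(access_key: str, keyword_paths, sensitivities=None):
    """Create a Porcupine instance from custom .ppn keyword files."""
    import pvporcupine  # requires `pip3 install pvporcupine`

    # One sensitivity in [0, 1] per keyword file; passing None lets the SDK
    # apply its default.
    return pvporcupine.create(
        access_key=access_key,
        keyword_paths=list(keyword_paths),
        sensitivities=sensitivities,
    )
```

For example, create_with_custom_keywords(access_key, ["path/to/hey-computer.ppn"]), where the path is a placeholder for a model file downloaded from Console.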
Processing
Porcupine processes audio in chunks (frames). The .frame_length property gives the number of samples in each frame. Porcupine accepts single-channel, 16 kHz audio with 16-bit samples. For each frame, .process() returns an integer representing the detected keyword: -1 indicates no detection, and a non-negative value is the index of the detected keyword in the list supplied at initialization.
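To make the frame contract concrete, here is a sketch that runs a detector over a prerecorded WAV file using only the standard library. The helper names are hypothetical; detect works with any object that has a frame_length attribute and a process(frame) method, which a Porcupine instance does, so it can also be exercised with a stand-in object.

```python
import array
import sys
import wave


def read_pcm16_mono(source, expected_rate: int = 16000) -> list:
    """Read a WAV file (path or binary file object) as 16-bit mono samples."""
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    return samples.tolist()


def iter_frames(pcm, frame_length: int):
    """Yield consecutive, non-overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]


def detect(engine, pcm) -> list:
    """Return (frame_index, keyword_index) pairs for every detection.

    `engine` is anything with a `frame_length` attribute and a
    `process(frame)` method, such as a Porcupine instance.
    """
    detections = []
    for frame_index, frame in enumerate(iter_frames(pcm, engine.frame_length)):
        keyword_index = engine.process(frame)
        if keyword_index >= 0:  # -1 means no detection
            detections.append((frame_index, keyword_index))
    return detections
```

Multiplying a frame index by frame_length and dividing by 16000 gives the approximate time in seconds at which the detecting frame starts.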
Cleanup
When done, be sure to release resources:
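Porcupine releases its native resources through .delete(). One way to guarantee that call, even when processing raises, is a small context manager; the name released is a hypothetical helper for this sketch, not part of the SDK.

```python
from contextlib import contextmanager


@contextmanager
def released(engine):
    """Yield `engine` and call its delete() method on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.delete()
```

With it, a block such as with released(pvporcupine.create(access_key=access_key, keywords=["jarvis"])) as porcupine: frees the engine automatically when the block ends.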
Create Custom Wake Words
Often you want to use Custom Wake Word Models with your project. Branded Wake Word Models are essential for enterprise products. Otherwise, you are promoting Amazon's, Google's, and Apple's brands, not yours! You can create Custom Wake Word Models using Picovoice Console in seconds:
- Log in to Picovoice Console
- Go to the Porcupine Page
- Select the target language (e.g. English, Japanese, Spanish, etc.)
- Select the platform you want to optimize the model for (e.g. Raspberry Pi, macOS, Windows, etc.)
- Type in the wake phrase. A good wake phrase should have a few linguistic properties.
- Click the train button. Your model will be ready momentarily as a file with the .ppn suffix, which you can download for on-device inference.
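If you train several models, a small helper can collect the downloaded .ppn files from one folder and build the keyword_paths list for initialization. This is a sketch under an assumed layout (all models in a single directory); using each file name without its suffix as a display label is likewise an assumption, not a Console convention.

```python
from pathlib import Path


def find_keyword_files(directory) -> tuple:
    """Return (keyword_paths, labels) for every .ppn file in `directory`.

    Labels are the file names without the .ppn suffix; this naming scheme
    is an assumption of this sketch.
    """
    paths = sorted(Path(directory).glob("*.ppn"))
    keyword_paths = [str(p) for p in paths]
    labels = [p.stem for p in paths]
    return keyword_paths, labels
```

Because both lists share the same order, the index returned by Porcupine for a detection can be looked up directly in labels.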
A Working Example
All that is left is to wire up audio recording, which gives us an end-to-end wake word solution. Install the PvRecorder Python SDK (PyPI package pvrecorder) using pip:
The following code snippet records audio from the device's default microphone and processes it with Porcupine to detect utterances of the selected keywords. Altogether, we need fewer than 20 lines of code!
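A sketch of such a loop, assuming both pvporcupine and pvrecorder are installed. It relies on PvRecorder(frame_length=..., device_index=-1) with start(), read(), stop(), and delete(), plus Porcupine's process() and delete(); the function name main and the default keywords are illustrative, and the comments push it past 20 lines.

```python
def main(access_key: str, keywords=("jarvis", "alexa")) -> None:
    """Listen on the default microphone and print each detected keyword."""
    # Both SDKs are imported here so the snippet can be loaded without them;
    # running it requires `pip3 install pvporcupine pvrecorder`.
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(access_key=access_key, keywords=list(keywords))
    try:
        # device_index=-1 selects the default input device; each read()
        # returns one frame of porcupine.frame_length samples.
        recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
        try:
            recorder.start()
            print("Listening... press Ctrl+C to stop.")
            while True:
                keyword_index = porcupine.process(recorder.read())
                if keyword_index >= 0:
                    print(f"Detected '{keywords[keyword_index]}'")
        except KeyboardInterrupt:
            recorder.stop()
        finally:
            recorder.delete()
    finally:
        porcupine.delete()
```

Call main with the AccessKey copied from Picovoice Console and stop it with Ctrl+C; the nested try blocks ensure both the recorder and the engine are released.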