Python Wake Word Detection Tutorial

A Wake Word Engine is a tiny algorithm that detects utterances of a given Wake Phrase within a stream of audio. There are good articles that focus on how to build a Wake Word Model using TensorFlow or PyTorch. These are invaluable for educational purposes. But training a production-ready Wake Word Model requires significant effort for data curation and expertise to simulate real-world environments during training.

Picovoice Porcupine Wake Word Engine uses Transfer Learning to eliminate the need for data collection per model. Porcupine enables you to train custom wake words instantly without requiring you to gather any data.

Porcupine Python SDK runs on Linux (x86_64), macOS (x86_64 / arm64), Windows (amd64), Raspberry Pi (Zero, 2, 3, 4), NVIDIA Jetson Nano, and BeagleBone.

Below we learn how to use Porcupine Python SDK for Wake Word Detection and train production-ready Custom Wake Words within seconds using Picovoice Console.

Porcupine can also run in modern web browsers using its JavaScript SDK and on several Arm Cortex-M microcontrollers using its C SDK.

Install Porcupine Python SDK

Install the SDK using pip from a terminal:

pip3 install pvporcupine

Usage

Porcupine SDK ships with a few built-in Wake Word Models such as Alexa, Hey Google, OK Google, Hey Siri, and Jarvis. Check the list of built-in models:

Built-in Keyword Models

import pvporcupine

for keyword in pvporcupine.KEYWORDS:
    print(keyword)

Initialization

When initializing Porcupine, you can use one of the built-in Wake Word Models:

porcupine = pvporcupine.create(
        access_key=access_key,
        keywords=[keyword_one, keyword_two])

Or you can provide Custom Keyword Models (more on this below):

porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[keyword_path_one, keyword_path_two])
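Both snippets above assume that an access_key variable already exists. One way to supply it without hard-coding it in the script is to read it from an environment variable. The following is a minimal sketch under that assumption; the variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are hypothetical choices made for this example and are not part of the Porcupine SDK.

```python
import os


def load_access_key(env_var="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey stored in an environment variable.

    The variable name is a hypothetical choice for this sketch,
    not something the Porcupine SDK looks for on its own.
    """
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return access_key
```

The returned string can then be passed as access_key=load_access_key() to pvporcupine.create.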

Processing

Porcupine consumes audio in fixed-size chunks (frames). The .frame_length property gives the number of samples in each frame. Porcupine accepts 16 kHz audio with 16-bit samples. For each frame, Porcupine returns the index of the detected keyword, or -1 if nothing was detected. Non-negative indices (starting at 0) identify the detected keyword by its position in the list passed at initialization.

keyword_index = porcupine.process(audio_frame)
if keyword_index >= 0:
    # Logic to handle keyword detection events goes here
    print(f"Detected keyword #{keyword_index}")
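When the audio is already in memory (for example, samples decoded from a recording) rather than arriving from a microphone, it has to be cut into frames of the right size before each call to process. The sketch below shows one way to do that. The helpers split_into_frames and detect_keywords are hypothetical names introduced here, and the process argument stands in for porcupine.process so the logic can be tried without the SDK. Dropping a trailing partial frame is a simplification chosen for this example.

```python
def split_into_frames(samples, frame_length):
    """Yield consecutive frames of exactly frame_length samples.

    A trailing partial frame is dropped (a simplification for this sketch).
    """
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        yield samples[start:start + frame_length]


def detect_keywords(samples, frame_length, process):
    """Feed each frame to process() and collect the detections.

    process is any callable that returns -1 for no detection and a
    non-negative keyword index otherwise, e.g. porcupine.process.
    Returns a list of (frame_number, keyword_index) tuples.
    """
    detections = []
    frames = split_into_frames(samples, frame_length)
    for frame_number, frame in enumerate(frames):
        keyword_index = process(frame)
        if keyword_index >= 0:
            detections.append((frame_number, keyword_index))
    return detections
```

With a real engine this would be called as detect_keywords(samples, porcupine.frame_length, porcupine.process), where samples is a sequence of 16-bit integers sampled at 16 kHz.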

Cleanup

When done, be sure to release resources:

porcupine.delete()
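Because delete() must run even when the processing loop raises an exception, it can help to tie the cleanup to a with block. The following is a minimal sketch of that idea using the standard library's contextlib; released_on_exit is a hypothetical helper written for this example, not an SDK feature. It works with any object that exposes a delete() method.

```python
from contextlib import contextmanager


@contextmanager
def released_on_exit(engine):
    """Yield engine and call engine.delete() when the block exits.

    The finally clause runs on normal exit and on exceptions alike.
    """
    try:
        yield engine
    finally:
        engine.delete()
```

Usage would look like: with released_on_exit(pvporcupine.create(access_key=access_key, keywords=keywords)) as porcupine: followed by the processing loop in the body of the block.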

Create Custom Wake Words

Often you want to use Custom Wake Word Models in your project. Branded Wake Word Models are essential for enterprise products. Otherwise, you are promoting Amazon's, Google's, and Apple's brands, not yours! You can create Custom Wake Word Models using Picovoice Console in seconds:

Log in to Picovoice Console
Go to the Porcupine Page
Select the target language (e.g. English, Japanese, Spanish, etc.)
Select the platform you want to optimize the model for (e.g. Raspberry Pi, macOS, Windows, etc.)
Type in the wake phrase. A good wake phrase should have a few linguistic properties.
Click the train button. Your model (a file with the .ppn suffix) will be ready momentarily. You can download this file for on-device inference.
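Once one or more .ppn files have been downloaded, they need to be passed to the engine as keyword_paths. The sketch below gathers them from a folder and fails early if none are found. The helper name collect_keyword_paths and the idea of keeping all models in one directory are assumptions made for this example.

```python
from pathlib import Path


def collect_keyword_paths(model_dir):
    """Return the sorted paths of all .ppn files found in model_dir.

    Raises FileNotFoundError if the folder contains no .ppn file, so the
    problem surfaces before the engine is initialized.
    """
    paths = sorted(str(p) for p in Path(model_dir).glob("*.ppn"))
    if not paths:
        raise FileNotFoundError(f"No .ppn files found in {model_dir}")
    return paths
```

The result can be passed as keyword_paths=collect_keyword_paths(model_dir) to pvporcupine.create. Because the list is sorted, the index returned by process maps to the same file on every run.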

A Working Example

All that is left is to wire up the audio recording. Then we have an end-to-end wake word solution. Install PvRecorder Python SDK using pip:

pip3 install pvrecorder

The following code snippet records audio from the default microphone on the device and processes the recorded audio using Porcupine to detect utterances of the selected keywords. Altogether, we need fewer than 20 lines of code!

import pvporcupine
from pvrecorder import PvRecorder

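# Placeholders: substitute your own AccessKey and keyword choices.
# The keyword names are assumed to be among those in pvporcupine.KEYWORDS.
access_key = "${ACCESS_KEY}"
keywords = ["alexa", "jarvis"]

porcupine = pvporcupine.create(access_key=access_key, keywords=keywords)
recorder = PvRecorder(device_index=-1, frame_length=porcupine.frame_length)

try:
    recorder.start()

    while True:
        keyword_index = porcupine.process(recorder.read())
        if keyword_index >= 0:
            print(f"Detected {keywords[keyword_index]}")

except KeyboardInterrupt:
    recorder.stop()
finally:
    porcupine.delete()
    recorder.delete()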