Build a Voice AI Linux Assistant with Python

Why Choose On-Device Voice AI for the Linux Assistant

Cloud-based voice applications introduce network latency, require constant internet connectivity, and raise privacy concerns because sensitive audio is sent to remote servers. These limitations become a bottleneck when you need responsive, reliable, and secure applications.

In this tutorial, we will build a voice-powered Linux assistant with Python using Porcupine Wake Word and Rhino Speech-to-Intent. Both engines process audio on the device, which keeps the assistant responsive, private, and reliable, and gives you full control over the user experience.

Step 1: Create a Custom Wake Word for the Linux Assistant

The assistant stays idle until it hears its wake word. You can create a custom wake word model on the Picovoice Console.

Log in to the Picovoice Console.
Go to the Porcupine Wake Word section and create a new wake word.
Enter a custom phrase, such as "Hey Computer".

For tips on choosing an effective wake word, refer to the guide for choosing a wake word.

Train the voice model and download it, choosing the Linux platform you're working on, e.g., Linux x86_64.

You will now find a folder in your downloads containing the .ppn keyword file.

Step 2: Create Voice Commands for the Linux Assistant

Once the voice assistant has a wake word, it needs a set of voice commands to understand the user's intent.

In the Picovoice Console, go to the Rhino Speech-to-Intent tab and create a new Rhino context.
Click the "Import YAML" button in the top-right corner of the Console. Paste the YAML provided below to add the intents and expressions for the Linux assistant.

For detailed instructions, refer to the Rhino Syntax Cheat Sheet.

Use the microphone button to test your context in-browser. This will save and build the context.
Once built, download the file choosing the Linux platform.

You will now find a zipped folder in your downloads containing the .rhn context file.
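
If you prefer to unpack the download from a script, the sketch below pulls the first matching model file out of a zip archive. It is only an illustration: `extract_model` is a hypothetical helper, and the archive layout is an assumption, because the file names inside the download depend on your context name and platform.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, suffix, dest_dir):
    """Extract the first archive member ending in `suffix` (e.g. ".rhn").

    Hypothetical helper: the archive layout is assumed, not documented here.
    Returns the path of the extracted file, or None if nothing matches.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(suffix):
                # extract() returns the path of the file it created
                return Path(archive.extract(name, dest))
    return None
```

The returned path can then be passed to the script as the context file path (or as the keyword file path when called with ".ppn").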

YAML Context for the Linux Assistant:

context:
  expressions:
    set_timer:
      - "[set, start] (a) timer for $pv.TwoDigitInteger:minutes minutes"
      - timer for $pv.TwoDigitInteger:minutes minutes
    check_timer:
      - how much time is left
      - check (the) timer
    stop_timer:
      - "[stop, cancel] (the) timer"
    open_app:
      - "@go_to (the) $app_name:app_name"

  slots:
    app_name:
      - fire fox
      - calculator
      - terminal

  macros:
    go_to:
      - go to
      - take me to
      - launch
      - open
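
To get a feel for how many phrasings a single expression covers, the sketch below expands the two constructs used in the context above: square brackets for alternatives and parentheses for optional words. This is a simplified illustration, not Rhino's parser; the `expand` function is hypothetical, and slot references ($...) and macros (@...) are left as literal tokens.

```python
import itertools
import re

# Matches "[a, b]" (alternatives), "(word)" (optional), or a plain token
TOKEN = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|([^\s\[\]()]+)")


def expand(expression):
    """Return every phrase a simple expression can match."""
    options = []
    for choice, optional, word in TOKEN.findall(expression):
        if choice:
            options.append([part.strip() for part in choice.split(",")])
        elif optional:
            options.append([optional.strip(), ""])  # present or absent
        else:
            options.append([word])
    phrases = []
    for combo in itertools.product(*options):
        phrases.append(" ".join(part for part in combo if part))
    return phrases


print(expand("[stop, cancel] (the) timer"))
# ['stop the timer', 'stop timer', 'cancel the timer', 'cancel timer']
```

Two alternatives and one optional word already yield four accepted phrasings, which is why a handful of expressions can cover most natural ways of giving a command.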

Step 3: Demo Tools and Packages for the Linux Assistant

The following tools and packages are needed for the demo:

A Linux machine with Python 3.9 or above, and a working microphone.
The Rhino Speech-to-Intent Python SDK pvrhino, the Porcupine Wake Word Python SDK pvporcupine and the Picovoice Python Recorder library pvrecorder. Install them all with a single command:

pip install pvrhino pvporcupine pvrecorder

The libnotify-bin package to enable desktop notifications with the notify-send command:

sudo apt install libnotify-bin
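
Before moving on, it can help to confirm that everything is in place. The following is a minimal sketch of such a check using only the standard library; `missing_prerequisites` is a hypothetical helper name, and the defaults simply mirror the requirements listed above.

```python
import importlib.util
import shutil
import sys


def missing_prerequisites(modules=("pvrhino", "pvporcupine", "pvrecorder"),
                          commands=("notify-send",),
                          min_python=(3, 9)):
    """Return a list of human-readable problems; empty means ready to go."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for module in modules:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing Python package: {module}")
    for command in commands:
        # which returns None when the command is not on PATH
        if shutil.which(command) is None:
            problems.append(f"missing command: {command}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, the packages and the notify-send command were found. It does not test the microphone.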

Step 4: Write the Python Script for the Linux Assistant

With the setup complete, we can now write the code that powers the assistant.

Detect Wake Word and Process Voice Commands in Real Time

The main loop continuously processes audio. Porcupine listens for the wake word, and once detected, Rhino takes over to recognize the user's intent.

recorder.start()
print("Say the Wake Word to start (Ctrl+C to stop)…")

try:
    listening_for_intent = False
    while True:
        pcm = recorder.read()

        if not listening_for_intent:
            # Wake word detection
            keyword_index = porcupine.process(pcm)
            if keyword_index >= 0:
                notify("Listening for command…")
                listening_for_intent = True
        else:
            # Rhino intent processing
            is_finalized = rhino.process(pcm)
            if is_finalized:
                inference = rhino.get_inference()

                if inference.is_understood:
                    intent = inference.intent
                    slots = inference.slots
                    print(f"[understood] intent={intent} slots={slots}")

                    # Handle the action
                    handle_linux_action(intent, slots, timer_manager)
                else:
                    notify("Didn't understand, please try again.")

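                # Reset Rhino for next turn
                rhino.reset()
                listening_for_intent = False
except KeyboardInterrupt:
    print("\nStopping…")
finally:
    # Stop capturing audio when the loop exits
    recorder.stop()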
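
The hand-off between the two engines is a small two-state machine, and it can be exercised without a microphone. The sketch below replaces the engines with hypothetical stand-in classes (`FakeWakeWord` and `FakeIntentEngine` are not part of any Picovoice SDK) and feeds labelled strings instead of audio frames, purely to show the control flow.

```python
class FakeWakeWord:
    """Hypothetical stand-in: reports a detection on frames equal to "wake"."""

    def process(self, frame):
        return 0 if frame == "wake" else -1


class FakeIntentEngine:
    """Hypothetical stand-in: finalizes on the first non-silent frame."""

    def __init__(self):
        self._result = None

    def process(self, frame):
        if frame != "silence":
            self._result = frame
            return True
        return False

    def get_inference(self):
        return self._result

    def reset(self):
        self._result = None


def run_frames(frames, wake_word, intent_engine):
    """Feed frames through the two-state loop and collect recognized commands."""
    commands = []
    listening_for_intent = False
    for frame in frames:
        if not listening_for_intent:
            # State 1: only the wake word engine sees the frame
            if wake_word.process(frame) >= 0:
                listening_for_intent = True
        elif intent_engine.process(frame):
            # State 2: the intent engine finalized, so return to state 1
            commands.append(intent_engine.get_inference())
            intent_engine.reset()
            listening_for_intent = False
    return commands


frames = ["open_app", "wake", "silence", "open_app", "stop_timer", "wake", "stop_timer"]
print(run_frames(frames, FakeWakeWord(), FakeIntentEngine()))
# ['open_app', 'stop_timer']
```

The first "open_app" and the first "stop_timer" are ignored because no wake word preceded them, which is the behavior the real loop relies on.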

Map Recognized Voice Commands to Actions for the Linux Assistant

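The handle_linux_action function maps Rhino's recognized intents and their slots to assistant actions. We define an app_map dictionary to translate the app names from voice commands into executable names on the system. The snippet below uses GNOME application names, while the full script further down uses xcalc and x-terminal-emulator; adjust the values to the executables installed on your machine.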

app_map = {
    "calculator": "gnome-calculator",
    "terminal": "gnome-terminal",
    "fire fox": "firefox"
}

def handle_linux_action(intent, slots, timer_manager):
    """Demo: handle Linux actions from voice commands.
    You can expand this with more intents for your own workflow."""
    
    if intent == "set_timer":
        minutes = slots.get("minutes", "0")
        if minutes.isdigit():
            timer_manager.set_timer(int(minutes))
            print(f"[Demo] Timer set for {minutes} minutes")
        else:
            notify("Please say a valid number.")
            
    elif intent == "open_app":
        app_name = slots.get("app_name")
        if app_name:
            executable = app_map.get(app_name)
            if executable:
                try:
                    subprocess.Popen([executable])
                    notify(f"Opening {app_name}")
                except FileNotFoundError:
                    notify(f"App not found: {executable}")
            else:
                notify(f"'{app_name}' is not a supported application.")
        else:
            notify("Please specify an application to open.")
            
    else:
        # Other intents (check_timer, stop_timer, etc.) can go here
        # Extend this section with more custom commands
        print("[Demo] Intent not handled in this demo")
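
Because executable names differ between distributions, you may want to check that a mapped program exists before launching it. The following is a minimal sketch using shutil.which from the standard library; `resolve_executable` is a hypothetical helper and not part of the tutorial's script.

```python
import shutil


def resolve_executable(app_name, app_map):
    """Return the full path for a spoken app name, or None if unavailable.

    Hypothetical helper, shown only to illustrate the lookup.
    """
    executable = app_map.get(app_name)
    if executable is None:
        return None  # the spoken name is not in the map
    return shutil.which(executable)  # None when the program is not on PATH
```

A caller could pass the returned path to subprocess.Popen and send a notification when the result is None, instead of relying on FileNotFoundError.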

Add Notifications and Timer Control to the Linux Assistant

The notify function sends desktop notifications using the notify-send command to provide visual feedback to the user.

def notify(text: str):
    """Send desktop notification using notify-send"""
    try:
        subprocess.run(["notify-send", text], check=False)
    except Exception:
        print(f"[notify] {text}")

The TimerManager class encapsulates the timer-related functionality, managing timer state and operations.

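class TimerManager:
    """Simple demo timer manager"""
    def __init__(self):
        self._thread = None
        self._active = False
        self._end_time = 0

    def set_timer(self, minutes: int):
        # Example app logic: replace with your own automation
        self._active = True
        self._end_time = time.time() + (minutes * 60)
        # The full script below adds the background thread,
        # check_timer, and stop_timer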
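
If you write your own timer logic, one option for a timer that can be cancelled promptly is threading.Event, whose wait method blocks until either the timeout expires or the event is set. The sketch below is an alternative illustration, not the tutorial's TimerManager; the `EventTimer` class and its method names are assumptions, and it takes seconds rather than minutes to keep the example short.

```python
import threading


class EventTimer:
    """Hypothetical cancellable timer; not the tutorial's TimerManager."""

    def __init__(self, on_finish):
        self._on_finish = on_finish
        self._cancel = threading.Event()
        self._thread = None

    def start(self, seconds):
        self.cancel()               # stop any timer already running
        cancel = threading.Event()  # fresh event for the new timer
        self._cancel = cancel

        def run():
            # wait() returns True if cancelled, False if the timeout expired
            if not cancel.wait(timeout=seconds):
                self._on_finish()

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def cancel(self):
        self._cancel.set()
        if self._thread is not None:
            self._thread.join()  # returns quickly: the thread wakes on set()
            self._thread = None
```

Usage would look like `timer = EventTimer(lambda: notify("Timer finished!"))` followed by `timer.start(300)`. Note that the callback must not call cancel on its own timer, since a thread cannot join itself.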

Full Python Script for the Voice Powered Linux Assistant

Here's the complete Python script integrating all the components above into a working Linux Assistant:

# Voice AI Linux Assistant with Porcupine + Rhino
import subprocess
import threading
import time
import argparse
import pvporcupine
import pvrhino
from pvrecorder import PvRecorder

app_map = {
    "calculator": "xcalc",
    "terminal": "x-terminal-emulator",
    "fire fox": "firefox"
}

def notify(text: str):
    """Send desktop notification using notify-send"""
    try:
        subprocess.run(["notify-send", text], check=False)
    except Exception:
        print(f"[notify] {text}")


class TimerManager:
    """Manages timer functionality"""
    def __init__(self):
        self._thread = None
        self._active = False
        self._end_time = 0

    def set_timer(self, minutes: int):
        if self._thread and self._thread.is_alive():
            self.stop_timer()

        self._active = True
        self._end_time = time.time() + (minutes * 60)

        def run():
            while self._active and time.time() < self._end_time:
                time.sleep(1)
            if self._active:
                notify(f"Timer for {minutes} minutes finished!")
                self._active = False

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()
        notify(f"Timer set for {minutes} minutes.")

    def check_timer(self):
        if not self._active:
            notify("No timer active.")
            return
        remaining = int(self._end_time - time.time())
        minutes, seconds = divmod(max(remaining, 0), 60)
        notify(f"Time remaining: {minutes} min {seconds} sec.")

    def stop_timer(self):
        if self._active:
            self._active = False
            if self._thread and self._thread.is_alive():
                self._thread.join(timeout=1)
            notify("Timer stopped.")
        else:
            notify("No timer running.")


def handle_linux_action(intent, slots, timer_manager):
    """Handle Linux assistant actions based on intent"""
    if intent == "set_timer":
        try:
            mins = int(slots.get("minutes", "0"))
            if mins > 0:
                timer_manager.set_timer(mins)
            else:
                notify("Please specify minutes for the timer.")
        except ValueError:
            notify("Invalid minutes value.")
    elif intent == "check_timer":
        timer_manager.check_timer()
    elif intent == "stop_timer":
        timer_manager.stop_timer()
    elif intent == "open_app":
        app_name = slots.get("app_name")
        if app_name:
            executable = app_map.get(app_name)
            if executable:
                try:
                    subprocess.Popen([executable])
                    notify(f"Opening {app_name}")
                except FileNotFoundError:
                    notify(f"App not found: {executable}")
            else:
                notify(f"'{app_name}' is not a supported application.")
        else:
            notify("Please specify an application to open.")
    else:
        notify("Intent not handled.")


def main():
    # Set up command line argument parser
    parser = argparse.ArgumentParser(
        description="Voice AI Linux Assistant with Porcupine and Rhino"
    )
    parser.add_argument(
        "--access_key",
        required=True,
        help="Picovoice access key"
    )
    parser.add_argument(
        "--context_file_path",
        required=True,
        help="Path to the context file (.rhn)"
    )
    parser.add_argument(
        "--keyword_file_path",
        required=True,
        help="Path to the keyword file (.ppn)"
    )
    parser.add_argument(
        "--audio_device_index",
        type=int,
        default=-1,
        help="Audio device index (default: -1)"
    )

    args = parser.parse_args()

    # Load command line arguments
    access_key = args.access_key
    context_path = args.context_file_path
    keyword_path = args.keyword_file_path
    device_index = args.audio_device_index

    # Initialize engines
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[keyword_path]
    )
    rhino = pvrhino.create(
        access_key=access_key,
        context_path=context_path
    )
    recorder = PvRecorder(
        device_index=device_index,
        frame_length=porcupine.frame_length
    )

    # Timer manager instance
    timer_manager = TimerManager()

    # Main loop
    print("Say the Wake Word to start (Ctrl+C to stop)…")
    recorder.start()

    try:
        listening_for_intent = False
        while True:
            pcm = recorder.read()

            if not listening_for_intent:
                # Wake word detection
                keyword_index = porcupine.process(pcm)
                if keyword_index >= 0:
                    notify("Listening for command…")
                    listening_for_intent = True
            else:
                # Rhino intent processing
                is_finalized = rhino.process(pcm)
                if is_finalized:
                    inference = rhino.get_inference()

                    if inference.is_understood:
                        intent = inference.intent
                        slots = inference.slots
                        print(
                            f"[understood] intent={intent} "
                            f"slots={slots}"
                        )

                        # Handle the action
                        handle_linux_action(
                            intent,
                            slots,
                            timer_manager
                        )
                    else:
                        notify("Didn't understand, please try again.")

                    # Reset Rhino for next turn
                    rhino.reset()
                    listening_for_intent = False

    except KeyboardInterrupt:
        print("\nStopping…")
    finally:
        recorder.stop()
        recorder.delete()
        rhino.delete()
        porcupine.delete()
        timer_manager.stop_timer()


if __name__ == "__main__":
    main()

Step 5: Run the Linux Voice Assistant

Save the script as linux-assistant.py and run it from your terminal with the following command. Copy your AccessKey from the Picovoice Console and replace the placeholders ACCESS_KEY, CONTEXT_FILE_PATH, and KEYWORD_FILE_PATH with your actual values. AUDIO_DEVICE_INDEX is optional: leave out the --audio_device_index line to use the default microphone.

python linux-assistant.py \
    --access_key="${ACCESS_KEY}" \
    --context_file_path="${CONTEXT_FILE_PATH}" \
    --keyword_file_path="${KEYWORD_FILE_PATH}" \
    --audio_device_index="${AUDIO_DEVICE_INDEX}"
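
If you keep these values in environment variables, the parser can read them as defaults so the flags become optional. The sketch below is one possible variation on the script's argument parsing; the `build_parser` function is hypothetical, and the variable names simply reuse the placeholders from the command above.

```python
import argparse
import os


def build_parser(env=None):
    """Create a parser whose defaults come from environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Voice AI Linux Assistant")
    for flag, variable in (
        ("--access_key", "ACCESS_KEY"),
        ("--context_file_path", "CONTEXT_FILE_PATH"),
        ("--keyword_file_path", "KEYWORD_FILE_PATH"),
    ):
        value = env.get(variable)
        # Only require the flag when the environment does not supply a value
        parser.add_argument(flag, default=value, required=value is None)
    parser.add_argument(
        "--audio_device_index",
        type=int,
        # Fall back to -1 (default device) when unset or empty
        default=int(env.get("AUDIO_DEVICE_INDEX") or -1),
    )
    return parser
```

With this variation, an unset AUDIO_DEVICE_INDEX falls back to -1 instead of reaching argparse as an empty string, and any flag given on the command line still overrides the environment.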

How to Extend the Linux Voice Assistant with More Features

This example is just a starting point. You can build on this foundation to create a custom voice interface for your Linux machine:

Integrate with scripts: Map Rhino intents to custom scripts to automate tasks such as backing up files or running a build.
Add spoken responses: Integrate an on-device text-to-speech engine like Orca Text-to-Speech for a full conversational voice loop.
Run it in the background: Use a systemd user service to have your Linux assistant launch automatically when you log in.
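
As a rough illustration of the first idea, intents can be mapped to commands with a dictionary, much like app_map. Everything in the sketch below is hypothetical: the intent names run_backup and run_build do not exist in the context from Step 2, and the commands are placeholders that only print a message.

```python
import subprocess
import sys

# Hypothetical intents and commands, for illustration only
SCRIPT_MAP = {
    "run_backup": [sys.executable, "-c", "print('backup started')"],
    "run_build": [sys.executable, "-c", "print('build started')"],
}


def run_script_for_intent(intent, script_map=SCRIPT_MAP):
    """Run the command mapped to `intent`; return its output or None."""
    command = script_map.get(intent)
    if command is None:
        return None  # no script registered for this intent
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout.strip()


print(run_script_for_intent("run_backup"))
# backup started
```

To use this pattern, you would add matching intents and expressions to the Rhino context and replace the placeholder commands with your own scripts.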

Start Building an AI-Powered Voice Assistant on Linux

Ready to create your own voice-controlled Linux assistant? The Picovoice Console is the best place to start. You can create your custom wake word and speech-to-intent context for free in just a few clicks.
