TLDR: Build a hands-free factory voice agent using Python and on-device AI. Enables equipment control and production queries through wake word detection, speech-to-intent, and local LLM processing. All processing stays on-premises for privacy compliance.
Hands-free voice control can streamline factory operations, allowing workers to manage equipment, query production data, and report maintenance issues without interrupting their tasks. This Smart Factory Voice Agent in Python runs entirely on-device, combining wake word detection, speech-to-intent, streaming speech-to-text, text-to-speech and local LLM processing to provide instant equipment control and real-time production insights, all while keeping data on-premises for full privacy compliance. In this tutorial, you’ll learn how to build a fully operational factory voice assistant using Python and Picovoice’s on-device AI stack.
What You'll Build:
A factory voice agent that:
- Activates using two custom wake phrases - one for equipment commands (e.g., "Hey Factory") and one for production queries (e.g., "Hey Assistant")
- Controls equipment instantly without manual intervention
- Queries real-time production data (output rates, machine status, inventory levels)
- Handles maintenance requests and troubleshooting information
The voice agent’s fully on-device architecture ensures that it:
- Achieves high accuracy and low-latency responses, with all speech recognition processed locally using engines trained for noisy factory environments and multiple accents.
- Meets strict privacy standards, as all voice data stays on-premises and never leaves the facility.
Requirements for Building a Manufacturing Voice Agent:
- Python 3.9+
- Microphone
- Speakers or headset for audio feedback
- Picovoice AccessKey from the Picovoice Console
Smart Factory Voice Agent Workflow
This Python-based factory voice agent uses an on-device architecture designed for reliability and low latency:
How it works:
Always-Listening Activation - The factory voice agent sits in a low-power, idle state using Porcupine Wake Word to monitor the audio stream for two distinct wake phrases. Detecting "Hey Factory" routes to instant equipment control, while "Hey Assistant" routes to the conversational AI for detailed queries. This dual-keyword approach lets workers choose the right path upfront.
Intent Understanding for Equipment Control - When "Hey Factory" is detected, the audio is analyzed by Rhino Speech-to-Intent. Instead of transcribing words one by one, it maps the speech directly to a pre-defined command (such as an emergency stop). The system executes the action immediately without further processing.
Speech-to-Text for Conversational Queries - When "Hey Assistant" is detected, the system routes directly to Cheetah Streaming Speech-to-Text. This engine converts natural, open-ended speech into a text string, capturing the full detail of complex questions or reports.
On-Device Language Model - The transcribed text is passed to picoLLM, which runs a specialized language model locally on the device. It interprets the user's question using specific factory context, such as shift data or machine specs, to generate a relevant, intelligent text response.
Voice Response Generation - Finally, Orca Streaming Text-to-Speech converts the AI's text response into spoken audio. This provides the worker with immediate verbal confirmation or information, completing the hands-free loop.
The Voice Agent routes time-critical equipment commands for instant execution while handling complex production data queries through the LLM pipeline. All processing runs locally on industrial PCs or edge devices, eliminating network latency and ensuring reliable operation.
All Picovoice models such as Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech support multiple languages including English, Spanish, French, German and more. Build multilingual factory voice agents to serve international workforces by training models in the languages your teams speak.
Train Custom Wake Words for Factory Voice Agent
- Sign up for a Picovoice Console account and navigate to the Porcupine page.
- Enter your wake phrase such as "Hey Factory", and test it using the microphone button.
- Click "Train," select the target platform, and download the
.ppnmodel file. - Repeat steps 2 & 3 for to train an additional wake word for any production queries (e.g., "Hey Assistant")
Porcupine can detect multiple wake words simultaneously. For instance, it can listen for both "Hey Factory" and "Hey Assistant" to handle different tasks. For tips on designing an effective wake word, review the choosing a wake word guide.
Define Voice Commands for Equipment Control
- Create an empty Rhino Speech-to-Intent Context.
- Click the "Import YAML" button in the top-right corner of the console and paste the YAML provided below to define intents for factory equipment commands.
- Test the model with the microphone button and download the .rhn context file for your target platform.
You can refer to the Rhino Syntax Cheat Sheet for more details on building custom contexts.
YAML Context for Factory Equipment Commands:
This context handles critical equipment control commands.
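The original context isn't reproduced here, so the following is a minimal sketch of what such a context could look like; the intent names (startMachine, stopMachine, emergencyStop) and the machine slot values are illustrative assumptions to adapt to your own equipment.

```yaml
context:
  expressions:
    startMachine:
      - "[start, turn on, power up] (the) $machine:machine"
    stopMachine:
      - "[stop, shut down, turn off] (the) $machine:machine"
    emergencyStop:
      - "emergency stop"
      - "stop everything (now)"
  slots:
    machine:
      - "conveyor"
      - "press"
      - "packaging line"
```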
Set Up Local Large Language Model
- Navigate to the picoLLM page in Picovoice Console.
- Select a model. This tutorial uses llama-3.2-3b-instruct-505.pllm.
- Download the .pllm file and place it in your project directory.
Install Required Python Libraries for Factory Voice Control
Install all required Python SDKs and dependencies using pip:
- Porcupine Wake Word Python SDK: pvporcupine
- Rhino Speech-to-Intent Python SDK: pvrhino
- Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
- picoLLM Python SDK: picollm
- Orca Streaming Text-to-Speech Python SDK: pvorca
- Picovoice Python Recorder library: pvrecorder
- Picovoice Python Speaker library: pvspeaker
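All of them can be installed with a single pip command:

```bash
pip3 install pvporcupine pvrhino pvcheetah picollm pvorca pvrecorder pvspeaker
```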
Add Wake Word Detection for Hands-Free Activation
The following code captures audio from your microphone and detects the custom wake word locally:
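The snippet below is a minimal sketch: the .ppn file names and the AccessKey placeholder are assumptions to replace with your own models and key.

```python
import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"  # from Picovoice Console

# Listen for both wake words at once; the index returned by process()
# tells us which one was spoken.
porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=["hey-factory.ppn", "hey-assistant.ppn"])  # your trained models

recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        keyword_index = porcupine.process(recorder.read())
        if keyword_index == 0:
            print("'Hey Factory' detected -> equipment control path")
        elif keyword_index == 1:
            print("'Hey Assistant' detected -> production query path")
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```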
Porcupine Wake Word processes each audio frame on-device with acoustic models trained to reject machinery noise and false positives. By listening for multiple wake words simultaneously, it routes workers to the right system path instantly - equipment control or production queries - without wasted processing.
Process Equipment Control Commands
Once the wake word is detected, Rhino Speech-to-Intent listens for structured equipment commands:
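A minimal sketch with pvrhino; the context path is a placeholder for the .rhn file downloaded earlier.

```python
import pvrhino
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

rhino = pvrhino.create(
    access_key=ACCESS_KEY,
    context_path="factory_equipment.rhn")  # your downloaded context

recorder = PvRecorder(frame_length=rhino.frame_length)
recorder.start()

try:
    # Feed audio frames until Rhino has finalized an inference.
    while not rhino.process(recorder.read()):
        pass
    inference = rhino.get_inference()
    if inference.is_understood:
        print(f"intent: {inference.intent}, slots: {inference.slots}")
    else:
        print("Command not understood")
finally:
    recorder.stop()
    recorder.delete()
    rhino.delete()
```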
Rhino Speech-to-Intent directly infers intent from speech without requiring a separate transcription step, enabling instant equipment control.
Handle Production Data Queries with AI
When users say "Hey Assistant," the system routes directly to streaming speech-to-text and local LLM for natural language queries:
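Here is a sketch of that pipeline, assuming the .pllm model downloaded earlier sits in the working directory; the prompt format and the production-context string are illustrative stand-ins for your own data source.

```python
import pvcheetah
import picollm
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
pllm = picollm.create(access_key=ACCESS_KEY, model_path="llama-3.2-3b-instruct-505.pllm")

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

# Transcribe until Cheetah detects the end of the utterance.
transcript = ""
while True:
    partial, is_endpoint = cheetah.process(recorder.read())
    transcript += partial
    if is_endpoint:
        transcript += cheetah.flush()
        break
recorder.stop()
recorder.delete()

# Ground the LLM in whatever production data you have on hand (illustrative values).
production_context = "Line 3 output: 94 units/hour. Press 2: down for maintenance."
prompt = (
    "You are a factory assistant. Answer briefly using this data:\n"
    f"{production_context}\n"
    f"Worker: {transcript}\nAssistant:")
response = pllm.generate(prompt, completion_token_limit=128)
print(response.completion)

cheetah.delete()
pllm.release()
```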
This approach uses Cheetah Streaming Speech-to-Text to transcribe natural speech with acoustic models optimized for industrial noise, then picoLLM to process the query and generate responses based on real-time production data.
Add AI Voice Response Generation for Smart Factories
Transform text responses into audible speech optimized for noisy environments:
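A minimal sketch pairing pvorca with pvspeaker for playback, assuming Orca's streaming API (stream_open / synthesize / flush); in the full agent you would stream text into Orca as picoLLM produces it.

```python
import pvorca
from pvspeaker import PvSpeaker

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

orca = pvorca.create(access_key=ACCESS_KEY)
speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)
speaker.start()

# Streaming synthesis: audio is produced (and played) before the full sentence is finished.
stream = orca.stream_open()
for text_chunk in ["Line three is running at ", "ninety-four units per hour."]:
    pcm = stream.synthesize(text_chunk)
    if pcm is not None:
        speaker.write(pcm)
pcm = stream.flush()
if pcm is not None:
    speaker.write(pcm)

speaker.flush()
speaker.stop()
stream.close()
orca.delete()
```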
Orca Streaming Text-to-Speech generates clear voice responses with first audio output in under 130ms, enabling seamless communication on the factory floor.
Execute Equipment Control Commands and Integrate with Manufacturing Systems
Route structured intents to manufacturing systems:
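A sketch of such a dispatcher; the intent and slot names match the example context above, and the print statements stand in for your real PLC/MES integration.

```python
def handle_equipment_command(inference):
    """Map a Rhino inference to a factory action (intent/slot names are illustrative)."""
    if not inference.is_understood:
        return "Sorry, I didn't catch that command."

    if inference.intent == "emergencyStop":
        # Replace with your real safety-system call (e.g., an OPC UA or Modbus write).
        print("[PLC] emergency stop issued")
        return "Emergency stop engaged."

    if inference.intent == "startMachine":
        machine = inference.slots.get("machine", "the machine")
        print(f"[PLC] start command sent to {machine}")
        return f"Starting {machine}."

    if inference.intent == "stopMachine":
        machine = inference.slots.get("machine", "the machine")
        print(f"[PLC] stop command sent to {machine}")
        return f"Stopping {machine}."

    return "That command isn't configured yet."
```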
Complete Python Code for Smart Factory Voice Agent
This implementation combines all components for a production-ready factory voice agent:
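The full original listing isn't reproduced here; the skeleton below shows one way to wire the pieces together under the same assumptions as the snippets above (file names, intents, and the prompt are placeholders).

```python
import picollm
import pvcheetah
import pvorca
import pvporcupine
import pvrhino
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=["hey-factory.ppn", "hey-assistant.ppn"])
rhino = pvrhino.create(access_key=ACCESS_KEY, context_path="factory_equipment.rhn")
cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
pllm = picollm.create(access_key=ACCESS_KEY, model_path="llama-3.2-3b-instruct-505.pllm")
orca = pvorca.create(access_key=ACCESS_KEY)

# All Picovoice engines share the same 16 kHz, 512-sample frame, so one recorder feeds them all.
recorder = PvRecorder(frame_length=porcupine.frame_length)
speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)

def speak(text):
    # Single-shot synthesis for brevity; see the streaming Orca snippet above.
    pcm, _ = orca.synthesize(text)
    speaker.start()
    speaker.write(pcm)
    speaker.flush()
    speaker.stop()

def run_equipment_command():
    while not rhino.process(recorder.read()):
        pass
    inference = rhino.get_inference()
    if inference.is_understood:
        speak(f"Executing {inference.intent}.")  # or handle_equipment_command(inference)
    else:
        speak("Sorry, I didn't catch that command.")

def run_production_query():
    transcript = ""
    while True:
        partial, is_endpoint = cheetah.process(recorder.read())
        transcript += partial
        if is_endpoint:
            transcript += cheetah.flush()
            break
    prompt = f"You are a factory assistant. Answer briefly.\nWorker: {transcript}\nAssistant:"
    speak(pllm.generate(prompt, completion_token_limit=128).completion)

recorder.start()
print("Listening for 'Hey Factory' or 'Hey Assistant'...")
try:
    while True:
        keyword_index = porcupine.process(recorder.read())
        if keyword_index == 0:
            run_equipment_command()
        elif keyword_index == 1:
            run_production_query()
finally:
    recorder.stop()
    for engine in (porcupine, rhino, cheetah, orca):
        engine.delete()
    pllm.release()
```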
Run the Smart Factory Voice Agent
To run the factory voice agent in Python, update the model paths to match your local files and have your Picovoice AccessKey ready.
Example interactions:
- Equipment Control: "Hey Factory, emergency stop" is mapped directly to the emergency-stop intent and executed immediately.
- Production Query: "Hey Assistant, what's the current output rate on line three?" is transcribed by Cheetah, answered by picoLLM from the production context, and spoken back by Orca.
Looking to integrate voice with other manufacturing applications? Read about how Picovoice Enables Voice Picking to improve warehouse efficiency and accuracy.
You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.
Start Building