🚀 On-device Voice AI & LLMs
Build commercial, non-commercial, and research projects using the Forever-Free Plan.
Start Free

TL;DR: Build a hotel voice assistant that runs entirely on-device using Python. Learn how to create a privacy-first hotel room automation system that responds to guest commands locally - no cloud, no network latency, and no data privacy concerns.

Voice AI is transforming hotel guest experiences and hospitality operations. Modern hotel rooms are increasingly equipped with voice-activated controls for lighting, temperature, entertainment, and concierge services. However, most implementations rely on cloud-based voice assistants (like Alexa or Google Home) that introduce latency and significant data privacy concerns. Guest conversations, room preferences, and behavioral data are transmitted to external servers—creating liability risks and potentially undermining guest trust.

On-device voice processing eliminates these issues, enabling smart hotel rooms with zero network latency and full data privacy. All processing happens locally, ensuring GDPR compliance without external data transmission. This tutorial demonstrates how to build a complete hotel room voice assistant using an on-device architecture: it handles structured commands (like lighting and temperature) instantly, while seamlessly routing complex conversational queries to a local Large Language Model.

What You'll Build:

A privacy-first hotel assistant that:

  • Activates using two custom wake phrases - one for room controls (e.g., "Hey Smart Room") and one for guest queries (e.g., "Hey Concierge")
  • Processes room controls instantly without the cloud
  • Handles open-ended guest questions (e.g., "When is breakfast?") using a local LLM

What You'll Need:

  • Python 3.9+
  • Microphone and speakers for testing
  • Picovoice AccessKey from the Picovoice Console

On-Device Voice AI Architecture for Hospitality Applications

The hotel voice assistant system architecture combines deterministic control with generative AI to handle the full spectrum of hotel interactions:

  1. Voice Activation: Porcupine Wake Word continuously monitors audio for two distinct wake phrases. Detecting "Hey Smart Room" routes to instant room control, while "Hey Concierge" routes to the conversational AI for guest queries. This dual-keyword approach lets guests choose the right path upfront.
  2. Precise IoT Control: For smart hotel room automation, Rhino Speech-to-Intent maps spoken commands directly to structured JSON instructions. This engine ensures high accuracy for hardware interactions (e.g., adjusting thermostats or lighting) without the unpredictability of probabilistic models.
  3. Local Large Language Model (LLM): For dynamic hotel guest services, the system utilizes Cheetah Streaming Speech-to-Text paired with picoLLM. This combination processes natural language inquiries locally, allowing the assistant to function as a knowledgeable concierge (e.g., answering questions about pool hours or checkout times).

This edge AI architecture addresses the unique challenges of hospitality voice assistants: guest privacy expectations, 24/7 reliability requirements, multilingual support, and seamless integration with existing property management systems.

All Picovoice models - Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech - support multiple languages, including English, Spanish, French, German, and more.

Smart Hotel Voice Control Workflow: "Hey Smart Room" → Porcupine Wake Word → Rhino Speech-to-Intent → structured JSON intent → IoT room control.

Conversational Hospitality Workflow: "Hey Concierge" → Porcupine Wake Word → Cheetah Streaming Speech-to-Text → picoLLM → Orca Streaming Text-to-Speech response.

Train Custom Wake Words for Hotel Voice Assistant

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your first wake phrase for room controls (e.g., "Hey Smart Room") and test it using the microphone button.
  3. Click "Train," select the target platform, and download the .ppn model file.
  4. Repeat Steps 2 & 3 to train an additional wake word for detailed guest queries (e.g., "Hey Concierge").

Porcupine can detect multiple wake words simultaneously. For instance, it can support both "Hey Smart Room" and "Hey Concierge" for different tasks. For tips on designing an effective wake word, review the choosing a wake word guide.

Define Voice Commands for Smart Room Control

  1. Create an empty Rhino Speech-to-Intent Context.
  2. Click the "Import YAML" button in the top-right corner of the console and paste the YAML provided below to define intents for structured hotel room commands.
  3. Test the model with the microphone button and download the .rhn context file for your target platform.

You can refer to the Rhino Syntax Cheat Sheet for more details on building custom contexts.

YAML Context for Hotel Room Commands:

This context handles the most common structured room control commands. For conversational queries like "What time is breakfast?" or "I'm feeling cold, can you help?", the assistant will use the picoLLM path.
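An illustrative context covering a few common room controls - the intent names, expressions, and slot values below are assumptions, so extend them to match your property. In Rhino's expression syntax, square brackets mark required alternatives and parentheses mark optional words:

```yaml
context:
  expressions:
    changeLightState:
      - "[turn, switch] $state:state (the) [light, lights]"
      - "[turn, switch] (the) [light, lights] $state:state"
    setTemperature:
      - "set (the) [temperature, thermostat] to $pv.TwoDigitInteger:temperature (degrees)"
  slots:
    state:
      - "on"
      - "off"
```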

Set Up Conversational AI Model

  1. Navigate to the picoLLM page in Picovoice Console.
  2. Select a function-calling compatible model. This tutorial uses llama-3.2-1b-instruct-505.pllm.
  3. Download the .pllm file and place it in your project directory.

Set Up the Python Environment

The following Python SDKs provide the complete smart hotel room AI stack for voice-enabled room automation. Install them, along with their dependencies, using pip:
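One command covers the whole stack - the package names below are the Picovoice SDKs on PyPI, plus `pvrecorder` for cross-platform microphone capture:

```shell
pip3 install pvporcupine pvrhino pvcheetah picollm pvorca pvrecorder
```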

Add Wake Word Detection for Hands-Free Activation

The following code captures audio from your default microphone and detects the custom wake word locally:
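A minimal sketch of the dual wake word listener. The `.ppn` file names and the environment variable holding the AccessKey are assumptions - substitute your own paths and key. SDK calls are kept inside `main()` so the routing helper stays importable without a microphone:

```python
import os

def route_wake_word(keyword_index):
    """Map Porcupine's keyword index to a processing path (pure helper).

    Index order follows keyword_paths below:
    0 -> "Hey Smart Room" (room control), 1 -> "Hey Concierge" (LLM).
    """
    return {0: "room_control", 1: "concierge"}.get(keyword_index)

def main():
    # SDK imports live here so the module imports cleanly without hardware.
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(
        access_key=os.environ["PICOVOICE_ACCESS_KEY"],
        keyword_paths=["hey-smart-room.ppn", "hey-concierge.ppn"],
    )
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()
    try:
        while True:
            pcm = recorder.read()                   # one frame of 16-bit, 16 kHz audio
            keyword_index = porcupine.process(pcm)  # -1 until a keyword is heard
            if keyword_index >= 0:
                print(f"Wake word detected -> {route_wake_word(keyword_index)}")
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()

if __name__ == "__main__":
    main()
```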

Porcupine Wake Word processes each audio frame on-device and triggers when either keyword is recognized. By listening for multiple wake words simultaneously, it routes guests to the right system path instantly - room control or concierge services - without continuous cloud streaming.

Understand User Voice Commands

Once "Hey Smart Room" is detected, Rhino Speech-to-Intent listens for structured room control commands:
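A sketch of one Rhino inference cycle, assuming `context_path` points at the `.rhn` file downloaded from the Console. The SDK imports are kept inside the function so the formatting helper can run anywhere:

```python
def describe_inference(is_understood, intent, slots):
    """Render a Rhino inference as a readable string (pure helper)."""
    if not is_understood:
        return "command not understood"
    args = ", ".join(f"{k}={v}" for k, v in sorted(slots.items()))
    return f"{intent}({args})"

def listen_for_room_command(access_key, context_path):
    # Feed microphone frames to Rhino until it finalizes an inference.
    import pvrhino
    from pvrecorder import PvRecorder

    rhino = pvrhino.create(access_key=access_key, context_path=context_path)
    recorder = PvRecorder(frame_length=rhino.frame_length)
    recorder.start()
    try:
        while True:
            if rhino.process(recorder.read()):  # True once the utterance ends
                inference = rhino.get_inference()
                return (inference.is_understood, inference.intent, inference.slots)
    finally:
        recorder.stop()
        recorder.delete()
        rhino.delete()
```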

Rhino Speech-to-Intent directly infers intent from speech without requiring a separate transcription step.

Handle Open-Ended Guest Queries

When guests say "Hey Concierge," the system routes directly to streaming speech-to-text and local LLM for natural language queries:
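A sketch of the concierge path under a few stated assumptions: `endpoint_duration_sec` and the completion token limit are illustrative values, and `hotel_facts` is a hypothetical string of property information you supply:

```python
def build_concierge_prompt(hotel_facts, guest_query):
    """Compose a grounded prompt from static hotel info plus the transcript."""
    return (
        "You are a helpful hotel concierge. Answer briefly, using only "
        "these facts:\n"
        f"{hotel_facts}\n\nGuest: {guest_query}\nConcierge:"
    )

def answer_guest_query(access_key, llm_model_path, hotel_facts):
    # Cheetah transcribes until an endpoint (~1 s of silence), then
    # picoLLM generates the reply fully on-device.
    import picollm
    import pvcheetah
    from pvrecorder import PvRecorder

    cheetah = pvcheetah.create(access_key=access_key, endpoint_duration_sec=1.0)
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    transcript = ""
    try:
        while True:
            partial, is_endpoint = cheetah.process(recorder.read())
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()  # drain any buffered text
                break
    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()

    pllm = picollm.create(access_key=access_key, model_path=llm_model_path)
    try:
        result = pllm.generate(
            build_concierge_prompt(hotel_facts, transcript),
            completion_token_limit=128,
        )
        return result.completion
    finally:
        pllm.release()
```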

This approach uses Cheetah Streaming Speech-to-Text to transcribe the guest's natural speech, then picoLLM to understand the query and generate an appropriate response based on hotel information.

Add Voice Response Generation

Transform text responses into natural speech:
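A single-shot synthesis sketch that writes the reply to a WAV file; Orca also exposes a streaming interface for lower time-to-first-audio. The chunking helper is a hypothetical addition for splitting long LLM answers so each piece can be synthesized promptly:

```python
def chunk_text(text, max_chars=200):
    """Split a response at sentence boundaries so each chunk can be
    synthesized and played as soon as it is ready (pure helper)."""
    marked = text.replace(". ", ".|").replace("? ", "?|").replace("! ", "!|")
    chunks, current = [], ""
    for sentence in marked.split("|"):
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def speak(access_key, text, output_path="response.wav"):
    # Synthesize the full text to a WAV file on-device.
    import pvorca

    orca = pvorca.create(access_key=access_key)
    try:
        orca.synthesize_to_file(text, output_path)
    finally:
        orca.delete()
```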

Orca Streaming Text-to-Speech generates natural voice responses with first audio output in under 130ms, creating a seamless conversational experience.

Execute Room Control Voice Commands on IoT Systems

Route user requests from structured JSON to IoT systems:
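A minimal sketch of the dispatch layer. The intent and slot names mirror the example context and are illustrative - align them with your own `.rhn` context - and the transport is a placeholder (a production system would publish the JSON to the property's automation bus, e.g., over MQTT):

```python
import json

def inference_to_command(intent, slots):
    """Translate a Rhino inference into a structured IoT instruction."""
    if intent == "changeLightState":
        return {"device": "lights", "state": slots["state"]}
    if intent == "setTemperature":
        return {"device": "thermostat", "celsius": int(slots["temperature"])}
    raise ValueError(f"unsupported intent: {intent}")

def dispatch(command):
    # Placeholder transport: serialize the instruction and log it.
    payload = json.dumps(command, sort_keys=True)
    print(f"-> IoT: {payload}")
    return payload
```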

Complete Python Code for Hotel Room Voice Assistant

This implementation combines all components for a complete hotel room voice assistant:
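A condensed orchestration sketch showing how the pieces fit together - model paths and the AccessKey variable are placeholders, and the two branch bodies (elided in comments) correspond to the room-control and concierge sections above:

```python
IDLE, ROOM_CONTROL, CONCIERGE = "idle", "room_control", "concierge"

def next_state(state, event):
    """Assistant-loop state machine (pure helper): wake words move IDLE
    into a handling mode; 'done' returns to IDLE; anything else is ignored."""
    if state == IDLE and event == "wake_room":
        return ROOM_CONTROL
    if state == IDLE and event == "wake_concierge":
        return CONCIERGE
    if event == "done":
        return IDLE
    return state

def main():
    import os
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(
        access_key=os.environ["PICOVOICE_ACCESS_KEY"],
        keyword_paths=["hey-smart-room.ppn", "hey-concierge.ppn"],
    )
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()
    state = IDLE
    try:
        while True:
            keyword_index = porcupine.process(recorder.read())
            if keyword_index == 0:
                state = next_state(state, "wake_room")
                # ... run the Rhino inference and IoT dispatch path here ...
                state = next_state(state, "done")
            elif keyword_index == 1:
                state = next_state(state, "wake_concierge")
                # ... run the Cheetah -> picoLLM -> Orca path here ...
                state = next_state(state, "done")
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()

if __name__ == "__main__":
    main()
```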

Run the Hotel Room Voice Assistant

To run the voice assistant, update the model paths to match your local files and have your Picovoice AccessKey ready:
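An illustrative invocation - the script name and flag names below are hypothetical, so adjust them to match your own entry point and file locations:

```shell
python3 hotel_assistant.py \
    --access_key "${PICOVOICE_ACCESS_KEY}" \
    --room_keyword_path hey-smart-room.ppn \
    --concierge_keyword_path hey-concierge.ppn \
    --context_path hotel_room.rhn \
    --llm_model_path llama-3.2-1b-instruct-505.pllm
```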

Example interactions:

  • "Hey Smart Room" → "Turn off the lights" → the lights switch off instantly via the structured intent path
  • "Hey Concierge" → "When is breakfast?" → the local LLM answers from hotel information and the response is spoken aloud

You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.

Start Building

Frequently Asked Questions

Will the voice assistant work accurately with different accents or background noise?
Yes. Porcupine Wake Word, Rhino Speech-to-Intent, and Cheetah Streaming Speech-to-Text are designed to work reliably in real-world hotel environments with ambient noise, air conditioning sounds, and various accents across supported languages. The models are trained on diverse acoustic conditions to ensure consistent performance.
When should I use Rhino Speech-to-Intent versus picoLLM for guest queries?
Use Rhino Speech-to-Intent for structured, predictable commands like room controls. Use picoLLM for open-ended conversational queries where guests might phrase requests in unpredictable ways. The dual wake word architecture lets guests choose the appropriate path upfront - "Hey Smart Room" for room controls and "Hey Concierge" for conversational queries.