🚀 On-device Voice AI & LLMs
Build commercial, non-commercial, and research projects using the Forever-Free Plan.
Start Free

TL;DR: Build a hotel voice assistant that runs entirely on-device using Python. Learn how to create a privacy-first hotel room automation system that responds to guest commands locally - no cloud, no network latency, and no data privacy concerns.

Voice AI is transforming hotel guest experiences and hospitality operations. Modern hotel rooms are increasingly equipped with voice-activated controls for lighting, temperature, entertainment, and concierge services. However, most implementations rely on cloud-based voice assistants (like Alexa or Google Home) that introduce latency and significant data privacy concerns. Guest conversations, room preferences, and behavioral data are transmitted to external servers—creating liability risks and potentially undermining guest trust.

On-device voice processing eliminates these issues, enabling smart hotel rooms with zero network latency and full data privacy. All processing happens locally, ensuring GDPR compliance without external data transmission. This tutorial demonstrates how to build a complete hotel room voice assistant using an on-device architecture: it handles structured commands (like lighting and temperature) instantly, while seamlessly routing complex conversational queries to a local Large Language Model.

What You'll Build:

A privacy-first hotel assistant that:

  • Activates using two custom wake phrases - one for room controls (e.g., "Hey Smart Room") and one for guest queries (e.g., "Hey Concierge")
  • Processes room controls instantly without the cloud
  • Handles open-ended guest questions (e.g., "When is breakfast?") using a local LLM

What You'll Need:

  • Python 3.9+
  • Microphone and speakers for testing
  • Picovoice AccessKey from the Picovoice Console

On-Device Voice AI Architecture for Hospitality Applications

The hotel voice assistant system architecture combines deterministic control with generative AI to handle the full spectrum of hotel interactions:

  1. Voice Activation: Porcupine Wake Word continuously monitors audio for two distinct wake phrases. Detecting "Hey Smart Room" routes to instant room control, while "Hey Concierge" routes to the conversational AI for guest queries. This dual-keyword approach lets guests choose the right path upfront.
  2. Precise IoT Control: For smart hotel room automation, Rhino Speech-to-Intent maps spoken commands directly to structured JSON instructions. This engine ensures high accuracy for hardware interactions (e.g., adjusting thermostats or lighting) without the unpredictability of probabilistic models.
  3. Local Large Language Model (LLM): For dynamic hotel guest services, the system utilizes Cheetah Streaming Speech-to-Text paired with picoLLM. This combination processes natural language inquiries locally, allowing the assistant to function as a knowledgeable concierge (e.g., answering questions about pool hours or checkout times).

This edge AI architecture addresses the unique challenges of hospitality voice assistants: guest privacy expectations, 24/7 reliability requirements, multilingual support, and seamless integration with existing property management systems.

All Picovoice models - Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech - support multiple languages, including English, Spanish, French, German, and more.

Smart Hotel Voice Control Workflow: "Hey Smart Room" → Porcupine Wake Word → Rhino Speech-to-Intent → structured JSON intent → IoT room control.

Conversational Hospitality Workflow: "Hey Concierge" → Porcupine Wake Word → Cheetah Streaming Speech-to-Text → picoLLM → Orca Streaming Text-to-Speech response.

Train Custom Wake Words for Hotel Voice Assistant

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your first wake phrase for room controls (e.g., "Hey Smart Room") and test it using the microphone button.
  3. Click "Train," select the target platform, and download the .ppn model file.
  4. Repeat Steps 2 & 3 to train an additional wake word for detailed guest queries (e.g., "Hey Concierge").

Porcupine can detect multiple wake words simultaneously. For instance, it can support both "Hey Smart Room" and "Hey Concierge" for different tasks. For tips on designing an effective wake word, review the choosing a wake word guide.

Define Voice Commands for Smart Room Control

  1. Create an empty Rhino Speech-to-Intent Context.
  2. Click the "Import YAML" button in the top-right corner of the console and paste the YAML provided below to define intents for structured hotel room commands.
  3. Test the model with the microphone button and download the .rhn context file for your target platform.

You can refer to the Rhino Syntax Cheat Sheet for more details on building custom contexts.

YAML Context for Hotel Room Commands:

This context handles the most common structured room control commands. For conversational queries like "What time is breakfast?" or "I'm feeling cold, can you help?", the assistant will use the picoLLM path.
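An illustrative context covering a few common room controls - the intent names, expressions, and slot values below are assumptions, so extend them to match your property. In Rhino's expression syntax, square brackets mark required alternatives and parentheses mark optional words:

```yaml
context:
  expressions:
    changeLightState:
      - "[turn, switch] $state:state (the) [light, lights]"
      - "[turn, switch] (the) [light, lights] $state:state"
    setTemperature:
      - "set (the) [temperature, thermostat] to $pv.TwoDigitInteger:temperature (degrees)"
  slots:
    state:
      - "on"
      - "off"
```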

Set Up Conversational AI Model

  1. Navigate to the picoLLM page in Picovoice Console.
  2. Select a function-calling compatible model. This tutorial uses llama-3.2-1b-instruct-505.pllm.
  3. Download the .pllm file and place it in your project directory.

Set Up the Python Environment

The following Python SDKs provide the complete smart hotel room AI stack for voice-enabled room automation. Install them, along with their dependencies, using pip:
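One command covers the whole stack - the package names below are the Picovoice SDKs on PyPI, plus `pvrecorder` for cross-platform microphone capture:

```shell
pip3 install pvporcupine pvrhino pvcheetah picollm pvorca pvrecorder
```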

Add Wake Word Detection for Hands-Free Activation

The following code captures audio from your default microphone and detects the custom wake word locally:
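A minimal sketch of the dual wake word listener. The `.ppn` file names and the environment variable holding the AccessKey are assumptions - substitute your own paths and key. SDK calls are kept inside `main()` so the routing helper stays importable without a microphone:

```python
import os

def route_wake_word(keyword_index):
    """Map Porcupine's keyword index to a processing path (pure helper).

    Index order follows keyword_paths below:
    0 -> "Hey Smart Room" (room control), 1 -> "Hey Concierge" (LLM).
    """
    return {0: "room_control", 1: "concierge"}.get(keyword_index)

def main():
    # SDK imports live here so the module imports cleanly without hardware.
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(
        access_key=os.environ["PICOVOICE_ACCESS_KEY"],
        keyword_paths=["hey-smart-room.ppn", "hey-concierge.ppn"],
    )
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()
    try:
        while True:
            pcm = recorder.read()                   # one frame of 16-bit, 16 kHz audio
            keyword_index = porcupine.process(pcm)  # -1 until a keyword is heard
            if keyword_index >= 0:
                print(f"Wake word detected -> {route_wake_word(keyword_index)}")
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()

if __name__ == "__main__":
    main()
```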

Porcupine Wake Word processes each audio frame on-device and triggers when either keyword is recognized. By listening for multiple wake words simultaneously, it routes guests to the right system path instantly - room control or concierge services - without continuous cloud streaming.

Understand User Voice Commands

Once "Hey Smart Room" is detected, Rhino Speech-to-Intent listens for structured room control commands:
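A sketch of one Rhino inference cycle, assuming `context_path` points at the `.rhn` file downloaded from the Console. The SDK imports are kept inside the function so the formatting helper can run anywhere:

```python
def describe_inference(is_understood, intent, slots):
    """Render a Rhino inference as a readable string (pure helper)."""
    if not is_understood:
        return "command not understood"
    args = ", ".join(f"{k}={v}" for k, v in sorted(slots.items()))
    return f"{intent}({args})"

def listen_for_room_command(access_key, context_path):
    # Feed microphone frames to Rhino until it finalizes an inference.
    import pvrhino
    from pvrecorder import PvRecorder

    rhino = pvrhino.create(access_key=access_key, context_path=context_path)
    recorder = PvRecorder(frame_length=rhino.frame_length)
    recorder.start()
    try:
        while True:
            if rhino.process(recorder.read()):  # True once the utterance ends
                inference = rhino.get_inference()
                return (inference.is_understood, inference.intent, inference.slots)
    finally:
        recorder.stop()
        recorder.delete()
        rhino.delete()
```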

Rhino Speech-to-Intent directly infers intent from speech without requiring a separate transcription step.

Handle Open-Ended Guest Queries

When guests say "Hey Concierge," the system routes directly to streaming speech-to-text and local LLM for natural language queries:
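A sketch of the concierge path under a few stated assumptions: `endpoint_duration_sec` and the completion token limit are illustrative values, and `hotel_facts` is a hypothetical string of property information you supply:

```python
def build_concierge_prompt(hotel_facts, guest_query):
    """Compose a grounded prompt from static hotel info plus the transcript."""
    return (
        "You are a helpful hotel concierge. Answer briefly, using only "
        "these facts:\n"
        f"{hotel_facts}\n\nGuest: {guest_query}\nConcierge:"
    )

def answer_guest_query(access_key, llm_model_path, hotel_facts):
    # Cheetah transcribes until an endpoint (~1 s of silence), then
    # picoLLM generates the reply fully on-device.
    import picollm
    import pvcheetah
    from pvrecorder import PvRecorder

    cheetah = pvcheetah.create(access_key=access_key, endpoint_duration_sec=1.0)
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    transcript = ""
    try:
        while True:
            partial, is_endpoint = cheetah.process(recorder.read())
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()  # drain any buffered text
                break
    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()

    pllm = picollm.create(access_key=access_key, model_path=llm_model_path)
    try:
        result = pllm.generate(
            build_concierge_prompt(hotel_facts, transcript),
            completion_token_limit=128,
        )
        return result.completion
    finally:
        pllm.release()
```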

This approach uses Cheetah Streaming Speech-to-Text to transcribe the guest's natural speech, then picoLLM to understand the query and generate an appropriate response based on hotel information.

Add Voice Response Generation

Transform text responses into natural speech:
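A single-shot synthesis sketch that writes the reply to a WAV file; Orca also exposes a streaming interface for lower time-to-first-audio. The chunking helper is a hypothetical addition for splitting long LLM answers so each piece can be synthesized promptly:

```python
def chunk_text(text, max_chars=200):
    """Split a response at sentence boundaries so each chunk can be
    synthesized and played as soon as it is ready (pure helper)."""
    marked = text.replace(". ", ".|").replace("? ", "?|").replace("! ", "!|")
    chunks, current = [], ""
    for sentence in marked.split("|"):
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def speak(access_key, text, output_path="response.wav"):
    # Synthesize the full text to a WAV file on-device.
    import pvorca

    orca = pvorca.create(access_key=access_key)
    try:
        orca.synthesize_to_file(text, output_path)
    finally:
        orca.delete()
```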

Orca Streaming Text-to-Speech generates natural voice responses with first audio output in under 130ms, creating a seamless conversational experience.

Execute Room Control Voice Commands on IoT Systems

Route user requests from structured JSON to IoT systems:
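A minimal sketch of the dispatch layer. The intent and slot names mirror the example context and are illustrative - align them with your own `.rhn` context - and the transport is a placeholder (a production system would publish the JSON to the property's automation bus, e.g., over MQTT):

```python
import json

def inference_to_command(intent, slots):
    """Translate a Rhino inference into a structured IoT instruction."""
    if intent == "changeLightState":
        return {"device": "lights", "state": slots["state"]}
    if intent == "setTemperature":
        return {"device": "thermostat", "celsius": int(slots["temperature"])}
    raise ValueError(f"unsupported intent: {intent}")

def dispatch(command):
    # Placeholder transport: serialize the instruction and log it.
    payload = json.dumps(command, sort_keys=True)
    print(f"-> IoT: {payload}")
    return payload
```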

Complete Python Code for Hotel Room Voice Assistant

This implementation combines all components for a complete hotel room voice assistant:
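A condensed orchestration sketch showing how the pieces fit together - model paths and the AccessKey variable are placeholders, and the two branch bodies (elided in comments) correspond to the room-control and concierge sections above:

```python
IDLE, ROOM_CONTROL, CONCIERGE = "idle", "room_control", "concierge"

def next_state(state, event):
    """Assistant-loop state machine (pure helper): wake words move IDLE
    into a handling mode; 'done' returns to IDLE; anything else is ignored."""
    if state == IDLE and event == "wake_room":
        return ROOM_CONTROL
    if state == IDLE and event == "wake_concierge":
        return CONCIERGE
    if event == "done":
        return IDLE
    return state

def main():
    import os
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(
        access_key=os.environ["PICOVOICE_ACCESS_KEY"],
        keyword_paths=["hey-smart-room.ppn", "hey-concierge.ppn"],
    )
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()
    state = IDLE
    try:
        while True:
            keyword_index = porcupine.process(recorder.read())
            if keyword_index == 0:
                state = next_state(state, "wake_room")
                # ... run the Rhino inference and IoT dispatch path here ...
                state = next_state(state, "done")
            elif keyword_index == 1:
                state = next_state(state, "wake_concierge")
                # ... run the Cheetah -> picoLLM -> Orca path here ...
                state = next_state(state, "done")
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()

if __name__ == "__main__":
    main()
```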

Run the Hotel Room Voice Assistant

To run the voice assistant, update the model paths to match your local files and have your Picovoice AccessKey ready:
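An illustrative invocation - the script name and flag names below are hypothetical, so adjust them to match your own entry point and file locations:

```shell
python3 hotel_assistant.py \
    --access_key "${PICOVOICE_ACCESS_KEY}" \
    --room_keyword_path hey-smart-room.ppn \
    --concierge_keyword_path hey-concierge.ppn \
    --context_path hotel_room.rhn \
    --llm_model_path llama-3.2-1b-instruct-505.pllm
```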

Example interactions:

  • "Hey Smart Room" → "Turn off the lights" → the lights switch off instantly via the structured intent path
  • "Hey Concierge" → "When is breakfast?" → the local LLM answers from hotel information and the response is spoken aloud

You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.

Start Building

Frequently Asked Questions

Will the voice assistant work accurately with different accents or background noise?
Yes. Porcupine Wake Word, Rhino Speech-to-Intent, and Cheetah Streaming Speech-to-Text are designed to work reliably in real-world hotel environments with ambient noise, air conditioning sounds, and various accents across supported languages. The models are trained on diverse acoustic conditions to ensure consistent performance.
When should I use Rhino Speech-to-Intent versus picoLLM for guest queries?
Use Rhino Speech-to-Intent for structured, predictable commands like room controls. Use picoLLM for open-ended conversational queries where guests might phrase requests in unpredictable ways. The dual wake word architecture lets guests choose the appropriate path upfront - "Hey Smart Room" for room controls and "Hey Concierge" for conversational queries.