Smart TV Voice Assistant Tutorial in Python

TLDR: Build on-device voice control for smart TVs in Python. This tutorial shows how to add voice search to a smart TV, using Speech-to-Intent for structured commands and on-device speech recognition throughout. Structured voice commands enable fast content discovery and voice-controlled TV navigation, while open-ended queries route to a local LLM for AI recommendations. All voice processing runs on-device, reducing latency and protecting user privacy.

On-Device Voice Search for Smart TV

Smart TV voice search works best when responses feel instant. Cloud-based pipelines add network latency at each processing stage, which can make the experience feel slower than expected. On-device voice AI processes speech locally, cutting out network round-trips and keeping user data private.

This tutorial shows how to build a smart TV voice assistant in Python. Custom wake words in English and Spanish activate the voice search hands-free, structured content commands route through Speech-to-Intent for instant catalog lookups, and open-ended requests go to a local LLM for AI recommendations, all on-device.

What You'll Build:

A smart TV voice assistant that:

Activates using "Hey TV" or "Oye TV" for content search, and "Hey Assistant" or "Oye Asistente" for AI recommendations
Searches the local content catalog instantly for structured queries
Routes open-ended requests to a local LLM for intelligent content matching
Responds with natural speech synthesis

With an on-device architecture, the smart TV voice assistant:

Delivers low-latency responses with all speech processing running locally on the device's hardware.
Keeps all user audio and viewing preferences on-device, which makes it easier to meet GDPR and CCPA privacy requirements for in-home devices.

What You'll Need:

Python 3.9+
Laptop or desktop with microphone and speakers for testing
Picovoice AccessKey from the Picovoice Console

Smart TV Voice Search Architecture

This Python-based voice search system uses an on-device voice pipeline designed for instant content discovery and AI recommendations. The same pattern suits voice-controlled TVs and other hands-free content discovery experiences. The pipeline has five stages:

Always-Listening Activation — The voice search system sits in a low-power, idle state using Porcupine Wake Word to monitor the audio stream for four wake phrases across two languages. Detecting "Hey TV" or "Oye TV" routes to instant content search, while "Hey Assistant" or "Oye Asistente" routes to the AI recommendation assistant.
Intent Recognition for Content Search — When "Hey TV" or "Oye TV" is detected, the audio is analyzed by Rhino Speech-to-Intent. Instead of transcribing words one by one, it maps the speech directly to a structured content query — like "search action movies" or "resume watching." The system queries the local content catalog and returns results immediately without further processing.
Speech-to-Text for Open-Ended Requests — When "Hey Assistant" or "Oye Asistente" is detected, the system routes directly to Cheetah Streaming Speech-to-Text. This captures free-form requests like "something funny for the whole family" that do not map cleanly to a fixed intent.
On-Device Language Model — The transcribed request is passed to picoLLM along with the device's content catalog. The local language model interprets what the viewer is looking for and matches it against available titles, returning structured recommendations without any cloud processing.
Voice Response Generation — Orca Streaming Text-to-Speech converts the response into natural speech, completing the hands-free loop from query to recommendation.

Content Search Workflow:

Viewer: "Oye TV, find action movies"
   ↓
[Porcupine] Detects "Oye TV" → Routes to content search
   ↓
[Rhino] Recognizes intent → {"intent": "searchByGenre", "genre": "action"}
   ↓
[Content Catalog] Queries local library → Returns top-rated matches
   ↓
[Orca] Speaks results → "Here are some action movies for you..."

AI Recommendation Workflow:

Viewer: "Hey Assistant, something fun for the whole family"
   ↓
[Porcupine] Detects "Hey Assistant" → Routes to recommendation assistant
   ↓
[Cheetah] Transcribes: "something fun for the whole family"
   ↓
[picoLLM] Matches against catalog → Returns AI recommendations
   ↓
[Orca] Speaks picks → "Here are some family-friendly options..."

Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech support multiple languages including English, Spanish, German and more. Build multilingual voice search to serve international markets by training models in the languages your target regions speak.

Train Custom Wake Words for Smart TV Voice Search

1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
2. Enter your first wake phrase for content commands (e.g., "Hey TV") and test it using the microphone button.
3. Click "Train," select the target platform, and download the .ppn model file.
4. Repeat steps 2 and 3 to train an additional wake word for AI recommendations (e.g., "Hey Assistant").
5. Train Spanish wake words: select "Spanish" as the target language in the console, train "Oye TV" and "Oye Asistente," test them, and download both .ppn model files.

Porcupine can detect multiple wake words simultaneously. For instance, it can support both "Oye Asistente" (Spanish) and "Hey Assistant" (English) at the same time, both routing to the same recommendation voice assistant. For tips on designing an effective wake word, review the choosing a wake word guide.

Define Voice Commands for Content Discovery

Create an empty Rhino Speech-to-Intent Context.
Click the "Import YAML" button in the top-right corner of the console and paste the YAML provided below to define intents for structured content search commands.
Test the model with the microphone button and download the .rhn context file for your target platform.

Refer to the Rhino Syntax Cheat Sheet for details on writing expressions, slots, and macros for custom voice commands.

Train Custom Voice Commands to Discover TV Content using YAML Context:

context:
  expressions:
    searchByGenre:
      - "@query (a) $genre:genre @contentType"
    resumeContent:
      - "[resume, continue] [watching, @contentType]"
    topRated:
      - "@query (the) [top rated, best, popular] @contentType"
    nowPlaying:
      - "what's [on, playing] (right now)"
    addToWatchlist:
      - "[add, save, bookmark] (that, this) to (my) watchlist"
  slots:
    genre:
      - action
      - comedy
      - drama
      - thriller
      - horror
      - documentary
      - animation
      - family
  macros:
    query:
      - search for
      - find
      - show me
      - I want to watch
      - what are
    contentType:
      - movie
      - movies
      - show
      - shows
      - series

This context handles the most common structured content search commands. For open-ended requests like "something relaxing to watch tonight" or "a movie similar to what I watched last night," the assistant will use the picoLLM recommendation path.

Set Up a Local Large Language Model

Navigate to the picoLLM page in Picovoice Console.
Select a model. This tutorial uses llama-3.2-3b-instruct-505.pllm.
Download the .pllm file and place it in your project directory.

Install Required Python Libraries for Smart TV Voice Search

Install all required Python SDKs and dependencies using pip:

Porcupine Wake Word Python SDK: pvporcupine
Rhino Speech-to-Intent Python SDK: pvrhino
Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
picoLLM Python SDK: picollm
Orca Streaming Text-to-Speech Python SDK: pvorca
Picovoice Python Recorder library: pvrecorder
Picovoice Python Speaker library: pvspeaker

pip install pvporcupine pvrhino pvcheetah picollm pvorca pvrecorder pvspeaker
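
Before wiring up audio, it can help to confirm that every SDK installed correctly. The following is a minimal sketch that only checks whether each package can be located in the current environment; the names `REQUIRED_MODULES` and `find_missing_modules` are illustrative helpers, not part of any Picovoice SDK.

```python
import importlib.util

# Import names of the packages installed by the pip command above.
REQUIRED_MODULES = [
    "pvporcupine",
    "pvrhino",
    "pvcheetah",
    "picollm",
    "pvorca",
    "pvrecorder",
    "pvspeaker",
]


def find_missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All voice SDKs are importable.")
```

This check does not validate the AccessKey or the model files; those are only verified when each engine is created.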

Add Wake Word Detection for Hands-Free Activation

The following code captures audio from your microphone and listens for all four custom wake words locally:

import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = "${ACCESS_KEY}"
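COMMAND_KEYWORD_PATH = "${COMMAND_KEYWORD_PATH}"        # "Hey TV" .ppn
QUERY_KEYWORD_PATH = "${QUERY_KEYWORD_PATH}"            # "Hey Assistant" .ppn
COMMAND_ES_KEYWORD_PATH = "${COMMAND_ES_KEYWORD_PATH}"  # "Oye TV" .ppn
QUERY_ES_KEYWORD_PATH = "${QUERY_ES_KEYWORD_PATH}"      # "Oye Asistente" .ppn
# Keyword files are language-specific. If initialization rejects the Spanish
# .ppn files, check the Porcupine docs for the model_path (language model) option.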

porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=[
        COMMAND_KEYWORD_PATH,
        QUERY_KEYWORD_PATH,
        COMMAND_ES_KEYWORD_PATH,
        QUERY_ES_KEYWORD_PATH
    ]
)

recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

print("Listening for wake word...")

try:
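    # process() returns the index of the detected keyword within keyword_paths,
    # or -1 when no wake word is present in the frame
    keyword_index = -1
    while keyword_index < 0:
        pcm = recorder.read()
        keyword_index = porcupine.process(pcm)

    # Indices 0 and 2 are "Hey TV" / "Oye TV"; 1 and 3 are "Hey Assistant" / "Oye Asistente"
    if keyword_index in (0, 2):
        print("Content search wake word detected - routing to content commands")
    elif keyword_index in (1, 3):
        print("Recommendation wake word detected - routing to AI assistant")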
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()

Porcupine Wake Word processes each audio frame on-device with acoustic models optimized for living room environments. Because it listens for multiple wake words simultaneously, it can route viewers to the right path, content search or AI recommendations, as soon as a phrase is detected.
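
The index-based routing above can also be expressed as a lookup table, which keeps the wake word order and the routing decision in one place. This is a minimal sketch: `WAKE_WORD_ROUTES`, `resolve_route`, and the route labels are hypothetical names chosen for illustration, and the table assumes the same keyword order as the `keyword_paths` list in the code above.

```python
# Order must match the keyword_paths list passed to pvporcupine.create().
WAKE_WORD_ROUTES = [
    ("content_search", "en"),   # index 0: "Hey TV"
    ("recommendation", "en"),   # index 1: "Hey Assistant"
    ("content_search", "es"),   # index 2: "Oye TV"
    ("recommendation", "es"),   # index 3: "Oye Asistente"
]


def resolve_route(keyword_index):
    """Map a Porcupine keyword index to a (route, language) pair.

    Returns None for -1 (no detection) or any index outside the table.
    """
    if 0 <= keyword_index < len(WAKE_WORD_ROUTES):
        return WAKE_WORD_ROUTES[keyword_index]
    return None


if __name__ == "__main__":
    print(resolve_route(2))   # ('content_search', 'es')
    print(resolve_route(-1))  # None
```

Carrying the language alongside the route is a design choice that would let later stages pick a matching response language; the tutorial code itself does not do this.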

Process Content Search Commands

Once the wake word is detected, Rhino Speech-to-Intent listens for structured content queries:

import pvrhino
from pvrecorder import PvRecorder

ACCESS_KEY = "${ACCESS_KEY}"
CONTEXT_PATH = "${CONTEXT_PATH}"  # Path to .rhn file

rhino = pvrhino.create(
    access_key=ACCESS_KEY,
    context_path=CONTEXT_PATH
)

recorder = PvRecorder(frame_length=rhino.frame_length)
recorder.start()

print("Listening for content command...")

try:
    is_finalized = False
    while not is_finalized:
        pcm = recorder.read()
        is_finalized = rhino.process(pcm)

    inference = rhino.get_inference()

    if inference.is_understood:
        print('{')
        print("  intent : '%s'" % inference.intent)
        print('  slots : {')
        for slot, value in inference.slots.items():
            print("    %s : '%s'" % (slot, value))
        print('  }')
        print('}\n')

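        # Route to content catalog search (handle_content_command is defined
        # in the "Route Content Search Commands to Local Catalog" section)
        handle_content_command(ACCESS_KEY, inference.intent, inference.slots)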
    else:
        print("Didn't understand the command. Please try again.")
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    recorder.stop()
    recorder.delete()
    rhino.delete()

Rhino Speech-to-Intent directly infers intent from speech without requiring a separate transcription step, enabling instant content catalog lookups for structured queries.

Handle User Recommendations with AI

When viewers say "Hey Assistant" or "Oye Asistente," the system routes directly to streaming speech-to-text and a local LLM for open-ended content discovery:

import json
import pvcheetah
import picollm
from pvrecorder import PvRecorder

ACCESS_KEY = "${ACCESS_KEY}"
PICOLLM_MODEL_PATH = "${PICOLLM_MODEL_PATH}"  # Path to .pllm file

CONTENT_CATALOG = [
    {"id": "1001", "title": "The Last Frontier", "type": "movie", "genre": ["action", "adventure"],
     "year": 2023, "rating": 8.4, "description": "A retired soldier uncovers a conspiracy that reaches the highest levels of government."},
    {"id": "1002", "title": "Laugh Track", "type": "movie", "genre": ["comedy"],
     "year": 2024, "rating": 7.2, "description": "A stand-up comedian's life is turned upside down when a celebrity endorses their open mic set."},
    {"id": "1003", "title": "Midnight Echoes", "type": "movie", "genre": ["thriller", "mystery"],
     "year": 2022, "rating": 8.1, "description": "A detective traces a series of cryptic messages leading to a decades-old unsolved case."},
    {"id": "1004", "title": "Wild Kingdom", "type": "show", "genre": ["documentary", "family"],
     "year": 2023, "rating": 8.7, "description": "An immersive documentary series exploring ecosystems across six continents."},
    {"id": "1005", "title": "Star Odyssey", "type": "movie", "genre": ["action", "adventure"],
     "year": 2024, "rating": 8.9, "description": "Humanity's first interstellar crew faces impossible odds on a mission to find a new home."},
    {"id": "1006", "title": "Family Circus", "type": "movie", "genre": ["animation", "family", "comedy"],
     "year": 2023, "rating": 7.8, "description": "A chaotic circus family embarks on a road trip that brings them closer together."},
    {"id": "1007", "title": "Dark Horizon", "type": "movie", "genre": ["thriller", "action"],
     "year": 2024, "rating": 7.9, "description": "An international fugitive races against time to clear their name before a global summit."},
    {"id": "1008", "title": "Crimson City", "type": "show", "genre": ["crime", "drama"],
     "year": 2022, "rating": 8.5, "description": "A detective navigates corruption and rivalry in a city where everyone has secrets."},
]

RECOMMENDATION_PROMPT = """You are a smart TV content recommendation assistant.
Given a user's request and a catalog of available content, recommend the most relevant titles.
Respond only in JSON format: {"recommendations": [{"id": "content_id", "reason": "brief reason"}]}
Only recommend content that exists in the provided catalog. Limit to 3 recommendations."""


def handle_llm_query():
    """Process open-ended content requests using Cheetah + picoLLM"""
    
    # Initialize Cheetah for speech-to-text
    cheetah = pvcheetah.create(
        access_key=ACCESS_KEY,
        endpoint_duration_sec=1.0
    )
    
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    
    print("Speak your request...")
    transcript = ""
    
    try:
        is_endpoint = False
        while not is_endpoint:
            pcm = recorder.read()
            partial_transcript, is_endpoint = cheetah.process(pcm)
            transcript += partial_transcript
            print(partial_transcript, end="", flush=True)

        final_transcript = cheetah.flush()
        transcript += final_transcript
        print(final_transcript)

    except KeyboardInterrupt:
        print("\nStopping...")

    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()
    
    if not transcript.strip():
        print("No speech detected.")
        return
    
    # Process with picoLLM
    pllm = picollm.create(
        access_key=ACCESS_KEY,
        model_path=PICOLLM_MODEL_PATH
    )
    
    catalog_summary = json.dumps([
        {"id": c["id"], "title": c["title"], "type": c["type"],
         "genre": c["genre"], "year": c["year"], "rating": c["rating"],
         "description": c["description"]}
        for c in CONTENT_CATALOG
    ])
    
    prompt = (f"{RECOMMENDATION_PROMPT}\n\n"
              f"Content catalog:\n{catalog_summary}\n\n"
              f'Viewer request: "{transcript}"\n\n'
              "Respond with valid JSON only:")
    
    print("\nGenerating recommendations...")
    
    try:
        response = pllm.generate(
            prompt=prompt,
            completion_token_limit=200
        )
    finally:
        # Free the model's memory even if generation raises
        pllm.release()
    
    # Parse JSON recommendations from LLM output
    try:
        clean = response.completion.replace("```json", "").replace("```", "").strip()
        result = json.loads(clean)
        recommendations = result.get("recommendations", [])
    except (json.JSONDecodeError, AttributeError):
        # Malformed JSON, or valid JSON that is not an object
        recommendations = []
    
    spoken = format_recommendations(recommendations)
    print(f"\nAssistant: {spoken}")
    
    speak_response(ACCESS_KEY, spoken)

This approach uses Cheetah Streaming Speech-to-Text to capture the viewer's open-ended request, then picoLLM to match it against the local content catalog and generate structured recommendations — all without leaving the device.
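
Small local models do not always return clean JSON: they may add a sentence before or after the object, or stop mid-object when the token limit is reached. The following is a minimal sketch of a more defensive parser, assuming the response shape requested in the prompt above. The function name `extract_recommendations` is hypothetical, and the brace-scanning heuristic is one possible approach rather than anything provided by picoLLM.

```python
import json


def extract_recommendations(completion):
    """Best-effort parse of an LLM completion into a list of recommendation dicts."""
    # Take everything between the first "{" and the last "}" so that any
    # surrounding prose or code-fence markers are ignored.
    start = completion.find("{")
    end = completion.rfind("}")
    if start == -1 or end <= start:
        return []
    try:
        parsed = json.loads(completion[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, dict):
        return []
    recommendations = parsed.get("recommendations", [])
    if not isinstance(recommendations, list):
        return []
    # Keep only well-formed entries so downstream formatting can rely on "id".
    return [r for r in recommendations if isinstance(r, dict) and "id" in r]
```

Returning an empty list on any failure lets the caller fall back to the "couldn't find a good match" response instead of raising in the middle of a voice interaction.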

Add Voice Response Generation for Smart TV

Transform text responses into natural speech for TV playback:

import pvorca
from pvspeaker import PvSpeaker
from collections import deque

def speak_response(access_key: str, text: str) -> None:
    """Convert text to speech and play"""
    orca = pvorca.create(access_key=access_key)
    speaker = PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16
    )
    
    try:
        # Synthesize speech
        pcm_out, _ = orca.synthesize(text)
        
        # Play audio
        speaker.start()
        
        pcm_buffer = deque()
        pcm_buffer.append(pcm_out)
        
        while len(pcm_buffer) > 0:
            pcm = pcm_buffer.popleft()
            written = speaker.write(pcm)
            if written < len(pcm):
                pcm_buffer.appendleft(pcm[written:])
        
        speaker.flush()
        speaker.stop()
    except KeyboardInterrupt:
        print("\nStopping playback...")
        speaker.stop()
    finally:
        # Cleanup
        speaker.delete()
        orca.delete()

Orca Streaming Text-to-Speech generates natural voice responses with first audio output in under 130ms, providing immediate verbal feedback when a viewer speaks a command.
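
The `speak_response` function above synthesizes the whole response in a single call before playback begins. To start playback while text is still arriving, the text can be fed to a synthesis stream in chunks. The sketch below is written against a duck-typed stream object so it can be exercised without audio hardware. It assumes the stream exposes `synthesize(text)` and `flush()`, each returning PCM samples or `None`, which is how the Orca Python SDK's stream object (from `orca.stream_open()`) is understood to behave; verify against the Orca documentation before relying on it. The names `speak_streaming` and `write_pcm` are hypothetical.

```python
def speak_streaming(orca_stream, write_pcm, text_chunks):
    """Feed text chunks to a streaming synthesizer and play audio as it arrives.

    orca_stream: object with synthesize(text) and flush(), each returning a
                 PCM sequence or None (assumed to match pvorca's stream object).
    write_pcm:   callable that plays or buffers one PCM chunk.
    Returns the number of PCM chunks handed to write_pcm.
    """
    chunks_played = 0
    for text in text_chunks:
        pcm = orca_stream.synthesize(text)
        # The stream may buffer text internally and return nothing yet.
        if pcm is not None:
            write_pcm(pcm)
            chunks_played += 1
    # flush() releases whatever audio is still buffered inside the stream.
    pcm = orca_stream.flush()
    if pcm is not None:
        write_pcm(pcm)
        chunks_played += 1
    return chunks_played
```

In a real integration, `write_pcm` would wrap `speaker.write` and loop until every sample has been accepted, as the buffer loop in `speak_response` does, and the stream would be closed once playback finishes.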

Route Content Search Commands to Local Catalog

Map structured intents to content catalog queries and format results for voice delivery:

RESUME_STATE = {"id": "1003", "title": "Midnight Echoes", "position_sec": 4230}
WATCHLIST = []


def search_by_genre(genre: str) -> list:
    """Return content matching the requested genre, sorted by rating"""
    results = [c for c in CONTENT_CATALOG if genre.lower() in [g.lower() for g in c["genre"]]]
    results.sort(key=lambda x: x["rating"], reverse=True)
    return results[:3]


def get_top_rated(content_type=None) -> list:
    """Return top-rated content, optionally filtered by type"""
    results = list(CONTENT_CATALOG)
    if content_type:
        results = [c for c in results if c["type"] == content_type]
    results.sort(key=lambda x: x["rating"], reverse=True)
    return results[:3]


def format_content_results(results: list) -> str:
    """Format catalog results into a speakable response"""
    if not results:
        return "I couldn't find anything matching that. Try a different genre or ask me for recommendations."

    if len(results) == 1:
        c = results[0]
        return (f"I found {c['title']}, a {c['genre'][0]} {c['type']} from {c['year']} "
                f"rated {c['rating']} out of 10. {c['description']}")

    titles = [c["title"] for c in results]
    response = f"Here are some options: {', '.join(titles[:-1])}, and {titles[-1]}."
    response += f" {results[0]['title']} is the highest rated at {results[0]['rating']} out of 10."
    return response


def format_recommendations(recommendations: list) -> str:
    """Format picoLLM recommendations into a speakable response"""
    catalog_map = {c["id"]: c for c in CONTENT_CATALOG}
    matched = []
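    for rec in recommendations:
        # The LLM may omit "id" or emit it as a number, so normalize before lookup
        rec_id = str(rec.get("id", "")) if isinstance(rec, dict) else ""
        if rec_id in catalog_map:
            matched.append({**catalog_map[rec_id], "reason": rec.get("reason", "")})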

    if not matched:
        return "I couldn't find a good match for that. Try searching by genre or ask for top-rated content."

    if len(matched) == 1:
        c = matched[0]
        return f"Based on what you're looking for, I'd suggest {c['title']}. {c['reason']}"

    titles = [c["title"] for c in matched]
    return f"Here are some picks for you: {', '.join(titles[:-1])}, and {titles[-1]}. {matched[0]['reason']}"


def handle_content_command(access_key: str, intent: str, slots: dict[str, str]) -> None:
    """Execute content search commands and provide voice feedback"""
    
    if intent == "searchByGenre":
        genre = slots.get('genre', '')
        print(f"[Catalog] Searching for {genre} content")
        results = search_by_genre(genre)
        response = format_content_results(results)
        speak_response(access_key, response)
        
    elif intent == "resumeContent":
        if RESUME_STATE:
            print(f"[Playback] Resuming {RESUME_STATE['title']}")
            speak_response(access_key, f"Resuming {RESUME_STATE['title']} from where you left off.")
        else:
            speak_response(access_key, "You don't have anything in progress. Would you like me to suggest something?")
    
    elif intent == "topRated":
        print("[Catalog] Fetching top-rated content")
        results = get_top_rated()
        response = format_content_results(results)
        speak_response(access_key, response)
        
    elif intent == "nowPlaying":
        # Integration point: query TV tuner or live TV guide
        speak_response(access_key, "Live TV is available on input two. Switch to that input to browse live channels.")
        
    elif intent == "addToWatchlist":
        if RESUME_STATE:
            WATCHLIST.append(RESUME_STATE["title"])
            print(f"[Watchlist] Added {RESUME_STATE['title']}")
            speak_response(access_key, f"Added {RESUME_STATE['title']} to your watchlist.")
        else:
            speak_response(access_key, "Sure, but I need to know what to add. Can you tell me the title?")
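
The if/elif chain above works for five intents, but it does nothing when an intent has no branch, and every branch speaks directly, which makes it hard to test without audio hardware. One alternative is a dispatch table whose handlers return the response text, leaving speech synthesis to the caller. This is a minimal sketch under that assumption: `build_response` and `EXAMPLE_HANDLERS` are hypothetical names, and the handlers are simplified stand-ins for the catalog functions defined above.

```python
def build_response(intent, slots, handlers):
    """Look up the handler for an intent and return its spoken-response text."""
    handler = handlers.get(intent)
    if handler is None:
        # Explicit fallback for intents that have no handler yet.
        return "Sorry, I can't help with that yet."
    return handler(slots)


# Hypothetical handlers; each returns text instead of calling speak_response.
EXAMPLE_HANDLERS = {
    "searchByGenre": lambda slots: f"Searching for {slots.get('genre', 'any')} titles.",
    "nowPlaying": lambda slots: "Live TV is available on input two.",
}


if __name__ == "__main__":
    print(build_response("searchByGenre", {"genre": "comedy"}, EXAMPLE_HANDLERS))
    print(build_response("unknownIntent", {}, EXAMPLE_HANDLERS))
```

With this split, the text returned by `build_response` can be asserted in unit tests and then passed to `speak_response` in the live pipeline.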

Complete Python Code for Smart TV Voice Search

This implementation combines all components for a smart TV voice search system:

# Smart TV Voice Search for Content Discovery

import argparse
import json
import os
from collections import deque

import pvporcupine
import pvrhino
import pvcheetah
import picollm
import pvorca
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker


CONTENT_CATALOG = [
    {"id": "1001", "title": "The Last Frontier", "type": "movie", "genre": ["action", "adventure"],
     "year": 2023, "rating": 8.4, "description": "A retired soldier uncovers a conspiracy that reaches the highest levels of government."},
    {"id": "1002", "title": "Laugh Track", "type": "movie", "genre": ["comedy"],
     "year": 2024, "rating": 7.2, "description": "A stand-up comedian's life is turned upside down when a celebrity endorses their open mic set."},
    {"id": "1003", "title": "Midnight Echoes", "type": "movie", "genre": ["thriller", "mystery"],
     "year": 2022, "rating": 8.1, "description": "A detective traces a series of cryptic messages leading to a decades-old unsolved case."},
    {"id": "1004", "title": "Wild Kingdom", "type": "show", "genre": ["documentary", "family"],
     "year": 2023, "rating": 8.7, "description": "An immersive documentary series exploring ecosystems across six continents."},
    {"id": "1005", "title": "Star Odyssey", "type": "movie", "genre": ["action", "adventure"],
     "year": 2024, "rating": 8.9, "description": "Humanity's first interstellar crew faces impossible odds on a mission to find a new home."},
    {"id": "1006", "title": "Family Circus", "type": "movie", "genre": ["animation", "family", "comedy"],
     "year": 2023, "rating": 7.8, "description": "A chaotic circus family embarks on a road trip that brings them closer together."},
    {"id": "1007", "title": "Dark Horizon", "type": "movie", "genre": ["thriller", "action"],
     "year": 2024, "rating": 7.9, "description": "An international fugitive races against time to clear their name before a global summit."},
    {"id": "1008", "title": "Crimson City", "type": "show", "genre": ["crime", "drama"],
     "year": 2022, "rating": 8.5, "description": "A detective navigates corruption and rivalry in a city where everyone has secrets."},
]

RESUME_STATE = {"id": "1003", "title": "Midnight Echoes", "position_sec": 4230}
WATCHLIST = []

RECOMMENDATION_PROMPT = """You are a smart TV content recommendation assistant.
Given a user's request and a catalog of available content, recommend the most relevant titles.
Respond only in JSON format: {"recommendations": [{"id": "content_id", "reason": "brief reason"}]}
Only recommend content that exists in the provided catalog. Limit to 3 recommendations."""


def speak_response(access_key: str, text: str) -> None:
    """Convert text to speech and play"""
    orca = pvorca.create(access_key=access_key)
    speaker = PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16
    )
    
    try:
        pcm_out, _ = orca.synthesize(text)
        
        speaker.start()
        
        pcm_buffer = deque()
        pcm_buffer.append(pcm_out)
        
        while len(pcm_buffer) > 0:
            pcm = pcm_buffer.popleft()
            written = speaker.write(pcm)
            if written < len(pcm):
                pcm_buffer.appendleft(pcm[written:])
        
        speaker.flush()
        speaker.stop()
    except KeyboardInterrupt:
        print("\nStopping playback...")
        speaker.stop()
    finally:
        speaker.delete()
        orca.delete()


def search_by_genre(genre: str) -> list:
    """Return content matching the requested genre, sorted by rating"""
    results = [c for c in CONTENT_CATALOG if genre.lower() in [g.lower() for g in c["genre"]]]
    results.sort(key=lambda x: x["rating"], reverse=True)
    return results[:3]


def get_top_rated(content_type=None) -> list:
    """Return top-rated content, optionally filtered by type"""
    results = list(CONTENT_CATALOG)
    if content_type:
        results = [c for c in results if c["type"] == content_type]
    results.sort(key=lambda x: x["rating"], reverse=True)
    return results[:3]


def format_content_results(results: list) -> str:
    """Format catalog results into a speakable response"""
    if not results:
        return "I couldn't find anything matching that. Try a different genre or ask me for recommendations."

    if len(results) == 1:
        c = results[0]
        return (f"I found {c['title']}, a {c['genre'][0]} {c['type']} from {c['year']} "
                f"rated {c['rating']} out of 10. {c['description']}")

    titles = [c["title"] for c in results]
    response = f"Here are some options: {', '.join(titles[:-1])}, and {titles[-1]}."
    response += f" {results[0]['title']} is the highest rated at {results[0]['rating']} out of 10."
    return response


def format_recommendations(recommendations: list) -> str:
    """Format picoLLM recommendations into a speakable response"""
    catalog_map = {c["id"]: c for c in CONTENT_CATALOG}
    matched = []
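    for rec in recommendations:
        # The LLM may omit "id" or emit it as a number, so normalize before lookup
        rec_id = str(rec.get("id", "")) if isinstance(rec, dict) else ""
        if rec_id in catalog_map:
            matched.append({**catalog_map[rec_id], "reason": rec.get("reason", "")})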

    if not matched:
        return "I couldn't find a good match for that. Try searching by genre or ask for top-rated content."

    if len(matched) == 1:
        c = matched[0]
        return f"Based on what you're looking for, I'd suggest {c['title']}. {c['reason']}"

    titles = [c["title"] for c in matched]
    return f"Here are some picks for you: {', '.join(titles[:-1])}, and {titles[-1]}. {matched[0]['reason']}"


def handle_content_command(access_key: str, intent: str, slots: dict[str, str]) -> None:
    """Execute content search commands and provide voice feedback"""
    
    if intent == "searchByGenre":
        genre = slots.get('genre', '')
        print(f"[Catalog] Searching for {genre} content")
        results = search_by_genre(genre)
        response = format_content_results(results)
        speak_response(access_key, response)
        
    elif intent == "resumeContent":
        if RESUME_STATE:
            print(f"[Playback] Resuming {RESUME_STATE['title']}")
            speak_response(access_key, f"Resuming {RESUME_STATE['title']} from where you left off.")
        else:
            speak_response(access_key, "You don't have anything in progress. Would you like me to suggest something?")
    
    elif intent == "topRated":
        print("[Catalog] Fetching top-rated content")
        results = get_top_rated()
        response = format_content_results(results)
        speak_response(access_key, response)
        
    elif intent == "nowPlaying":
        speak_response(access_key, "Live TV is available on input two. Switch to that input to browse live channels.")
        
    elif intent == "addToWatchlist":
        if RESUME_STATE:
            WATCHLIST.append(RESUME_STATE["title"])
            print(f"[Watchlist] Added {RESUME_STATE['title']}")
            speak_response(access_key, f"Added {RESUME_STATE['title']} to your watchlist.")
        else:
            speak_response(access_key, "Sure, but I need to know what to add. Can you tell me the title?")


def handle_content_search(access_key: str, context_path: str) -> None:
    """Process structured content commands using Rhino Speech-to-Intent"""
    
    try:
        rhino = pvrhino.create(
            access_key=access_key,
            context_path=context_path)
    except pvrhino.RhinoError as e:
        print("Failed to initialize Rhino")
        raise e

    print(f'Rhino version: {rhino.version}')

    recorder = PvRecorder(frame_length=rhino.frame_length)
    recorder.start()

    print('Listening for content command...')

    try:
        is_finalized = False
        while not is_finalized:
            pcm = recorder.read()
            is_finalized = rhino.process(pcm)

        inference = rhino.get_inference()
        if inference.is_understood:
            print('{')
            print(f"  intent : '{inference.intent}'")
            print('  slots : {')
            for slot, value in inference.slots.items():
                print(f"    '{slot}' : '{value}'")
            print('  }')
            print('}\n')

            handle_content_command(access_key, inference.intent, inference.slots)
        else:
            print("Didn't understand the command. Please try again.")

    except KeyboardInterrupt:
        print('\nStopping...')

    finally:
        recorder.stop()
        recorder.delete()
        rhino.delete()


def handle_llm_query(access_key: str, pllm_model_path: str) -> None:
    """Process open-ended content requests using Cheetah + picoLLM"""
    
    # Initialize Cheetah for speech-to-text
    cheetah = pvcheetah.create(
        access_key=access_key,
        endpoint_duration_sec=1.0
    )
    
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    
    print("Speak your request...")
    transcript = ""
    
    try:
        is_endpoint = False
        while not is_endpoint:
            pcm = recorder.read()
            partial_transcript, is_endpoint = cheetah.process(pcm)
            transcript += partial_transcript
            print(partial_transcript, end="", flush=True)

        final_transcript = cheetah.flush()
        transcript += final_transcript
        print(final_transcript)

    except KeyboardInterrupt:
        print("\nStopping...")
        
    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()
    
    if not transcript.strip():
        print("No speech detected.")
        return
    
    # Process with picoLLM
    pllm = picollm.create(
        access_key=access_key,
        model_path=pllm_model_path
    )
    
    catalog_summary = json.dumps([
        {"id": c["id"], "title": c["title"], "type": c["type"],
         "genre": c["genre"], "year": c["year"], "rating": c["rating"],
         "description": c["description"]}
        for c in CONTENT_CATALOG
    ])
    
    prompt = (f"{RECOMMENDATION_PROMPT}\n\n"
              f"Content catalog:\n{catalog_summary}\n\n"
              f'Viewer request: "{transcript}"\n\n'
              "Respond with valid JSON only:")
    
    print("\nGenerating recommendations...")
    
    try:
        response = pllm.generate(
            prompt=prompt,
            completion_token_limit=200
        )
    finally:
        # Free the model's memory even if generation raises
        pllm.release()
    
    # Parse JSON recommendations from LLM output
    try:
        clean = response.completion.replace("```json", "").replace("```", "").strip()
        result = json.loads(clean)
        recommendations = result.get("recommendations", [])
    except (json.JSONDecodeError, AttributeError):
        # Malformed JSON, or valid JSON that is not an object
        recommendations = []
    
    spoken = format_recommendations(recommendations)
    print(f"\nAssistant: {spoken}")
    
    speak_response(access_key, spoken)


def main():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        '--access_key',
        help='AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)',
        required=True)

    parser.add_argument(
        '--command_keyword_path',
        help='Absolute path to content search wake word model file (.ppn)',
        required=True)

    parser.add_argument(
        '--query_keyword_path',
        help='Absolute path to recommendation wake word model file (.ppn)',
        required=True)

    parser.add_argument(
        '--command_es_keyword_path',
        help='Absolute path to Spanish content search wake word model file (.ppn)',
        required=True)

    parser.add_argument(
        '--query_es_keyword_path',
        help='Absolute path to Spanish recommendation wake word model file (.ppn)',
        required=True)

    parser.add_argument(
        '--context_path',
        help='Absolute path to Rhino context file (.rhn)',
        required=True)

    parser.add_argument(
        '--pllm_model_path',
        help='Absolute path to picoLLM model file (.pllm)',
        required=True)

    args = parser.parse_args()

    print("Smart TV Voice Search")
    print("=" * 50)

    # Main loop for continuous operation
    while True:
        # Stage 1: Wake Word Detection with multilingual keywords
        try:
            porcupine = pvporcupine.create(
                access_key=args.access_key,
                keyword_paths=[
                    args.command_keyword_path,
                    args.query_keyword_path,
                    args.command_es_keyword_path,
                    args.query_es_keyword_path])
        except pvporcupine.PorcupineError as e:
            print(f"Failed to initialize Porcupine: {e}")
            raise

        # Extract keyword names from filenames
        keywords = []
        for keyword_path in [args.command_keyword_path, args.query_keyword_path,
                     args.command_es_keyword_path, args.query_es_keyword_path]:
            keyword_phrase_part = os.path.basename(keyword_path).replace('.ppn', '').split('_')
            if len(keyword_phrase_part) > 6:
                keywords.append(' '.join(keyword_phrase_part[0:-6]))
            else:
                keywords.append(keyword_phrase_part[0])

        print(f'Porcupine version: {porcupine.version}')

        recorder = PvRecorder(frame_length=porcupine.frame_length)
        recorder.start()

        print('Listening for wake word... (press Ctrl+C to exit)')
        print(f'  Say "{keywords[0]}" or "{keywords[2]}" for content search')
        print(f'  Say "{keywords[1]}" or "{keywords[3]}" for AI recommendations')

        detected_keyword_index = -1
        interrupted = False

        try:
            result = -1
            while result < 0:
                pcm = recorder.read()
                result = porcupine.process(pcm)

            print(f'Detected "{keywords[result]}"')
            detected_keyword_index = result

        except KeyboardInterrupt:
            print('\nStopping...')
            interrupted = True

        finally:
            recorder.stop()
            recorder.delete()
            porcupine.delete()

        if interrupted:
            break

        # Stage 2: Route based on detected wake word
        if detected_keyword_index in [0, 2]:
            # Content search wake word - route to Rhino for structured queries
            handle_content_search(args.access_key, args.context_path)
        elif detected_keyword_index in [1, 3]:
            # Recommendation wake word - route to Cheetah + picoLLM
            handle_llm_query(args.access_key, args.pllm_model_path)


if __name__ == '__main__':
    main()
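
The JSON parsing step in handle_llm_query assumes the model returns either bare JSON or JSON wrapped in a code fence. As a minimal sketch, assuming the model sometimes adds extra text around the JSON object, a more tolerant parser could look like the following. `extract_recommendations` is a hypothetical helper name, not part of the tutorial script or of picoLLM; the "recommendations" key is the one the script already reads.

```python
import json


def extract_recommendations(completion: str) -> list:
    """Return the "recommendations" list from raw LLM text, or [] on failure.

    Hypothetical helper (not part of picoLLM): it scans for the first
    decodable JSON object instead of assuming the whole string is JSON.
    """
    decoder = json.JSONDecoder()
    start = completion.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores trailing text
            obj, _ = decoder.raw_decode(completion, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            recs = obj.get("recommendations", [])
            return recs if isinstance(recs, list) else []
        # Try the next opening brace in case this one was not valid JSON
        start = completion.find("{", start + 1)
    return []
```

The function returns an empty list whenever nothing usable is found, so it could stand in for the try/except block without changing how format_recommendations is called.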

Run the Smart TV Voice Assistant

To run the voice search system, update the model paths to match your local files and export your Picovoice AccessKey as the ACCESS_KEY environment variable:

python3 smart_tv_voice_search.py \
  --access_key "$ACCESS_KEY" \
  --command_keyword_path ./models/hey-tv.ppn \
  --query_keyword_path ./models/hey-assistant.ppn \
  --command_es_keyword_path ./models/oye-tv.ppn \
  --query_es_keyword_path ./models/oye-asistente.ppn \
  --context_path ./models/content-discovery.rhn \
  --pllm_model_path ./models/llama-3.2-3b-instruct-505.pllm
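
Every path argument is required, and a mistyped path only surfaces once the corresponding engine tries to load it. As a hedged sketch, a small pre-flight check could report every missing or misnamed model file up front. `find_bad_model_paths` and `EXPECTED_SUFFIXES` are hypothetical names; the expected extensions are taken from the argument help text in the script (.ppn, .rhn, .pllm).

```python
from pathlib import Path

# Expected extension per argument, taken from the script's help text
EXPECTED_SUFFIXES = {
    "command_keyword_path": ".ppn",
    "query_keyword_path": ".ppn",
    "command_es_keyword_path": ".ppn",
    "query_es_keyword_path": ".ppn",
    "context_path": ".rhn",
    "pllm_model_path": ".pllm",
}


def find_bad_model_paths(paths: dict) -> list:
    """Return human-readable problems; an empty list means all paths look fine."""
    problems = []
    for name, suffix in EXPECTED_SUFFIXES.items():
        value = paths.get(name)
        if value is None:
            problems.append(f"--{name}: not provided")
            continue
        path = Path(value)
        if path.suffix != suffix:
            problems.append(f"--{name}: expected a {suffix} file, got {path.name}")
        elif not path.is_file():
            problems.append(f"--{name}: file not found: {path}")
    return problems
```

In main(), this could be called as find_bad_model_paths(vars(args)) right after parser.parse_args(), printing the problems and exiting if the list is not empty.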

Example Interactions

Content Search:

Viewer: "Hey TV, find action movies."
System: "Here are some options: Star Odyssey, The Last Frontier, and Dark Horizon. Star Odyssey is the highest rated at 8.9 out of 10."

Content Search (Spanish Activation):

Viewer: "Oye TV, find action movies."
System: "Here are some options: Star Odyssey, The Last Frontier, and Dark Horizon. Star Odyssey is the highest rated at 8.9 out of 10."

Resume Playback:

Viewer: "Hey TV, continue watching."
System: "Resuming Midnight Echoes from where you left off."

AI Recommendation:

Viewer: "Hey Assistant, something fun for the whole family."
System: "Here are some picks for you: Wild Kingdom, Family Circus, and Star Odyssey. Wild Kingdom is a great match for family viewing."

AI Recommendation (Spanish Activation):

Viewer: "Oye Asistente, something fun for the whole family."
System: "Here are some picks for you: Wild Kingdom, Family Circus, and Star Odyssey. Wild Kingdom is a great match for family viewing."

If you want fully Spanish voice commands (not just Spanish wake words), you can build a Spanish Rhino Speech-to-Intent context and route Spanish commands through the same content search pipeline.

You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.

Frequently Asked Questions

Will the voice search work accurately in noisy environments, with different accents, or with varied content titles?

Yes. Porcupine Wake Word, Rhino Speech-to-Intent, and Cheetah Streaming Speech-to-Text are designed to work reliably with background noise and various accents across supported languages.

Can I use different wake words instead of 'Hey TV' and 'Hey Assistant'?

Yes. Train any custom wake phrases using Picovoice Console in seconds without collecting training data. Simply enter your desired phrases and download the trained models. Porcupine detects multiple wake words simultaneously so both activation paths stay responsive. This tutorial demonstrates simultaneous English and Spanish wake words routing to the same content search and recommendation paths. The wake word guide covers best practices for choosing effective wake phrases.
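
The script prints wake phrases by parsing the model filenames. As a rough sketch, assuming your files are named like the ones in the run command (for example `hey-tv.ppn`), a simpler display-name helper could look like this. `keyword_display_name` is a hypothetical function, and Console downloads may follow a different naming pattern, so adjust the parsing to your actual filenames.

```python
from pathlib import PurePath


def keyword_display_name(keyword_path: str) -> str:
    """Derive a printable wake phrase from a .ppn filename.

    Assumes hyphens separate the words of the phrase and that anything
    after the first underscore is metadata (an assumption, not a
    documented naming rule).
    """
    stem = PurePath(keyword_path).stem   # filename without the .ppn extension
    phrase = stem.split("_")[0]          # drop any underscore-separated suffix
    return phrase.replace("-", " ")
```

With this helper, swapping in a new wake word only requires passing a different file path; the prompt text follows the filename.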

When should I use Rhino Speech-to-Intent versus picoLLM for content queries?

Use Rhino Speech-to-Intent for structured, predictable content searches like genre filters, resume commands, and top-rated lists. Use picoLLM for open-ended requests where viewers might phrase things in unpredictable ways. The dual wake word architecture lets viewers choose the appropriate path upfront: "Hey TV" or "Oye TV" for direct searches, and "Hey Assistant" or "Oye Asistente" for AI recommendations.
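
This routing depends on the order in which keyword paths are passed to Porcupine: the script treats detection indices 0 and 2 as content search and 1 and 3 as recommendations. One way to make that mapping explicit, sketched here with hypothetical names (`WakeWord`, `WAKE_WORDS`, `route_for`), is a small table indexed the same way as the keyword list.

```python
from typing import NamedTuple


class WakeWord(NamedTuple):
    phrase: str
    language: str
    route: str  # "search" -> Rhino path, "recommend" -> Cheetah + picoLLM path


# Order must match the keyword_paths list passed to pvporcupine.create
WAKE_WORDS = [
    WakeWord("Hey TV", "en", "search"),
    WakeWord("Hey Assistant", "en", "recommend"),
    WakeWord("Oye TV", "es", "search"),
    WakeWord("Oye Asistente", "es", "recommend"),
]


def route_for(keyword_index: int) -> str:
    """Map a Porcupine detection index to a route name."""
    if not 0 <= keyword_index < len(WAKE_WORDS):
        raise ValueError(f"unexpected keyword index: {keyword_index}")
    return WAKE_WORDS[keyword_index].route
```

main() could then call handle_content_search when route_for(detected_keyword_index) returns "search" and handle_llm_query otherwise, so adding a wake word means adding one table row rather than editing index lists.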