TLDR: Learn how to build a local Model Context Protocol (MCP) voice assistant in this step-by-step MCP tutorial, using FastMCP for the tool server, a local LLM (picoLLM running Meta Llama 3.2) for function calling, on-device speech-to-text and text-to-speech, and external API integration for live data. Unlike cloud-based solutions such as Claude or ChatGPT, this tutorial uses a fully local LLM for privacy and on-device capability.

What is MCP (Model Context Protocol)?

The Model Context Protocol (MCP) is an open-source standard developed by Anthropic that enables seamless integration between AI applications, language models, and external tools. Think of MCP as a universal adapter that lets your AI agent interact with databases, APIs, local files, and other services in a standardized way.

MCP solves a critical problem in AI development: without a standard protocol, every AI application needs custom integrations for each service it wants to access. MCP provides a consistent interface that works across different Large Language Models (LLMs) and tools, making AI applications more portable and maintainable. While MCP works with cloud providers such as Claude, OpenAI's GPT models, and Gemini, this MCP implementation tutorial uses Picovoice's picoLLM running Llama 3.2 locally.

Three Types of MCP Capabilities

MCP servers can provide three distinct types of capabilities:

  1. Tools: Executable functions that the LLM can call to perform actions (API requests, file operations, calculations)
  2. Resources: Read-only data sources like files, database queries, or API responses that the LLM can reference for context
  3. Prompts: Pre-written templates and workflows that guide the LLM through complex multistep tasks

This local MCP server tutorial focuses on tools—specifically, building function-calling capabilities that let your picoLLM-powered Llama 3.2 model interact with external APIs through MCP.

Why Build a Local MCP Voice Agent?

Most MCP tutorials rely on cloud-hosted models like Claude or ChatGPT. While convenient for prototyping, cloud-based solutions have significant drawbacks for voice applications:

  • Latency: Round-trip API calls add 500-2000ms delays, creating awkward pauses in conversation
  • Privacy: Voice data and queries are sent to remote servers
  • Network dependency: Requires stable internet connection

A local MCP voice agent eliminates these issues. By running the LLM, speech recognition, and synthesis entirely on-device, you get:

  • Much faster and more reliable response times for natural conversation flow
  • Complete privacy—no voice or query data leaves your machine
  • Offline functionality (except for explicit external API calls and access key validation)

Python Voice Assistant Tutorial: What You'll Build Step-by-Step

In this tutorial, you'll create a local MCP AI voice assistant that:

  1. Listens to your speech using streaming speech-to-text
  2. Processes queries using a local LLM
  3. Calls weather API tools through MCP (FastMCP) based on your intent
  4. Responds with natural speech using text-to-speech

The assistant will understand conversational queries like "What's the weather like in San Francisco?" and respond naturally with current conditions and forecasts.

Local AI Voice Assistant: Architecture Overview

Here's how the components work together:

  1. User speaks → Captured via PvRecorder + Cheetah Streaming Speech-to-Text
  2. MCP client sends query + available tools to local LLM
  3. Local LLM (picoLLM - Meta Llama 3.2, quantized) analyzes query and selects appropriate MCP tool
  4. MCP server executes tool (e.g., fetch_weather) via external weather API and returns structured data
  5. Local LLM formats raw data into conversational response
  6. Text-to-Speech engine (Orca Streaming Text-to-Speech) converts response to speech → played via PvSpeaker

The entire process runs locally except for the weather API call itself, ensuring privacy and low latency.

Prerequisites

Before starting, verify you have:

  • Python 3.10 or higher installed
  • Linux, macOS, or Windows operating system
  • Microphone and speakers for voice interaction
  • Internet connection for initial setup and weather API calls

Estimated time: 45-60 minutes including setup


Step 1: Set Up Your Python Environment for MCP Development

Create an isolated Python environment for the project:
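
The exact commands depend on your platform; a minimal setup, assuming the packages used later in this tutorial, looks like this:

    python3 -m venv venv
    source venv/bin/activate        # on Windows: venv\Scripts\activate
    pip install "mcp>=1.2.0" picollm pvcheetah pvorca pvrecorder pvspeaker requests python-dotenv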

Note: The Python MCP SDK must be version 1.2.0 or higher.


Step 2: Build the MCP Server in Python

An MCP server acts as a bridge between your LLM and external services. The server exposes well-defined tools that the LLM can invoke with structured parameters.

Configure API Keys

Sign up for WeatherAPI (free tier available) and add your key to .env:
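
For example (the variable name WEATHER_API_KEY is a choice for this tutorial; the server code reads whichever name you use here):

    WEATHER_API_KEY=your_weatherapi_key_here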

Note: Keep your .env file out of version control to protect API keys.

Initialize the MCP Server

Create server.py and set up the basic server structure:
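
A minimal sketch, assuming the FastMCP class from the official MCP Python SDK and the WEATHER_API_KEY variable defined in .env above:

    import os

    import requests
    from dotenv import load_dotenv
    from mcp.server.fastmcp import FastMCP

    # Load credentials from .env so they never appear in the source code.
    load_dotenv()
    WEATHER_API_KEY = os.getenv("WEATHER_API_KEY")

    # The server name appears in logs and helps with debugging.
    server = FastMCP("Weather Voice Assistant")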

Key points:

  • FastMCP provides a lightweight MCP server implementation optimized for local use
  • The server name ("Weather Voice Assistant") appears in logs and helps with debugging
  • Environment variables keep sensitive credentials separate from code

Helper Function for Temperature Units

Add this utility function before defining tools:
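
One possible helper, assuming Fahrenheit for the United States and Celsius everywhere else, keyed off the country field that WeatherAPI returns (the helper name and exact rule are choices for this sketch):

    # Countries whose users typically expect Fahrenheit.
    FAHRENHEIT_COUNTRIES = {"United States of America", "USA", "United States"}


    def preferred_unit(country: str) -> str:
        """Return 'F' or 'C' depending on the location's country."""
        return "F" if country in FAHRENHEIT_COUNTRIES else "C"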

This ensures weather responses use the appropriate temperature scale based on location.

Define MCP Tools

Tools are the core of your MCP server. Each tool must have:

  1. Clear function signature with type hints
  2. Docstring explaining purpose and parameters (the LLM reads these!)
  3. Structured return format

Tool 1: Current Weather
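
A sketch of the current-weather tool, assuming WeatherAPI's current.json endpoint and the helper above; the tool name and return fields are choices for this tutorial, not requirements of MCP:

    @server.tool()
    def fetch_weather(location: str) -> dict:
        """Get the current weather conditions for a location.

        Args:
            location: City name, e.g. "San Francisco" or "Vancouver".
        """
        try:
            resp = requests.get(
                "https://api.weatherapi.com/v1/current.json",
                params={"key": WEATHER_API_KEY, "q": location},
                timeout=10,
            )
            resp.raise_for_status()
            data = resp.json()
            unit = preferred_unit(data["location"]["country"])
            temp = data["current"]["temp_f"] if unit == "F" else data["current"]["temp_c"]
            contents = (
                f"Current weather in {data['location']['name']}: "
                f"{data['current']['condition']['text']}, {temp}°{unit}."
            )
            return {"success": True, "contents": contents}
        except Exception as e:
            # Failed API calls return an error instead of crashing the server.
            return {"success": False, "error": str(e)}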

Important implementation details:

  • The docstring is crucial—it tells the LLM when to use this tool
  • Return format must be consistent: success boolean + contents or error
  • Error handling prevents server crashes when API calls fail

Tool 2: Weather Forecast
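
A sketch along the same lines, assuming WeatherAPI's forecast.json endpoint:

    @server.tool()
    def fetch_forecast(location: str, days: int = 3) -> dict:
        """Get a daily weather forecast for a location.

        Args:
            location: City name, e.g. "San Francisco".
            days: Number of forecast days to return (1-14).
        """
        try:
            days = max(1, min(days, 14))  # WeatherAPI supports at most 14 days
            resp = requests.get(
                "https://api.weatherapi.com/v1/forecast.json",
                params={"key": WEATHER_API_KEY, "q": location, "days": days},
                timeout=10,
            )
            resp.raise_for_status()
            data = resp.json()
            unit = preferred_unit(data["location"]["country"])
            # Pre-format each day so the LLM only has to read the numbers back.
            lines = []
            for day in data["forecast"]["forecastday"]:
                hi = day["day"]["maxtemp_f"] if unit == "F" else day["day"]["maxtemp_c"]
                lo = day["day"]["mintemp_f"] if unit == "F" else day["day"]["mintemp_c"]
                lines.append(
                    f"{day['date']}: {day['day']['condition']['text']}, "
                    f"high {hi}°{unit}, low {lo}°{unit}"
                )
            return {"success": True, "contents": "\n".join(lines)}
        except Exception as e:
            return {"success": False, "error": str(e)}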

Design considerations:

  • The days parameter lets the LLM request flexible forecast ranges (maximum 14 days)
  • Pre-formatting the response reduces LLM hallucination on numerical data

Start the MCP Server

Add the entry point at the end of server.py:
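
The entry point simply runs the FastMCP server over stdio so the client can spawn it as a subprocess:

    if __name__ == "__main__":
        server.run(transport="stdio")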

The server is now complete but won't do anything on its own. In the next section, we'll build the client that orchestrates the LLM, MCP server, and voice interaction.


Step 3: Build the MCP Client with a Local LLM (Llama 3.2)

The MCP client is the orchestration layer. It manages the connection to the MCP server, handles user input (voice or text), coordinates with the local LLM for intent recognition, and executes tool calls.

Get Picovoice Credentials

Sign up for a free Picovoice Console account and copy your AccessKey. Add it to .env:
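
For example (again, the variable name is a choice for this tutorial):

    PICOVOICE_ACCESS_KEY=your_picovoice_access_key_here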

Download a Local LLM

Download a function-calling compatible model from the picoLLM page. This tutorial uses Llama 3.2 3B Instruct (llama-3.2-3b-instruct-505.pllm).

Model requirements: The LLM must support function calling.

Place the .pllm file in your project directory (e.g., ./models/llama-3.2-3b-instruct-505.pllm).

Set Up the Client Structure

Create client.py with necessary imports:
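
A possible set of imports for the pieces used in this tutorial (see the breakdown below):

    import argparse
    import asyncio
    import os
    import re
    from contextlib import AsyncExitStack

    import picollm
    import pvcheetah
    import pvorca
    import pvrecorder
    import pvspeaker
    from dotenv import load_dotenv
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client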

Import breakdown:
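
  • argparse, asyncio, os, re: standard-library modules for command-line arguments, async orchestration, environment variables, and parsing the LLM's function-call strings
  • AsyncExitStack: keeps the MCP transport and session open for the lifetime of the client
  • mcp / mcp.client.stdio: the MCP client session and the stdio transport used to talk to the server subprocess
  • picollm: on-device inference for the Llama 3.2 model
  • pvcheetah, pvorca, pvrecorder, pvspeaker: speech-to-text, text-to-speech, microphone capture, and audio playback
  • dotenv: loads the Picovoice AccessKey from .env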

Initialize the Client Class in Python
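
A minimal skeleton, assuming the AccessKey lives in .env and the picoLLM model path is passed in by the caller (the class name is a choice for this sketch). The constructor creates the voice engines and leaves the MCP session to be opened later:

    class MCPVoiceAssistant:
        def __init__(self, model_path: str):
            load_dotenv()
            access_key = os.getenv("PICOVOICE_ACCESS_KEY")

            # MCP plumbing; filled in by connect_to_server().
            self.session: ClientSession | None = None
            self.tools = []
            self.exit_stack = AsyncExitStack()

            # Local LLM used for tool selection and response formatting.
            self.pllm = picollm.create(access_key=access_key, model_path=model_path)

            # Voice I/O engines. endpoint_duration_sec=2.0 matches the
            # 2-second silence endpoint described later in this tutorial.
            self.cheetah = pvcheetah.create(access_key=access_key, endpoint_duration_sec=2.0)
            self.orca = pvorca.create(access_key=access_key)
            self.recorder = pvrecorder.PvRecorder(frame_length=self.cheetah.frame_length)
            self.speaker = pvspeaker.PvSpeaker(
                sample_rate=self.orca.sample_rate, bits_per_sample=16)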

Connect to the MCP Server
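
A sketch of the connection method (add it to the class above), following the stdio client pattern from the MCP Python SDK:

        async def connect_to_server(self, server_script_path: str) -> None:
            # 1. Describe how to launch the server as a subprocess.
            server_params = StdioServerParameters(command="python3", args=[server_script_path])

            # 2-3. Open the stdio transport and an MCP session over it.
            read, write = await self.exit_stack.enter_async_context(stdio_client(server_params))
            self.session = await self.exit_stack.enter_async_context(ClientSession(read, write))
            await self.session.initialize()

            # 4-5. Ask the server which tools it exposes and print their names.
            response = await self.session.list_tools()
            self.tools = response.tools
            print("Connected to MCP server with tools:", [tool.name for tool in self.tools])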

How it works:

  1. Spawns the MCP server as a subprocess using python3 and the given script
  2. Sets up communication channels (stdin/stdout) with the server
  3. Initializes an MCP client session over that connection
  4. Queries the server for available tools
  5. Prints the tool names to confirm the connection

Troubleshooting: If connection fails, verify server_script_path is correct and the server has no syntax errors.

Add Function Call Parsing

The LLM returns function calls as strings. We need to parse them:
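
The exact output format depends on the model and prompt, so treat this as a sketch; it assumes calls that look like fetch_weather(location="Vancouver") or fetch_forecast(location="Vancouver", days=3):

    def parse_function_call(text: str):
        """Extract (tool_name, arguments) from a string like tool(arg="value", n=3)."""
        match = re.search(r"(\w+)\s*\((.*)\)", text, re.DOTALL)
        if match is None:
            return None  # the LLM answered directly, with no tool call
        name, raw_args = match.group(1), match.group(2)
        args = {}
        for key, quoted, bare in re.findall(r'(\w+)\s*=\s*(?:"([^"]*)"|([\w.\-]+))', raw_args):
            value = quoted if quoted else bare
            # Convert purely numeric arguments (e.g. days=3) to integers.
            args[key] = int(value) if value.isdigit() else value
        return name, args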

Why this is needed:

  • The LLM outputs function calls as text, not executable code
  • We extract function name and parameters using regex
  • Arguments are parsed into a dictionary for MCP tool invocation

If you switch models or adjust the prompt context, the LLM may emit function calls in a slightly different format, so you may need to adapt the parsing logic accordingly.

Process Queries with Tool Calling

This is the core logic that orchestrates LLM and MCP:
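
A sketch of the two-phase flow, to be added to the class; the system prompt, stop condition, and Llama 3.2 chat template shown here are choices for this tutorial, and the generate() call assumes the picoLLM Python API:

        async def process_query(self, query: str) -> str:
            # Describe the available MCP tools so the LLM can pick one.
            tool_list = "\n".join(f"- {tool.name}: {tool.description}" for tool in self.tools)
            system = (
                "You are a voice weather assistant. If the user asks about the weather, "
                'respond with exactly one function call such as fetch_weather(location="Vancouver"). '
                "Otherwise answer directly. Available functions:\n" + tool_list
            )

            # Phase 1: ask the LLM which tool (if any) to call.
            # temperature=0.0 keeps tool selection deterministic.
            prompt = (
                "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
                f"{system}<|eot_id|>"
                "<|start_header_id|>user<|end_header_id|>\n\n"
                f"{query}<|eot_id|>"
                "<|start_header_id|>assistant<|end_header_id|>\n\n"
            )
            first = self.pllm.generate(prompt, completion_token_limit=128, temperature=0.0)
            parsed = parse_function_call(first.completion)
            if parsed is None:
                return first.completion  # no tool needed; answer the user directly

            # Execute the selected tool on the MCP server.
            name, arguments = parsed
            result = await self.session.call_tool(name, arguments)
            tool_output = result.content[0].text  # assumes the tool returned text content

            # Phase 2: keep the dialog history and have the LLM turn the raw
            # tool output into a short conversational answer.
            prompt += (
                f"{first.completion}<|eot_id|>"
                "<|start_header_id|>user<|end_header_id|>\n\n"
                f"Tool result: {tool_output}\n"
                "Answer the original question conversationally in one or two sentences."
                "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
            )
            second = self.pllm.generate(prompt, completion_token_limit=256, temperature=0.0)
            return second.completion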

Implementation details:

  1. Llama 3.2 prompt format: The <|begin_of_text|> tags are specific to Llama 3.2's expected format
  2. Two-phase LLM calls:
    • First call: Determine which tool to use
    • Second call: Format raw tool output into natural language
  3. Temperature=0.0: Ensures deterministic tool selection (no randomness)
  4. Dialog history: Maintains context between the two LLM calls

Model-specific note: If using a different LLM, adjust the prompt template to match its expected format (check model documentation).

Build a Voice Chat Loop: Add Real-Time Speech Recognition

Voice interaction flow:

  1. Microphone captures audio frames continuously
  2. Cheetah Streaming Speech-to-Text processes frames and outputs transcribed speech in real-time
  3. When an endpoint is detected (2 seconds of silence), the query is finalized
  4. Query is processed through MCP + LLM
  5. Response is synthesized to audio by Orca Streaming Text-to-Speech and played through speaker

Tip: adjust Cheetah's endpoint duration during initialization if you expect longer pauses while a query is being spoken.
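
A sketch of the loop, to be added to the class; it assumes the PvRecorder/Cheetah/Orca/PvSpeaker objects created in the constructor, and for simplicity it ends the session when the transcribed query is just "stop":

        async def voice_chat_loop(self) -> None:
            print("Listening... ask about the weather, or say 'stop' to exit.")
            self.recorder.start()
            try:
                transcript = ""
                while True:
                    # Read one audio frame and feed it to Cheetah for streaming STT.
                    frame = self.recorder.read()
                    partial, is_endpoint = self.cheetah.process(frame)
                    transcript += partial
                    print(partial, end="", flush=True)

                    if not is_endpoint:
                        continue

                    # Endpoint detected (2 seconds of silence): finalize the query.
                    transcript += self.cheetah.flush()
                    query = transcript.strip()
                    transcript = ""
                    print()
                    if not query:
                        continue
                    if query.lower().rstrip(".") == "stop":
                        break

                    # Run the query through the MCP + LLM pipeline.
                    self.recorder.stop()
                    response = await self.process_query(query)
                    print(f"Assistant: {response}")

                    # Synthesize with Orca and play through PvSpeaker.
                    # (Recent pvorca versions return (pcm, word_alignments).)
                    pcm, _ = self.orca.synthesize(response)
                    self.speaker.start()
                    self.speaker.write(pcm)
                    self.speaker.flush()
                    self.speaker.stop()
                    self.recorder.start()
            finally:
                self.recorder.stop()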

User experience features:

  • Real-time transcript display shows what the system hears
  • Type 'stop' to exit gracefully
  • Both text and voice output for clarity

Clean Up Resources
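
Close the MCP session and release the audio and model resources when the loop ends (add this method to the class as well):

        async def cleanup(self) -> None:
            # Close the MCP session and the stdio transport to the server.
            await self.exit_stack.aclose()

            # Release Picovoice resources.
            self.recorder.delete()
            self.speaker.delete()
            self.cheetah.delete()
            self.orca.delete()
            self.pllm.release()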

Add Main Entry Point
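
A sketch of the entry point; -s/--server_script matches the flag referenced in the troubleshooting section below, while --model_path is a flag introduced here for convenience:

    async def main() -> None:
        parser = argparse.ArgumentParser()
        parser.add_argument("-s", "--server_script", required=True, help="Path to server.py")
        parser.add_argument("--model_path", required=True, help="Path to the .pllm model file")
        args = parser.parse_args()

        assistant = MCPVoiceAssistant(model_path=args.model_path)
        try:
            await assistant.connect_to_server(args.server_script)
            await assistant.voice_chat_loop()
        finally:
            await assistant.cleanup()


    if __name__ == "__main__":
        asyncio.run(main())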


Complete Code: MCP Voice Assistant in Python

Here are the complete files for client.py and server.py:

client.py
server.py

How to Run Your Local MCP Voice Assistant

Start the client (which will automatically start the server):
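
With the sketches above, that looks like the following (adjust the flags to whatever your client.py actually accepts):

    python3 client.py -s server.py --model_path ./models/llama-3.2-3b-instruct-505.pllm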

Example interaction:

MCP Voice Assistant example output in the terminal

Extend Your MCP Voice Assistant: Advanced Features and Integrations

Once your local MCP voice assistant is running, there are several ways to extend and improve it:

1. Add More Tools

Expand the assistant's capabilities by creating new @server.tool() functions (see the sketch after this list). For example:

  • Calendar or task management integration
  • Local file search and retrieval
  • Smart home control (lights, thermostat, etc.)
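
As an illustration, a hypothetical reminder tool (name and behavior invented for this example) would follow the same pattern as the weather tools:

    @server.tool()
    def create_reminder(text: str, time: str) -> dict:
        """Create a reminder for the user.

        Args:
            text: What to be reminded about.
            time: When to be reminded, e.g. "tomorrow at 9am".
        """
        # A real implementation would persist this or call a calendar API;
        # here we just echo it back so the LLM can confirm it to the user.
        return {"success": True, "contents": f"Reminder set: {text} ({time})"}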

2. Improve LLM Interaction

  • Experiment with different prompt templates or dialog strategies to improve response quality.
  • Add multi-turn conversation memory to maintain context over longer interactions.
  • Try other function-calling compatible models to balance speed, accuracy, and resource usage.

3. Enhance Voice Experience

  • Implement keyword detection with Porcupine Wake Word so the assistant can run completely hands-free.
  • Support multiple voices or languages with Orca Streaming Text-to-Speech.

4. Build a GUI or Web Interface

  • Create a local dashboard to display conversation history and tool outputs.
  • Visualize responses, forecasts, or other tool data in charts or tables.
  • Offer text input as an alternative to voice for accessibility.

Troubleshooting

The client cannot connect to the MCP server

  • Confirm the -s / --server_script path points to the correct server.py file.
  • Look for error messages printed before the connection attempt; server startup errors will prevent the connection.

No tools are listed after connecting

  • Check that your tool functions are decorated with @server.tool().
  • Make sure server.run(transport="stdio") is called in server.py.
  • Restart the application after making changes.

The LLM does not call any tools

  • Make sure the model you downloaded supports function calling.
  • Verify the prompt format matches the expected format for your model.
  • Ensure your query is clear enough so the LLM can decide what tool to use (e.g., "What is the current weather in Vancouver?").

Weather requests fail or return errors

  • Confirm your WeatherAPI key is valid and has not exceeded its free-tier limits.
  • Check your internet connection, as weather data is fetched from an external API.
  • Try a well-known city name to rule out location parsing issues.

Frequently Asked Questions

Do I need an internet connection for this project?

An internet connection is required for initial setup and for weather API requests. All speech processing, LLM inference, and tool selection run locally.

Can I use a different local LLM to build an MCP voice assistant?

Yes. Any local model that supports function calling can work. You may need to adjust the prompt format and function call parsing to match the model's output.

Is MCP required if I already know which functions to call?

MCP provides a standardized way to expose and invoke tools across different models and applications. It reduces custom glue code and makes your assistant more portable as you change models or tools.

Can I add more tools besides weather?

Absolutely. You can add additional @server.tool() functions for other APIs or local actions, such as file access, reminders, or system commands.