Mistral AI released Voxtral, their open-source speech understanding model that powers voice mode in Le Chat assistant. Voxtral is available through Mistral's API for cloud-based transcription and as downloadable models for on-device deployment. However, building a production-ready Mistral voice assistant requires additional components beyond what Voxtral provides: wake word detection for hands-free voice activation, real-time speech-to-text that processes audio as users speak, and text-to-speech synthesis for natural voice responses.
Additionally, self-hosting Voxtral for real-time voice mode can require GPU infrastructure and machine learning integration expertise, while Mistral's transcription API still requires separate wake word and voice output solutions. These integration challenges create barriers for developers who want to build voice chat applications with Mistral AI.
This tutorial demonstrates how to build a complete Mistral AI voice assistant with voice mode capabilities in Python. By combining Porcupine Wake Word for custom wake word detection, Cheetah Streaming Speech-to-Text for low-latency speech recognition, and Orca Streaming Text-to-Speech for natural voice responses, developers can create conversational AI voice interfaces that run locally with low latency. No GPU infrastructure or ML expertise required.
What You'll Build:
A hands-free Mistral voice assistant that:
- Activates using a custom wake word
- Transcribes speech in real-time locally
- Sends recognized text to Mistral AI for intelligent responses
- Speaks Mistral's response using local text-to-speech
This architecture supports building multilingual voicebots, AI-powered voice agents, and interactive voice applications for enterprise and consumer use cases.
What You'll Need:
- Python 3.9+
- Microphone and speakers
- Picovoice
AccessKeyfrom the Picovoice Console - Mistral API key from the Mistral AI Platform
Looking to integrate voice with other AI chatbots? See our guides for ChatGPT Voice Assistant and DeepSeek Voice Assistant.
Train a Custom Wake Word for Mistral Activation
- Sign up for a Picovoice Console account and navigate to the Porcupine page.
- Enter your wake phrase such as "Hey Chatbot" and test it using the microphone button.
- Click "Train", select the target platform, and download the
.ppnmodel file. - Repeat steps 2 & 3 for any additional wake words you would like to support (e.g., "Hey Assistant").
Porcupine can detect multiple wake words with no added runtime footprint. For instance, use "Hey Chatbot" and "Dis le chat" ("Dee luh shah") simultaneously to activate the Mistral voice assistant. For tips on designing an effective wake word, review the choosing a wake word guide.
Set Up the Python Environment
Install all required Python SDKs and dependencies with a single command in the terminal:
- Porcupine Wake Word Python SDK:
pvporcupine - Cheetah Streaming Speech-to-Text Python SDK:
pvcheetah - Orca Streaming Text-to-Speech Python SDK:
pvorca - Picovoice Python Recorder library:
pvrecorder - Picovoice Python Speaker library:
pvspeaker - Mistral AI Python SDK:
mistralai— used for sending API calls to Mistral
Implement Wake Word Detection
The following code captures audio from your default microphone and detects the custom wake word locally:
Add Real-Time Speech-to-Text Transcription
After wake word detection, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:
Once you make a natural pause in your speech, such as after asking a question, Cheetah detects it as an endpoint, signaling that you've finished speaking.
Send Transcribed Text to Mistral AI API
Send the transcribed text to Mistral AI using the chat completions endpoint:
Convert Mistral AI's Response to Speech
Transform Mistral's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:
In this example, Orca performs single synthesis because the Mistral API returns the full response at once. When used with streaming models, Orca Streaming Text-to-Speech can generate and play audio in real time through streaming synthesis, enabling significantly lower latency than cloud-based alternatives.
Full Python Code for Mistral Voice Assistant
This complete implementation integrates wake word detection, streaming speech-to-text, Mistral API calls, and text-to-speech synthesis:
Run the Mistral Voice Assistant
To run the voice assistant, update the model path to match your local file and have both API keys ready:
- Picovoice AccessKey (copy from the Picovoice Console)
- Mistral API key (copy from Mistral AI Platform)
You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.
Start Building






