
Mistral AI released Voxtral, its open-source speech understanding model that powers voice mode in the Le Chat assistant. Voxtral is available through Mistral's API for cloud-based transcription and as downloadable models for on-device deployment. However, building a production-ready Mistral voice assistant requires additional components beyond what Voxtral provides: wake word detection for hands-free voice activation, real-time speech-to-text that processes audio as users speak, and text-to-speech synthesis for natural voice responses.

Additionally, self-hosting Voxtral for real-time voice mode can require GPU infrastructure and machine learning integration expertise, while Mistral's transcription API still requires separate wake word and voice output solutions. These integration challenges create barriers for developers who want to build voice chat applications with Mistral AI.

This tutorial demonstrates how to build a complete Mistral AI voice assistant with voice mode capabilities in Python. By combining Porcupine Wake Word for custom wake word detection, Cheetah Streaming Speech-to-Text for low-latency speech recognition, and Orca Streaming Text-to-Speech for natural voice responses, developers can create conversational AI voice interfaces that run locally with low latency. No GPU infrastructure or ML expertise required.

What You'll Build:

A hands-free Mistral voice assistant that:

  • Activates using a custom wake word
  • Transcribes speech in real-time locally
  • Sends recognized text to Mistral AI for intelligent responses
  • Speaks Mistral's response using local text-to-speech

This architecture supports building multilingual voicebots, AI-powered voice agents, and interactive voice applications for enterprise and consumer use cases.

What You'll Need:

  • Python 3 and pip installed
  • A Picovoice Console account and its AccessKey
  • A Mistral AI API key
  • A working microphone and speaker

Looking to integrate voice with other AI chatbots? See our guides for ChatGPT Voice Assistant and DeepSeek Voice Assistant.

Train a Custom Wake Word for Mistral Activation

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your wake phrase such as "Hey Chatbot" and test it using the microphone button.
  3. Click "Train", select the target platform, and download the .ppn model file.
  4. Repeat steps 2 & 3 for any additional wake words you would like to support (e.g., "Hey Assistant").

Porcupine can detect multiple wake words with no added runtime footprint. For instance, use "Hey Chatbot" and "Dis le chat" ("Dee luh shah") simultaneously to activate the Mistral voice assistant. For tips on designing an effective wake word, review the choosing a wake word guide.

Set Up the Python Environment

Install all required Python SDKs and dependencies with a single command in the terminal:
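The exact command depends on your environment, but a minimal set covering the SDKs used in this tutorial looks like the following; the mistralai client library is assumed here, and any HTTP client that can call Mistral's REST API works as well:

```bash
pip3 install pvporcupine pvcheetah pvorca pvrecorder pvspeaker mistralai
```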

Implement Wake Word Detection

The following code captures audio from your default microphone and detects the custom wake word locally:
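The snippet below is a minimal sketch of this step. It assumes your Picovoice AccessKey is stored in a PICOVOICE_ACCESS_KEY environment variable and that the trained keyword file is saved as hey-chatbot.ppn; both names are placeholders to adjust for your setup.

```python
import os

import pvporcupine
from pvrecorder import PvRecorder

# Porcupine runs entirely on-device; the AccessKey only validates your Picovoice account.
porcupine = pvporcupine.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    keyword_paths=["hey-chatbot.ppn"],  # list several .ppn files to detect multiple wake words
)

# PvRecorder captures single-channel, 16 kHz audio in frames sized for Porcupine.
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

print("Listening for the wake word ...")
try:
    while True:
        frame = recorder.read()
        # process() returns the index of the detected keyword, or -1 if none was heard.
        if porcupine.process(frame) >= 0:
            print("Wake word detected")
            break
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```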

Add Real-Time Speech-to-Text Transcription

After wake word detection, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:
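The sketch below reuses the PICOVOICE_ACCESS_KEY environment variable from the previous step and keeps transcribing until Cheetah reports an endpoint:

```python
import os

import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    endpoint_duration_sec=1.0,  # seconds of silence that mark the end of an utterance
)

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

print("Listening ...")
transcript = ""
try:
    while True:
        # process() returns the newly transcribed text and whether an endpoint was reached.
        partial, is_endpoint = cheetah.process(recorder.read())
        transcript += partial
        if is_endpoint:
            transcript += cheetah.flush()  # collect any remaining buffered text
            break
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()

print(f"You said: {transcript}")
```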

Once you make a natural pause in your speech, such as after asking a question, Cheetah detects it as an endpoint, signaling that you've finished speaking.

Send Transcribed Text to Mistral AI API

Send the transcribed text to Mistral AI using the chat completions endpoint:
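The example below assumes the mistralai Python client (v1.x), an API key stored in a MISTRAL_API_KEY environment variable, and mistral-small-latest as a stand-in model name; swap in whichever Mistral chat model fits your use case.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])


def ask_mistral(prompt: str) -> str:
    # Calls Mistral's chat completions endpoint with a single user message.
    response = client.chat.complete(
        model="mistral-small-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(ask_mistral("In one sentence, what is Voxtral?"))
```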

Convert Mistral AI's Response to Speech

Transform Mistral's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:
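A minimal sketch, assuming the current pvorca API (synthesize() returns 16-bit PCM samples together with word alignments) and a placeholder answer string standing in for Mistral's reply:

```python
import os

import pvorca
from pvspeaker import PvSpeaker

orca = pvorca.create(access_key=os.environ["PICOVOICE_ACCESS_KEY"])

# Match the speaker's sample rate and sample width to Orca's 16-bit PCM output.
speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)
speaker.start()

answer = "Voxtral is Mistral AI's open speech understanding model."  # placeholder for Mistral's reply
pcm, _ = orca.synthesize(answer)  # single synthesis: the whole response at once

# write() only buffers as much audio as currently fits, so loop until every sample is queued.
written = 0
while written < len(pcm):
    written += speaker.write(pcm[written:])

speaker.flush()  # block until playback finishes
speaker.stop()
speaker.delete()
orca.delete()
```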

In this example, Orca performs single synthesis because the Mistral API call returns the full response at once. When the response is streamed chunk by chunk instead, Orca Streaming Text-to-Speech can generate and play audio in real time through streaming synthesis, enabling significantly lower latency than cloud-based alternatives.
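For reference, a streaming variant would look roughly like the sketch below. It continues with the orca and speaker objects created above and assumes a hypothetical mistral_text_chunks() generator that yields partial response text as it arrives:

```python
# Stream text into Orca as it arrives instead of waiting for the full reply.
stream = orca.stream_open()

for text_chunk in mistral_text_chunks():  # hypothetical generator of partial response text
    pcm = stream.synthesize(text_chunk)
    if pcm is not None:  # Orca returns audio once it has buffered enough text to synthesize
        speaker.write(pcm)

remaining = stream.flush()  # synthesize whatever text is still buffered
if remaining is not None:
    speaker.write(remaining)

stream.close()
```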

Full Python Code for Mistral Voice Assistant

This complete implementation integrates wake word detection, streaming speech-to-text, Mistral API calls, and text-to-speech synthesis:
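The listing below is one way to put the pieces together, under the same assumptions as the earlier snippets: PICOVOICE_ACCESS_KEY and MISTRAL_API_KEY environment variables, a hey-chatbot.ppn keyword file, the mistralai v1 client, and mistral-small-latest as the model.

```python
import os

import pvcheetah
import pvorca
import pvporcupine
from mistralai import Mistral
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker

# Assumed configuration: adjust the keyword path and model name to match your setup.
ACCESS_KEY = os.environ["PICOVOICE_ACCESS_KEY"]
MISTRAL_API_KEY = os.environ["MISTRAL_API_KEY"]
KEYWORD_PATH = "hey-chatbot.ppn"
MISTRAL_MODEL = "mistral-small-latest"


def listen_for_wake_word(porcupine, recorder):
    # Block until the custom wake word is detected in the microphone stream.
    while True:
        if porcupine.process(recorder.read()) >= 0:
            return


def transcribe_until_endpoint(cheetah, recorder):
    # Stream audio frames into Cheetah and return the transcript once an endpoint is detected.
    transcript = ""
    while True:
        partial, is_endpoint = cheetah.process(recorder.read())
        transcript += partial
        if is_endpoint:
            return transcript + cheetah.flush()


def ask_mistral(client, prompt):
    response = client.chat.complete(
        model=MISTRAL_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def speak(orca, speaker, text):
    # Single synthesis: convert the full response to PCM, then queue it for playback.
    pcm, _ = orca.synthesize(text)
    written = 0
    while written < len(pcm):
        written += speaker.write(pcm[written:])
    speaker.flush()


def main():
    porcupine = pvporcupine.create(access_key=ACCESS_KEY, keyword_paths=[KEYWORD_PATH])
    cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
    orca = pvorca.create(access_key=ACCESS_KEY)
    client = Mistral(api_key=MISTRAL_API_KEY)

    # Porcupine and Cheetah both consume 512-sample frames at 16 kHz, so one recorder feeds both.
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)

    recorder.start()
    speaker.start()
    print("Say the wake word to start a conversation (Ctrl+C to exit).")

    try:
        while True:
            listen_for_wake_word(porcupine, recorder)
            print("Wake word detected - listening ...")

            prompt = transcribe_until_endpoint(cheetah, recorder)
            print(f"You: {prompt}")

            answer = ask_mistral(client, prompt)
            print(f"Mistral: {answer}")

            speak(orca, speaker, answer)
    except KeyboardInterrupt:
        pass
    finally:
        recorder.delete()
        speaker.delete()
        porcupine.delete()
        cheetah.delete()
        orca.delete()


if __name__ == "__main__":
    main()
```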

Run the Mistral Voice Assistant

To run the voice assistant, update the model path to match your local file and have both API keys ready:
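Assuming the full script above is saved as mistral_voice_assistant.py and reads both keys from environment variables as shown:

```bash
export PICOVOICE_ACCESS_KEY="..."   # AccessKey from Picovoice Console
export MISTRAL_API_KEY="..."        # API key from Mistral's La Plateforme
python3 mistral_voice_assistant.py
```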

You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.


Frequently Asked Questions

Will the voice assistant work accurately in noisy environments, with different accents, or with specialized terminology?
Yes. Porcupine Wake Word and Cheetah Streaming Speech-to-Text are designed to work reliably in real-world conditions with background noise and various accents across supported languages. For domain-specific terminology or brand names, you can add boost words and custom vocabulary to Cheetah Streaming Speech-to-Text.

Can I use any wake word phrase for Mistral activation?
Yes. You can train custom wake words using Picovoice Console in seconds without collecting training data. Simply enter your desired phrase (e.g., "Hey Assistant" or your brand name) and download the trained model.

What happens if the Mistral API is unavailable?
The local voice processing components (wake word detection, speech-to-text, and text-to-speech) continue functioning independently. You can catch API exceptions and use Orca Text-to-Speech to provide voice feedback like "I'm having trouble connecting, please try again."
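For example, reusing the hypothetical ask_mistral and speak helpers from the full listing above, a simple fallback could look like this:

```python
# Fall back to a spoken error message if the Mistral API call fails.
try:
    answer = ask_mistral(client, prompt)
except Exception:
    answer = "I'm having trouble connecting, please try again."

speak(orca, speaker, answer)
```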