🏢 Enterprise AI Consulting
Get dedicated help specific to your use case and for your hardware and software choices.
Consult an AI Expert

TLDR: This tutorial walks you through building a voice-enabled Perplexity AI chatbot in Python, with fully on-device speech processing. Unlike cloud-based solutions that send data to remote servers, this approach to create a Perplexity voice assistant reduces latency and ensures real-time processing, making it ideal for Perplexity voice agents and other voice AI applications that require immediate response times and smooth, uninterrupted user interactions.

Voice interfaces are no longer limited to smart speakers or mobile assistants. They’ve become a natural way for users to search, learn, and interact with information through voice-activated AI assistants.

Developers often want to add voice to AI chatbots, but cloud APIs like Google Speech-to-Text or AWS Transcribe can introduce high latency by sending voice recordings to remote servers. For Perplexity AI voice applications, where real-time performance and responsiveness matter, these compromises can become significant.

This tutorial demonstrates how to integrate voice with Perplexity using on-device speech processing with Python. The voice assistant uses Porcupine Wake Word for voice activation, Cheetah Streaming Speech-to-Text to transcribe speech, and Orca Streaming Text-to-Speech to generate voice responses. This keeps voice data fully on-device while still leveraging Perplexity’s intelligence. The architecture removes cloud round-trips for real-time, low-latency performance and scales easily across platforms and use cases.

The entire implementation fits into a single Python script that runs on Windows, macOS, Linux, and Raspberry Pi using Python 3.9+, a microphone, and speakers.

Train Custom Wake Word for Perplexity Voice Assistant

  1. Sign up on Picovoice Console and open the Porcupine page.
  2. Enter a wake phrase such as "Hey Perplexity", and test it with the microphone button.
  3. Click "Train", choose the target platform, and download the .ppn model file for both wake words.
  4. Repeat step 2 & 3 for any additional wake words you would like to support (e.g., "Hey Plex").

With Porcupine Wake Word, the voice assistant can be configured to detect multiple wake words simultaneously, allowing activation with phrases such as "Hey Perplexity" and "Hey Plex." For tips on training effective wake words, refer to the choosing a wake word guide.

Set Up Your Python Environment

Install all required Python SDKs and supporting libraries with a single command in the terminal:

To use the Picovoice SDKs you will need a Picovoice AccessKey, which authenticates your SDK usage. You can access it in the Picovoice Console.

Embed Wake Word Detection into Perplexity Voice Assistant

The following snippet captures audio from your default microphone and detects your custom wake word locally:

Porcupine Wake Word processes each frame on-device and returns the index of the detected wake word.

Integrate Streaming Speech-to-Text in Perplexity Voice Assistant

Once the wake word has been detected, the transcription loop is activated. The code captures short audio frames and transcribes them using Cheetah Streaming Speech-to-Text:

Each finalized segment returns text that is ready to send to Perplexity AI.

Connect Speech Recognition to Perplexity API

Once the text is transcribed, Perplexity API processes the text prompt:

Add Voice to Perplexity AI Responses

The system transforms the chatbot’s response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:

Orca Streaming Text-to-Speech synthesizes speech entirely on-device and streams audio as it’s generated, ensuring significantly lower latency than cloud-based alternatives.

Complete Implementation of Voice-Enabled Perplexity AI Assistant

This implementation combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech. The voice processing happens entirely on-device, while only text queries are sent to the Perplexity AI API.

Run the Perplexity Voice Assistant

To run the voice-powered Perplexity AI assistant, update the keyword_paths in the command below to match your local wake word model files and ensure both API keys are correctly set:

  • Picovoice AccessKey – authenticates your Picovoice SDK usage (copy it from the Picovoice Console)
  • Perplexity API key – authorizes requests to the Perplexity API

Looking to integrate voice with other AI platforms? Check out our guides for ChatGPT voice integration and Claude voice integration.

You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.

Start Building

Frequently Asked Questions

Can I customize the wake word or use a different phrase instead of 'Hey Perplexity'?
Yes. You can train any custom wake word using Picovoice Console in seconds without collecting training data. Simply type your desired phrase (e.g., "Hey Assistant", "Computer", or your brand name), and download the trained model. The wake word guide provides best practices for choosing effective wake phrases.
Can I build a voice assistant for languages other than English?
Yes. The tutorial uses Porcupine Wake Word, supporting 9 languages; Cheetah Streaming Speech-to-Text, supporting 6 languages; and Orca Streaming Text-to-Speech, supporting 8 languages, all supporting English, French, German, Italian, Portuguese, and Spanish. To build a voice assistant in a different language, download language model files from the Picovoice Console and specify the model path when initializing each engine.
Will the voice assistant work accurately in noisy environments, with different accents, or with specialized terminology?
Yes. Porcupine Wake Word and Cheetah Streaming Speech-to-Text are designed to work reliably in real-world conditions with background noise and various accents across supported languages. For increasing accuracy on domain-specific terminology or brand names, you can also add boost words and custom vocabulary to Cheetah Streaming Speech-to-Text.