Build a Low-Latency Voice Assistant with Perplexity AI in Python

TLDR: This tutorial walks you through building a voice-enabled Perplexity AI chatbot in Python with fully on-device speech processing. Unlike cloud speech services that send audio to remote servers, on-device processing removes the network round-trip for speech. That reduces latency and suits Perplexity voice agents and other voice AI applications that need fast responses and smooth, uninterrupted user interactions.

Voice interfaces are no longer limited to smart speakers or mobile assistants. They’ve become a natural way for users to search, learn, and interact with information through voice-activated AI assistants.

Developers often want to add voice to AI chatbots, but cloud APIs like Google Speech-to-Text or AWS Transcribe can introduce high latency by sending voice recordings to remote servers. For Perplexity AI voice applications, where real-time performance and responsiveness matter, this added latency can become significant.

This tutorial demonstrates how to integrate voice with Perplexity using on-device speech processing with Python. The voice assistant uses Porcupine Wake Word for voice activation, Cheetah Streaming Speech-to-Text to transcribe speech, and Orca Streaming Text-to-Speech to generate voice responses. This keeps voice data fully on-device while still leveraging Perplexity’s intelligence. The architecture removes cloud round-trips for real-time, low-latency performance and scales easily across platforms and use cases.

The entire implementation fits into a single Python script that runs on Windows, macOS, Linux, and Raspberry Pi using Python 3.9+, a microphone, and speakers.

Train Custom Wake Word for Perplexity Voice Assistant

1. Sign up on Picovoice Console and open the Porcupine page.
2. Enter a wake phrase such as "Hey Perplexity", and test it with the microphone button.
3. Click "Train", choose the target platform, and download the .ppn model file.
4. Repeat steps 2 and 3 for any additional wake words you would like to support (e.g., "Hey Plex").

With Porcupine Wake Word, the voice assistant can be configured to detect multiple wake words simultaneously, allowing activation with phrases such as "Hey Perplexity" and "Hey Plex." For tips on training effective wake words, refer to the guide on choosing a wake word.

Set Up Your Python Environment

Install all required Python SDKs and supporting libraries with a single command in the terminal:

Porcupine Wake Word Python SDK: pvporcupine
Cheetah Streaming Speech-to-Text Python SDK: pvcheetah
Orca Streaming Text-to-Speech Python SDK: pvorca
Picovoice Python Recorder library: pvrecorder
Picovoice Python Speaker library: pvspeaker
Requests HTTP library: requests — used for sending REST API calls to Perplexity

pip install pvporcupine pvcheetah pvorca pvrecorder pvspeaker requests

To use the Picovoice SDKs you will need a Picovoice AccessKey, which authenticates your SDK usage. You can access it in the Picovoice Console.
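
Rather than pasting keys into source files, one option is to read them from environment variables. The sketch below is a minimal illustration; the helper name load_keys is hypothetical, and the variable names ACCESS_KEY and PPLX_API_KEY are an assumption chosen to mirror the placeholders used in this tutorial.

```python
import os


def load_keys(env=None):
    """Return (access_key, pplx_api_key) read from environment variables.

    ACCESS_KEY and PPLX_API_KEY are assumed names that mirror the
    placeholders used elsewhere in this tutorial.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("ACCESS_KEY", "PPLX_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["ACCESS_KEY"], env["PPLX_API_KEY"]
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by an SDK or by the Perplexity API.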

Embed Wake Word Detection into Perplexity Voice Assistant

The following snippet captures audio from your default microphone and detects your custom wake word locally:

import pvporcupine
from pvrecorder import PvRecorder

# Your Picovoice AccessKey from the Console; authenticates local SDK usage
ACCESS_KEY = "${ACCESS_KEY}"

# Path to your Porcupine wake-word model file (.ppn) that triggers activation
# e.g., "./models/hey-perplexity.ppn"
KEYWORD_PATH = "${KEYWORD_PATH}"

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keyword_paths=[KEYWORD_PATH])
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

print("Listening for wake word...")
while True:
    pcm = recorder.read()
    keyword_index = porcupine.process(pcm)
    if keyword_index >= 0:
        print("Wake word detected.")
        break

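recorder.stop()

# Release native resources once wake word detection is finished
recorder.delete()
porcupine.delete()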

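Porcupine Wake Word processes each frame on-device and returns the index of the detected wake word, or -1 when the frame contains no wake word. The index follows the order of the files passed in keyword_paths.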
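
When several wake words are loaded, it can help to translate the returned index back into a readable name for logging. The following is a small sketch; wake_word_label is a hypothetical helper, and it assumes the label can simply be taken from the .ppn file name.

```python
from pathlib import Path


def wake_word_label(keyword_paths, keyword_index):
    """Map the index returned by porcupine.process() to a readable label.

    Returns None when keyword_index is -1 (no wake word in this frame).
    """
    if keyword_index < 0:
        return None
    # The index refers to the position of the model in keyword_paths
    return Path(keyword_paths[keyword_index]).stem


paths = ["./models/hey-perplexity.ppn", "./models/hey-plex.ppn"]
print(wake_word_label(paths, 1))  # hey-plex
```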

Integrate Streaming Speech-to-Text in Perplexity Voice Assistant

Once the wake word has been detected, the transcription loop is activated. The code captures short audio frames and transcribes them using Cheetah Streaming Speech-to-Text:

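import pvcheetah
from pvrecorder import PvRecorder

ACCESS_KEY = "${ACCESS_KEY}"

# endpoint_duration_sec: seconds of silence after speech that mark the end of the request
cheetah = pvcheetah.create(
    access_key=ACCESS_KEY,
    endpoint_duration_sec=1.0)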

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

print("Speak your request…")
transcript = ""
while True:
    pcm = recorder.read()
    partial_transcript, is_endpoint = cheetah.process(pcm)
    transcript += partial_transcript
    print(partial_transcript, end="", flush=True)
    if is_endpoint:
        final_transcript = cheetah.flush()
        transcript += final_transcript
        print(final_transcript)
        break

recorder.stop()
recorder.delete()
cheetah.delete()

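When Cheetah reports an endpoint (the speaker has stopped talking), flush() returns any remaining buffered text, and the accumulated transcript is ready to send to Perplexity AI.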
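
The transcription loop can also be wrapped in a function with a frame cap, so it cannot run forever if no endpoint is ever reported. This is a sketch under stated assumptions: transcribe_until_endpoint and max_frames are hypothetical names (max_frames is not a Cheetah parameter), and the three callables stand in for recorder.read, cheetah.process, and cheetah.flush.

```python
def transcribe_until_endpoint(read_frame, process, flush, max_frames=1000):
    """Accumulate partial transcripts until an endpoint or a frame cap is hit.

    read_frame, process, and flush are injected callables; with the real
    engines they would be recorder.read, cheetah.process, and cheetah.flush.
    max_frames is a hypothetical safety limit, not a Cheetah parameter.
    """
    parts = []
    for _ in range(max_frames):
        partial, is_endpoint = process(read_frame())
        parts.append(partial)
        if is_endpoint:
            break
    parts.append(flush())  # flush returns any text still buffered in the engine
    return "".join(parts).strip()


# Stand-in callables so the sketch runs without a microphone
frames = iter([("what is ", False), ("the weather", True)])
text = transcribe_until_endpoint(
    read_frame=lambda: [0] * 512,
    process=lambda pcm: next(frames),
    flush=lambda: " today",
)
print(text)  # what is the weather today
```

Injecting the callables keeps the loop testable without audio hardware; in the assistant you would pass the real recorder and Cheetah methods instead.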

Connect Speech Recognition to Perplexity API

Once the speech is transcribed, the transcript is sent to the Perplexity API as a text prompt:

import requests

PPLX_API_KEY = "${PPLX_API_KEY}"


def ask_perplexity(api_key: str, prompt: str, model: str = "sonar-pro") -> str:
    try:
        r = requests.post(
            "https://api.perplexity.ai/chat/completions",
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json"
            },
            json={"model": model, "messages": [{"role": "user", "content": prompt}]},
            timeout=45
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
    except requests.exceptions.RequestException as e:
        return f"I'm having trouble connecting to Perplexity. Error: {e}"


# transcript = "example user request"
reply = ask_perplexity(PPLX_API_KEY, transcript)
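
Because the reply will be spoken rather than read, it may be worth steering the model toward short, plain-text answers and skipping the request when nothing was transcribed. The sketch below builds the request body only; build_payload and DEFAULT_SYSTEM_PROMPT are hypothetical names, and the wording of the system message is an assumption to tune for your own use case.

```python
DEFAULT_SYSTEM_PROMPT = (
    "Answer in two or three short sentences suitable for being read aloud. "
    "Do not use markdown, lists, or citation markers."
)


def build_payload(prompt, model="sonar-pro", system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Build the JSON body for a chat completions request.

    Raises ValueError when the transcript is empty so the caller can skip
    the network call instead of sending a blank prompt.
    """
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("empty transcript; nothing to send")
    messages = []
    if system_prompt:
        # Optional system message that shapes the style of the answer
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return {"model": model, "messages": messages}
```

The returned dictionary can be passed as the json argument of the requests.post call shown above.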

Add Voice to Perplexity AI Responses

The system transforms the chatbot’s response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:

import pvorca
from pvspeaker import PvSpeaker
from collections import deque

ACCESS_KEY = "${ACCESS_KEY}"
orca = pvorca.create(access_key=ACCESS_KEY)
speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)

# Synthesize speech
# reply = response from perplexity
pcm_out, _ = orca.synthesize(reply)

# Play audio
speaker.start()

pcm_buffer = deque()
pcm_buffer.append(pcm_out)

while len(pcm_buffer) > 0:
    pcm = pcm_buffer.popleft()
    written = speaker.write(pcm)
    if written < len(pcm):
        pcm_buffer.appendleft(pcm[written:])

speaker.flush()
speaker.stop()

# Cleanup
speaker.delete()
orca.delete()

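Orca Streaming Text-to-Speech synthesizes speech entirely on-device, so playback does not wait on a network round-trip for audio. The snippet above synthesizes the full reply with a single orca.synthesize call before playback begins; Orca also offers a streaming mode that produces audio as text arrives, which can shorten the time to first audio for long replies.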
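
Replies from a chat model can contain markdown symbols or bracketed citation markers such as [1] that do not read well aloud, and a text-to-speech engine may reject characters it does not support. The following sketch cleans the reply before synthesis. clean_for_tts is a hypothetical helper; the valid_characters argument is assumed to come from the engine (for example an Orca property listing accepted characters, which you should confirm in the Orca API documentation).

```python
import re


def clean_for_tts(text, valid_characters=None):
    """Strip citation markers and markdown symbols before speech synthesis.

    valid_characters is an optional collection of characters the TTS engine
    accepts; anything else is replaced with a space. This is an assumption
    about how you obtain the engine's accepted character set.
    """
    text = re.sub(r"\[\d+\]", "", text)   # citation markers such as [1]
    text = re.sub(r"[*_`#]+", "", text)   # common markdown emphasis and heading marks
    if valid_characters is not None:
        allowed = set(valid_characters) | {" "}
        text = "".join(ch if ch in allowed else " " for ch in text)
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


print(clean_for_tts("**Paris** is the capital of France.[1][2]"))
# Paris is the capital of France.
```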

Complete Implementation of Voice-Enabled Perplexity AI Assistant

This implementation combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech. The voice processing happens entirely on-device, while only text queries are sent to the Perplexity AI API.

import argparse
from collections import deque
import sys
import requests
import pvporcupine
import pvcheetah
import pvorca
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Porcupine + Cheetah + Orca voice interface for Perplexity"
    )
    parser.add_argument("--access_key", required=True, help="Picovoice AccessKey")
    parser.add_argument("--keyword_paths", nargs='+', required=True, 
                       help="Path(s) to .ppn wake-word model(s)")
    parser.add_argument("--pplx_key", required=True, help="Perplexity API key")
    args = parser.parse_args()

    porcupine = None
    cheetah = None
    orca = None
    recorder = None
    speaker = None

    try:
        # Initialize engines
        porcupine = pvporcupine.create(
            access_key=args.access_key, 
            keyword_paths=args.keyword_paths)
        cheetah = pvcheetah.create(access_key=args.access_key, endpoint_duration_sec=1.0)
        orca = pvorca.create(access_key=args.access_key)

        print(f'Porcupine version: {porcupine.version}')
        print(f'Cheetah version: {cheetah.version}')
        print(f'Orca version: {orca.version}\n')

        # Initialize speaker
        speaker = PvSpeaker(
            sample_rate=orca.sample_rate, 
            bits_per_sample=16)

        # Initialize recorder
        recorder = PvRecorder(frame_length=porcupine.frame_length)
        recorder.start()

        print("Ready. Say the wake word… (Ctrl+C to stop)")

        # Wait for wake word
        while True:
            pcm = recorder.read()
            keyword_index = porcupine.process(pcm)
            if keyword_index >= 0:
                print("[EVENT] Wake word detected")
                break

        recorder.stop()
        recorder.delete()
        recorder = PvRecorder(frame_length=cheetah.frame_length)
        recorder.start()

        # Stream STT with Cheetah
        print("Speak your request…")
        transcript = ""
        while True:
            pcm = recorder.read()
            partial_transcript, is_endpoint = cheetah.process(pcm)
            transcript += partial_transcript
            print(partial_transcript, end="", flush=True)
            if is_endpoint:
                final_transcript = cheetah.flush()
                transcript += final_transcript
                print(final_transcript)
                break

        # Stop capturing so the microphone is idle during the API call and playback
        recorder.stop()
        print("[TRANSCRIPT]", transcript)

        # Call Perplexity
        response = requests.post(
            "https://api.perplexity.ai/chat/completions",
            headers={
                "Authorization": f"Bearer {args.pplx_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "sonar-pro",
                "messages": [{"role": "user", "content": transcript}]
            },
            timeout=45
        )
        response.raise_for_status()
        
        reply = response.json()["choices"][0]["message"]["content"]
        print("[REPLY]", reply)

        # Synthesize speech with Orca
        pcm_out, _ = orca.synthesize(reply)

        # Play audio
        speaker.start()

        pcm_buffer = deque()
        pcm_buffer.append(pcm_out)

        while len(pcm_buffer) > 0:
            pcm = pcm_buffer.popleft()
            written = speaker.write(pcm)
            if written < len(pcm):
                pcm_buffer.appendleft(pcm[written:])

        speaker.flush()
        speaker.stop()

    except KeyboardInterrupt:
        print("\n[EXIT] Stopping…")
    except requests.exceptions.RequestException as e:
        # Network failure, timeout, or non-2xx status from the Perplexity API
        print(f"[ERROR] Perplexity request failed: {e}")
    except (
        pvporcupine.PorcupineActivationLimitError,
        pvcheetah.CheetahActivationLimitError,
        pvorca.OrcaActivationLimitError,
    ):
        print("AccessKey has reached its processing limit")
    finally:
        # Cleanup
        if speaker is not None:
            speaker.delete()
        
        if recorder is not None:
            recorder.delete()
        
        if orca is not None:
            orca.delete()
        
        if cheetah is not None:
            cheetah.delete()
        
        if porcupine is not None:
            porcupine.delete()

    return 0


if __name__ == "__main__":
    sys.exit(main())
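
The script above handles one request and then exits. To keep the assistant listening for further requests, the same four stages can be wrapped in a loop. The sketch below shows only the control flow, with each stage passed in as a callable; run_assistant and its parameter names are hypothetical, and in a real assistant the callables would wrap the Porcupine, Cheetah, Perplexity, and Orca code shown earlier.

```python
def run_assistant(wait_for_wake_word, listen, ask, speak, max_turns=None):
    """Run wake word -> transcription -> Perplexity -> speech in a loop.

    Each stage is an injected callable. max_turns is a hypothetical limit
    on wake word activations (None means loop until interrupted).
    Returns the number of requests that were actually answered.
    """
    answered = 0
    turn = 0
    while max_turns is None or turn < max_turns:
        turn += 1
        wait_for_wake_word()
        request = listen().strip()
        if not request:
            continue  # nothing was transcribed; go back to waiting
        speak(ask(request))
        answered += 1
    return answered


# Stand-in stages so the sketch runs without audio hardware or API keys
requests_heard = iter(["what time is it", "  "])
spoken = []
answered = run_assistant(
    wait_for_wake_word=lambda: None,
    listen=lambda: next(requests_heard),
    ask=lambda text: "You asked: " + text,
    speak=spoken.append,
    max_turns=2,
)
print(answered, spoken)  # 1 ['You asked: what time is it']
```

Keeping the stages separate also makes it straightforward to swap one of them, for example replacing the ask callable with a different model, without touching the audio code.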

Run the Perplexity Voice Assistant

To run the voice-powered Perplexity AI assistant, save the script as voice_chatbot.py, update the keyword_paths in the command below to match your local wake word model files, and ensure both API keys are correctly set:

Picovoice AccessKey – authenticates your Picovoice SDK usage (copy it from the Picovoice Console)
Perplexity API key – authorizes requests to the Perplexity API

python3 voice_chatbot.py \
  --access_key "$ACCESS_KEY" \
  --keyword_paths ./models/hey-perplexity.ppn ./models/hey-plex.ppn \
  --pplx_key "$PPLX_API_KEY"

Looking to integrate voice with other AI platforms? Check out our guides for ChatGPT voice integration and Claude voice integration.

You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.

Frequently Asked Questions

Can I customize the wake word or use a different phrase instead of 'Hey Perplexity'?

Yes. You can train any custom wake word using Picovoice Console in seconds without collecting training data. Simply type your desired phrase (e.g., "Hey Assistant", "Computer", or your brand name), and download the trained model. The wake word guide provides best practices for choosing effective wake phrases.

Can I build a voice assistant for languages other than English?

Yes. The tutorial uses Porcupine Wake Word, which supports 9 languages; Cheetah Streaming Speech-to-Text, which supports 6; and Orca Streaming Text-to-Speech, which supports 8. All three support English, French, German, Italian, Portuguese, and Spanish. To build a voice assistant in a different language, download the language model files from the Picovoice Console and specify the model path when initializing each engine.

Will the voice assistant work accurately in noisy environments, with different accents, or with specialized terminology?

Yes. Porcupine Wake Word and Cheetah Streaming Speech-to-Text are designed to work reliably in real-world conditions with background noise and various accents across supported languages. To improve accuracy on domain-specific terminology or brand names, you can also add boost words and custom vocabulary to Cheetah Streaming Speech-to-Text.