Building a ChatGPT voice assistant requires more than just connecting to the OpenAI API. ChatGPT voice applications need instant responses to feel natural in conversation. While the consumer version of ChatGPT supports a built-in real-time ChatGPT Voice Mode, developers using the OpenAI API must build the speech-processing pipeline themselves.
OpenAI’s Speech-to-Text API provides speech recognition for ChatGPT voice chat through OpenAI Whisper and gpt-4o-transcribe models. Both solutions process audio in the cloud, adding network latency to every voice interaction and disrupting natural conversation flow.
The newer OpenAI Realtime API supports streaming audio input and output, enabling real-time, speech-to-speech interactions for ChatGPT voice applications. However, it is provided as a single integrated pipeline. Developers cannot customize components of the speech pipeline to match their specific requirements for building their ChatGPT voice agents.
This tutorial shows how to build a ChatGPT voice assistant in Python, inspired by ChatGPT Voice Mode but implemented with on-device speech processing. This approach follows a modular architecture, allowing each component of the speech pipeline to be customized and optimized for specific use cases. It performs real-time transcription locally, activates hands-free with a custom wake word, and generates natural, low-latency voice responses for the ChatGPT voice AI pipeline, providing both speed and flexibility without relying on cloud-based processing.
What You'll Build:
- A hands-free ChatGPT voice assistant that:
- Activates with a custom wake word
- Transcribes speech to text in real time
- Sends text queries to ChatGPT via the OpenAI API
- Responds with natural, real-time voice output
What You'll Need:
- Python 3.9+
- Microphone and speakers
- Picovoice
AccessKeyfrom the Picovoice Console - OpenAI API key from the OpenAI Platform page
The solution integrates ChatGPT with speech recognition engines Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.
Looking to integrate voice with other AI chatbots? Check out our guides to build Claude Voice Assistant and Perplexity Voice Assistant.
Train a Custom Wake Word for ChatGPT Voice Assistant
- Sign up for a Picovoice Console account and navigate to the Porcupine page.
- Enter your wake phrase such as "Hey Chat G P T" and test it using the microphone button.
- Click "Train", select the target platform, and download the
.ppnmodel file. - Repeat steps 2 & 3 for any additional wake words you would like to support (e.g., "Hey Chatbot")
For tips on designing an effective wake word, review the choosing a wake word guide.
Set Up the Python Environment
Install all required Python SDKs and dependencies with a single terminal command:
- Porcupine Wake Word Python SDK:
pvporcupine - Cheetah Streaming Speech-to-Text Python SDK:
pvcheetah - Orca Text-to-Speech Python SDK:
pvorca - Picovoice Python Recorder library:
pvrecorder - Picovoice Python Speaker library:
pvspeaker - OpenAI Python library:
openai— used for ChatGPT's OpenAI API integration.
Add Wake Word Detection to ChatGPT
The following code captures audio from your default microphone and detects the custom wake word locally:
Porcupine Wake Word processes each audio frame on-device and triggers when the keyword is recognized, providing a signal that can be used to start the transcription phase.
Add Streaming Speech-to-Text to ChatGPT Voice Assistant
Once the wake word has been detected, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:
Once you make a natural pause in your speech, such as after asking a question, Cheetah detects it as an endpoint, signaling that you've finished speaking.
Send Voice Prompts to ChatGPT via OpenAI API
Send your prompt to ChatGPT using OpenAI's chat completions endpoint:
This minimal integration sends text to ChatGPT while all speech processing remains local, reducing latency.
Convert ChatGPT Responses to Speech Locally
Convert ChatGPT's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:
In this example, Orca performs single synthesis because the OpenAI API returns the full response all together. When used with a streaming model, Orca can generate and play audio in real time through streaming synthesis, enabling significantly lower latency than cloud-based alternatives.
Full Python Code for Voice-Enabled ChatGPT Assistant
This solution combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech for seamless, real-time voice interactions.
Run the ChatGPT Voice Assistant
To run the voice-enabled ChatGPT assistant, update the model path to match your local file and have both API keys ready:
- Picovoice AccessKey (copy it from the Picovoice Console)
- OpenAI API key
You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.
Start Building






