🏢 Enterprise AI Consulting
Get dedicated help specific to your use case and for your hardware and software choices.
Consult an AI Expert

Building a ChatGPT voice assistant requires more than just connecting to the OpenAI API. ChatGPT voice applications need instant responses to feel natural in conversation. While the consumer version of ChatGPT supports a built-in real-time ChatGPT Voice Mode, developers using the OpenAI API must build the speech-processing pipeline themselves.

OpenAI’s Speech-to-Text API provides speech recognition for ChatGPT voice chat through OpenAI Whisper and gpt-4o-transcribe models. Both solutions process audio in the cloud, adding network latency to every voice interaction and disrupting natural conversation flow.

The newer OpenAI Realtime API supports streaming audio input and output, enabling real-time, speech-to-speech interactions for ChatGPT voice applications. However, it is provided as a single integrated pipeline. Developers cannot customize components of the speech pipeline to match their specific requirements for building their ChatGPT voice agents.

This tutorial shows how to build a ChatGPT voice assistant in Python, inspired by ChatGPT Voice Mode but implemented with on-device speech processing. This approach follows a modular architecture, allowing each component of the speech pipeline to be customized and optimized for specific use cases. It performs real-time transcription locally, activates hands-free with a custom wake word, and generates natural, low-latency voice responses for the ChatGPT voice AI pipeline, providing both speed and flexibility without relying on cloud-based processing.

What You'll Build:

  • A hands-free ChatGPT voice assistant that:
    • Activates with a custom wake word
    • Transcribes speech to text in real time
    • Sends text queries to ChatGPT via the OpenAI API
    • Responds with natural, real-time voice output

What You'll Need:

The solution integrates ChatGPT with speech recognition engines Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech.

Looking to integrate voice with other AI chatbots? Check out our guides to build Claude Voice Assistant and Perplexity Voice Assistant.

Train a Custom Wake Word for ChatGPT Voice Assistant

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your wake phrase such as "Hey Chat G P T" and test it using the microphone button.
  3. Click "Train", select the target platform, and download the .ppn model file.
  4. Repeat steps 2 & 3 for any additional wake words you would like to support (e.g., "Hey Chatbot")

For tips on designing an effective wake word, review the choosing a wake word guide.

Set Up the Python Environment

Install all required Python SDKs and dependencies with a single terminal command:

Add Wake Word Detection to ChatGPT

The following code captures audio from your default microphone and detects the custom wake word locally:

Porcupine Wake Word processes each audio frame on-device and triggers when the keyword is recognized, providing a signal that can be used to start the transcription phase.

Add Streaming Speech-to-Text to ChatGPT Voice Assistant

Once the wake word has been detected, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:

Once you make a natural pause in your speech, such as after asking a question, Cheetah detects it as an endpoint, signaling that you've finished speaking.

Send Voice Prompts to ChatGPT via OpenAI API

Send your prompt to ChatGPT using OpenAI's chat completions endpoint:

This minimal integration sends text to ChatGPT while all speech processing remains local, reducing latency.

Convert ChatGPT Responses to Speech Locally

Convert ChatGPT's text response into natural speech using Orca Streaming Text-to-Speech and PvSpeaker:

In this example, Orca performs single synthesis because the OpenAI API returns the full response all together. When used with a streaming model, Orca can generate and play audio in real time through streaming synthesis, enabling significantly lower latency than cloud-based alternatives.

Full Python Code for Voice-Enabled ChatGPT Assistant

This solution combines three Picovoice engines: Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech for seamless, real-time voice interactions.

Run the ChatGPT Voice Assistant

To run the voice-enabled ChatGPT assistant, update the model path to match your local file and have both API keys ready:

You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.

Start Building

Frequently Asked Questions

Will the voice assistant work accurately in noisy environments, with different accents, or with specialized terminology?
Yes. Porcupine Wake Word and Cheetah Streaming Speech-to-Text are designed to work reliably in real-world conditions with background noise and various accents across supported languages. For increasing accuracy on domain-specific terminology or brand names, you can also add boost words and custom vocabulary to Cheetah Streaming Speech-to-Text.
Can I use a different wake word instead of 'Hey chatbot' for my voice assistant?
Yes. You can train any custom wake word using Picovoice Console in seconds without collecting training data. Simply enter your desired phrase (e.g., "Hey Computer", or your brand name), and download the trained model. The wake word guide provides best practices for selecting effective wake phrases. You can also detect multiple wake words simultaneously to support different commands.
What will happen to the voice assistant if ChatGPT API calls fail or timeout during a conversation?
Network timeouts, rate limits, or API outages can cause the ChatGPT request to fail. You can catch these exceptions and use Orca Streaming Text-to-Speech to provide voice feedback like "I'm having trouble connecting, please try again." Since Porcupine Wake Word and Cheetah Streaming Speech-to-Text run entirely on-device, the voice interface remains functional during API failures and only ChatGPT responses are unavailable.