Playing real-time PCM audio in Python is a core requirement for text-to-speech, voice assistants, and other voice AI applications. General-purpose audio libraries such as PyAudio or sounddevice can handle audio playback, but they are not designed specifically for real-time speech and voice-driven workloads. PvSpeaker is a cross-platform Python audio playback library that streams raw PCM audio directly to the system audio output on Windows, macOS, Linux, and Raspberry Pi, with a focus on real-time speech playback. This tutorial shows how to set up PvSpeaker and stream PCM audio in Python for voice AI applications.

Quick Answer: How to Play Audio in Python

PvSpeaker is designed specifically to stream audio from speech and voice applications. Install the pvspeaker package from PyPI using pip and initialize PvSpeaker with your audio format (sample rate and bit depth). Then call start() to start the audio output device, write() to stream PCM data to the speaker, and flush() to wait until all audio finishes playing. Finally, call stop() and delete() to clean up resources.

What you'll learn in this guide:

  • Install & set up PvSpeaker for audio playback (runs on: Windows, macOS, Linux, Raspberry Pi)
  • Select specific audio output devices for multi-speaker systems
  • Stream PCM audio data to speakers in real-time
  • Play audio files and handle audio buffers

To learn how to record audio in Python for speech recognition, refer to: How to Record Audio using Python.

Python Audio Playback: Requirements and Platform Support

System Requirements for Python Audio Output

  • Python 3.9+
  • pip package manager

Python Audio Playback Platform Compatibility

  • Linux (x86_64)
  • macOS (x86_64, arm64)
  • Windows (x86_64, arm64)
  • Raspberry Pi (3, 4, 5)

Play Audio in Python: Step-by-Step Guide

Step 1: Install PvSpeaker Python Package for Audio Playback

Install the PvSpeaker Python package from PyPI by running pip3 install pvspeaker.

This lightweight SDK provides cross-platform audio playback without requiring platform-specific dependencies or complex audio library configuration.

Step 2: Initialize PvSpeaker with Audio Format Specifications

Import PvSpeaker and create an instance configured with your audio format:
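
A minimal sketch of this step follows. It assumes the constructor accepts sample_rate and bits_per_sample keyword arguments; those names are not spelled out in this article, so confirm them against the PvSpeaker Python API documentation. The create_speaker helper and its default values are hypothetical.

```python
SUPPORTED_BIT_DEPTHS = (8, 16, 24, 32)


def create_speaker(sample_rate=22050, bits_per_sample=16):
    """Build a PvSpeaker for mono PCM in the given format (hypothetical helper)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if bits_per_sample not in SUPPORTED_BIT_DEPTHS:
        raise ValueError("bits_per_sample must be 8, 16, 24, or 32")

    # Imported here so this snippet can be loaded without pvspeaker installed.
    from pvspeaker import PvSpeaker

    # Keyword names are assumed; check the API documentation.
    return PvSpeaker(sample_rate=sample_rate, bits_per_sample=bits_per_sample)
```

Calling create_speaker(16000, 16) would then return an instance ready for the next step.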

The sample rate and bit depth must match your audio source format.

Step 3: Start the Audio Output Device

Call start() to initialize the audio output device and prepare it for streaming PCM data.

Starting the speaker allocates audio hardware resources and creates an internal buffer for smooth playback. Once started, you can begin writing PCM audio frames.

Step 4: Stream PCM Audio Data to the Speaker

Use the write() method to send PCM audio frames to the speaker for playback:
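
The loop below is a minimal sketch of a full write. It assumes write() accepts a list of integer samples and returns how many of them it accepted, as this step describes. write_all is a hypothetical helper name, and the short sleep on a full buffer is an added courtesy rather than something the library is known to require.

```python
import time


def write_all(speaker, pcm, retry_delay_secs=0.01):
    """Write every sample in `pcm`, retrying after partial writes.

    `speaker` is any object whose write(samples) returns the number of
    samples it accepted, which is the behaviour this guide describes
    for PvSpeaker.write().
    """
    total_written = 0
    while total_written < len(pcm):
        written = speaker.write(pcm[total_written:])
        if written == 0:
            # Buffer is full: give playback a moment to drain it.
            time.sleep(retry_delay_secs)
        total_written += written
    return total_written
```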

The write() method fills PvSpeaker's internal circular buffer with as much data as it can hold and returns the number of samples successfully written. To ensure all audio is played, we need to track the number of samples written and retry sending any remaining data until the full frame is streamed.

Configure the Capacity of Internally Buffered Audio

By default, PvSpeaker buffers up to 20 seconds of audio. To adjust the size of the internal circular buffer, use buffer_size_secs when initializing:
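
For a sense of scale, the sketch below estimates how many samples and bytes a given buffer_size_secs corresponds to, then passes the value to the constructor. The arithmetic is a back-of-the-envelope estimate for mono PCM, not a statement about PvSpeaker's internal allocation. The sample_rate and bits_per_sample keyword names and both helper names are assumptions.

```python
def buffer_capacity(sample_rate, bits_per_sample, buffer_size_secs):
    """Estimate (samples, bytes) of mono PCM that fit in the buffer."""
    samples = sample_rate * buffer_size_secs
    return samples, samples * bits_per_sample // 8


def create_speaker_with_buffer(sample_rate, bits_per_sample, buffer_size_secs):
    """Build a PvSpeaker with a custom buffer length (hypothetical helper)."""
    # Imported here so this snippet can be loaded without pvspeaker installed.
    from pvspeaker import PvSpeaker

    return PvSpeaker(
        sample_rate=sample_rate,          # assumed keyword name
        bits_per_sample=bits_per_sample,  # assumed keyword name
        buffer_size_secs=buffer_size_secs,
    )


# Default 20-second buffer at 22.05 kHz, 16-bit mono.
print(buffer_capacity(22050, 16, 20))  # (441000, 882000)
```

A shorter buffer, for example create_speaker_with_buffer(22050, 16, 5), would cap how much audio can be queued ahead of playback, so write() should be expected to report partial writes sooner.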

Step 5: Flush Buffered Audio to Complete Playback

When all audio data has been written, call flush() to wait for all buffered PCM data to finish playing.

The flush() method blocks until all previously buffered audio has been played through the speakers. This ensures audio isn't cut off prematurely when your program exits.

Step 6: Stop Playback and Release Audio Resources

When finished with audio playback, stop the speaker and free allocated resources:
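
One way to make sure these calls always happen, and in order, is to wrap the speaker's lifecycle in a context manager. This is a sketch of a convenience wrapper, not part of the PvSpeaker API: playback_session is a hypothetical name, and it relies only on the start(), flush(), stop(), and delete() methods covered in this guide.

```python
from contextlib import contextmanager


@contextmanager
def playback_session(speaker):
    """Start `speaker`, hand it to the caller, then always release it."""
    try:
        speaker.start()
        yield speaker
        # Reached only if the with-block finished without an exception:
        # wait for buffered audio, then halt the device.
        speaker.flush()
        speaker.stop()
    finally:
        # Runs on success and on error, so the device is always released.
        speaker.delete()
```

With an initialized instance, a with playback_session(speaker) block would start the device, let the body write PCM data (subject to the partial-write handling from Step 4), wait for playback to finish, and release the device on exit.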

Call stop() to halt audio output if playback is still in progress. Because flush() blocks its calling thread, stopping audio before it finishes playing requires calling stop() from a different thread than the one waiting in flush().
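
As an illustration of that threading requirement, the sketch below arms a threading.Timer that calls stop() from a background thread while the calling thread waits in flush(). It assumes flush() returns once playback has been stopped, which is worth verifying against the API documentation. flush_with_timeout is a hypothetical helper, not a PvSpeaker function.

```python
import threading


def flush_with_timeout(speaker, timeout_secs):
    """Wait for playback to finish, but cut it off after `timeout_secs`.

    Returns True if stop() was triggered by the timer, False if flush()
    returned on its own first.
    """
    timed_out = threading.Event()

    def _stop_playback():
        timed_out.set()
        speaker.stop()  # runs on the timer thread, not the flushing thread

    timer = threading.Timer(timeout_secs, _stop_playback)
    timer.start()
    try:
        speaker.flush()  # blocks the calling thread
    finally:
        timer.cancel()  # no effect if the timer has already fired
    return timed_out.is_set()
```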

Always call delete() to release audio hardware resources when you're completely done with the speaker. If audio has already finished playing after flush(), you can call delete() directly without stop().

Selecting a Specific Audio Output Device

For systems with multiple audio output devices (like external speakers, headphones, or HDMI audio), you can select a specific device by index.

First, get a list of available audio output devices:
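
A sketch of the listing step, assuming PvSpeaker.get_available_devices() returns a list of device names whose positions are the indices to pass as device_index (worth confirming in the API documentation). format_devices and list_output_devices are hypothetical helper names.

```python
def format_devices(device_names):
    """Return one '[index] name' line per device."""
    return [f"[{index}] {name}" for index, name in enumerate(device_names)]


def list_output_devices():
    """Print the available audio output devices with their indices."""
    # Imported here so this snippet can be loaded without pvspeaker installed.
    from pvspeaker import PvSpeaker

    for line in format_devices(PvSpeaker.get_available_devices()):
        print(line)
```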

Then specify the desired device when initializing PvSpeaker:
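
The sketch below looks a device up by part of its name and passes the resulting position as device_index. It assumes get_available_devices() returns device names in index order, and that the constructor takes sample_rate and bits_per_sample keyword arguments. find_device_index and create_speaker_for_device are hypothetical helpers.

```python
def find_device_index(device_names, name_fragment):
    """Return the index of the first device whose name contains the fragment."""
    wanted = name_fragment.lower()
    for index, name in enumerate(device_names):
        if wanted in name.lower():
            return index
    raise LookupError(f"no audio output device matching {name_fragment!r}")


def create_speaker_for_device(name_fragment, sample_rate=22050, bits_per_sample=16):
    """Build a PvSpeaker bound to the first matching output device."""
    # Imported here so this snippet can be loaded without pvspeaker installed.
    from pvspeaker import PvSpeaker

    devices = PvSpeaker.get_available_devices()
    return PvSpeaker(
        sample_rate=sample_rate,          # assumed keyword name
        bits_per_sample=bits_per_sample,  # assumed keyword name
        device_index=find_device_index(devices, name_fragment),
    )
```

Matching on a name fragment rather than hard-coding an index is a design choice for this sketch: device order can differ between machines, while names such as "HDMI" or "USB" tend to be more stable.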

If you don't specify a device_index, PvSpeaker uses the system's default audio output device.

Complete Python Audio Playback Example

This example shows how to play a WAV file using PvSpeaker by converting it into raw PCM samples and streaming them incrementally to the audio output device.

Python's built-in wave module is used to open and validate the WAV file, read its format (sample rate, bit depth, and channel count), and extract the raw PCM samples.

To mimic real-time audio generation, the PCM data is written to the speaker in one-second chunks rather than all at once. Each chunk is fully written using a loop that accounts for partial writes, which can occur depending on the underlying audio buffer state.

Before running the example code, replace ${AUDIO_FILE_PATH} with the path to the WAV file you want to play.
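
Here is a sketch of what such a script could look like. It uses only the standard wave and struct modules plus the PvSpeaker calls covered in this guide. The sample_rate and bits_per_sample keyword names are assumptions, and read_wav_pcm and play_wav are hypothetical helper names.

```python
import struct
import wave

AUDIO_FILE_PATH = "${AUDIO_FILE_PATH}"  # replace with the path to your WAV file

# struct format characters for the sample widths handled directly;
# 24-bit samples have no struct code and are decoded by hand below.
_STRUCT_CODES = {1: "B", 2: "h", 4: "i"}


def read_wav_pcm(path):
    """Return (samples, sample_rate, bits_per_sample) for a mono PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1:
            raise ValueError("PvSpeaker plays single-channel audio only")
        sample_width = wav_file.getsampwidth()  # bytes per sample
        if sample_width not in (1, 2, 3, 4):
            raise ValueError("expected 8, 16, 24, or 32 bits per sample")
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())

    count = len(raw) // sample_width
    if sample_width == 3:
        samples = [
            int.from_bytes(raw[i * 3:i * 3 + 3], "little", signed=True)
            for i in range(count)
        ]
    else:
        # WAV data is little-endian; 8-bit samples are unsigned.
        fmt = "<%d%s" % (count, _STRUCT_CODES[sample_width])
        samples = list(struct.unpack(fmt, raw[:count * sample_width]))
    return samples, sample_rate, sample_width * 8


def play_wav(path):
    """Stream a WAV file to the default output device in one-second chunks."""
    # Imported here so this snippet can be loaded without pvspeaker installed.
    from pvspeaker import PvSpeaker

    pcm, sample_rate, bits_per_sample = read_wav_pcm(path)
    speaker = PvSpeaker(sample_rate=sample_rate, bits_per_sample=bits_per_sample)
    try:
        speaker.start()
        for offset in range(0, len(pcm), sample_rate):
            chunk = pcm[offset:offset + sample_rate]  # one second of audio
            written = 0
            while written < len(chunk):
                # write() may accept only part of the chunk; resend the rest.
                written += speaker.write(chunk[written:])
        speaker.flush()  # block until everything queued has been played
        speaker.stop()
    finally:
        speaker.delete()  # always release the audio device
```

Once the path is set, calling play_wav(AUDIO_FILE_PATH) would play the file on the default output device.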

PvSpeaker supports single-channel audio only, with 8, 16, 24, or 32 bits per sample.

Overall, this example covers the full audio playback workflow: loading and validating an audio file, converting it to PCM samples, streaming audio in controlled chunks, waiting for playback to finish, and shutting down the audio device.

For a more complete demo application, refer to the PvSpeaker Python demo on GitHub. For full implementation details, refer to the PvSpeaker Python API documentation.

Bonus: to see how PvSpeaker can be integrated in a real-time speech audio generation & playback pipeline, check out the Orca Streaming Text-to-Speech Python Demo on GitHub.


Format Specifications for Audio Playback

PvSpeaker supports the following PCM audio formats:

  • Sample Rates: Configurable (common: 16 kHz, 22.05 kHz, 44.1 kHz, 48 kHz)
  • Bit Depth: 8-bit unsigned integer, or 16/24/32-bit signed integer
  • Channels: Mono (single channel)
  • Format: Raw PCM audio data

Ensure your audio source is mono and matches the sample rate and bit depth specified when initializing PvSpeaker to avoid playback issues.


Next Steps: Build Voice AI Applications with Audio Playback

Now that you can play audio in Python, you can build complete voice AI applications.

PvSpeaker integrates seamlessly with Picovoice's voice AI SDKs, enabling fully offline speech synthesis and playback without cloud dependencies.

Frequently Asked Questions

How do I play audio in Python?
Use PvSpeaker to play audio in Python. Install the PvSpeaker Python package with 'pip3 install pvspeaker', initialize a PvSpeaker instance with your audio format (sample rate and bit depth), stream PCM audio data, wait for all audio to finish playing, then stop and delete the PvSpeaker instance to clean up. The PvSpeaker Python SDK runs on Windows, macOS, Linux, and Raspberry Pi.
How do I play text-to-speech audio in Python?
Generate PCM audio from a text-to-speech engine (such as Orca Streaming Text-to-Speech), then stream it to PvSpeaker using 'write()'. Initialize PvSpeaker with the same sample rate your TTS engine outputs, and write audio chunks as they're generated. This works with any TTS engine that produces PCM audio.
Can I select which speaker or audio device to use in Python?
Yes. Use 'PvSpeaker.get_available_devices()' to list all audio output devices, then pass the desired device index to PvSpeaker's 'device_index' parameter during initialization. This is useful for systems with multiple audio outputs like external speakers, headphones, or HDMI audio.
Is PvSpeaker suitable for real-time audio playback in Python?
Yes. PvSpeaker is designed for real-time audio streaming. It uses an internal circular buffer that allows continuous writing while audio plays, making it ideal for real-time applications, such as streaming text-to-speech, live audio processing, and interactive voice applications. The 'write()' method is non-blocking for smooth real-time performance.