Building a real-time meeting summarization tool requires more than connecting to a speech API. Most cloud-based AI meeting assistants process audio on remote servers, introducing network latency before live transcription even begins. Additionally, streaming transcripts that cut off mid-sentence or merge multiple speakers can create unreliable inputs for AI summarization.
This Python tutorial shows how to build a real-time meeting summarization tool in which speech recognition runs locally on your device, eliminating network delays and keeping AI summaries low-latency. The speech recognition engine uses endpoint detection to identify natural pauses in conversation, then sends complete utterances to the LLM for summarization.
What You'll Build:
A real-time meeting assistant that:
- Detects a custom wake word for hands-free activation
- Provides live transcription with endpoint detection
- Generates real-time AI summaries as the conversation progresses
What You'll Need:
- Python 3.9+
- Microphone
- Picovoice `AccessKey` from the Picovoice Console
- OpenAI API key from the OpenAI Platform
The real-time meeting summarization tool integrates Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and the OpenAI API.
Train a Custom Wake Word for Voice Activation
- Sign up for a Picovoice Console account and navigate to the Porcupine page.
- Enter your wake phrase such as "Hey Assistant" or "Start Meeting" and test it using the microphone button.
- Click "Train", select the target platform, and download the `.ppn` model file.
For tips on designing an effective wake word, review the choosing a wake word guide.
Set Up the Python Environment for Automated Meeting Summarization
Install all required Python SDKs and dependencies:
- Porcupine Wake Word Python SDK: `pvporcupine`
- Cheetah Streaming Speech-to-Text Python SDK: `pvcheetah`
- Picovoice Python Recorder library: `pvrecorder`
- OpenAI Python library: `openai`
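All four can be installed in one command; these are the package names as published on PyPI:

```bash
pip3 install pvporcupine pvcheetah pvrecorder openai
```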
Add Wake Word Detection for Hands-Free Activation
The following code captures audio from your default microphone and detects the custom wake word locally:
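A minimal sketch, assuming the `${ACCESS_KEY}` and `${KEYWORD_FILE_PATH}` placeholders are replaced with your own values:

```python
import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key="${ACCESS_KEY}",  # AccessKey from the Picovoice Console
    keyword_paths=["${KEYWORD_FILE_PATH}"],  # the trained .ppn model
)

recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        frame = recorder.read()  # one frame of 16 kHz audio from the default microphone
        if porcupine.process(frame) >= 0:  # returns the keyword index, or -1 if not detected
            print("Wake word detected")
            break
finally:
    recorder.delete()
    porcupine.delete()
```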
Porcupine Wake Word processes each audio frame on-device, triggering real-time transcription for automated meeting notes once the keyword is detected.
Add Streaming Speech-to-Text with Endpoint Detection
Once the wake word is detected, stream audio frames through speech recognition with built-in endpoint detection:
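A sketch of that loop with `pvcheetah`; the one-second `endpoint_duration_sec` is an assumption you can tune:

```python
import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(
    access_key="${ACCESS_KEY}",
    endpoint_duration_sec=1.0,  # length of silence that marks an endpoint
    enable_automatic_punctuation=True,
)

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

transcript = ""
while True:
    partial, is_endpoint = cheetah.process(recorder.read())
    transcript += partial  # grows in real time as you speak
    if is_endpoint:
        transcript += cheetah.flush()  # finalize any remaining buffered audio
        print(transcript)
        break
```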
When you pause naturally, Cheetah detects the silence as an endpoint, signaling that you've finished speaking. The `flush()` method processes any remaining buffered audio, producing a finalized transcript segment for real-time summarization.
Send Transcript Segments to OpenAI for Summarization
After receiving a finalized transcript segment, update the running summary using an incremental approach:
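One way to implement that incremental update with the `openai` library; the model name and prompt wording here are assumptions, not fixed choices:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def update_summary(summary: str, segment: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-capable model works
        messages=[
            {
                "role": "system",
                "content": "You maintain a concise running summary of a meeting. "
                           "Fold the new transcript segment into the summary.",
            },
            {
                "role": "user",
                "content": f"Current summary:\n{summary}\n\nNew segment:\n{segment}",
            },
        ],
    )
    return response.choices[0].message.content
```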
This code sends only finalized text to the LLM, keeping summaries stable and coherent.
Complete Python Code for Real-Time Meeting Summarization
This solution combines Porcupine Wake Word and Cheetah Streaming Speech-to-Text with OpenAI for real-time meeting summarization:
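A sketch of the full loop, built from the pieces above; the placeholder paths, model name, and prompt wording are assumptions to adapt:

```python
import pvcheetah
import pvporcupine
from openai import OpenAI
from pvrecorder import PvRecorder

ACCESS_KEY = "${ACCESS_KEY}"  # AccessKey from the Picovoice Console
KEYWORD_PATH = "${KEYWORD_FILE_PATH}"  # path to the trained .ppn model

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keyword_paths=[KEYWORD_PATH])
cheetah = pvcheetah.create(
    access_key=ACCESS_KEY,
    endpoint_duration_sec=1.0,
    enable_automatic_punctuation=True,
)
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def update_summary(summary: str, segment: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in any chat model
        messages=[
            {
                "role": "system",
                "content": "You maintain a concise running summary of a meeting. "
                           "Fold the new transcript segment into the summary.",
            },
            {
                "role": "user",
                "content": f"Current summary:\n{summary}\n\nNew segment:\n{segment}",
            },
        ],
    )
    return response.choices[0].message.content


# Porcupine and Cheetah both consume 512-sample frames of 16 kHz audio,
# so a single recorder can feed whichever engine is active.
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()
print("Listening for wake word...")

summary = ""
transcript = ""
transcribing = False
try:
    while True:
        frame = recorder.read()
        if not transcribing:
            if porcupine.process(frame) >= 0:
                transcribing = True
                print("Wake word detected. Transcribing...")
        else:
            partial, is_endpoint = cheetah.process(frame)
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()
                print(f"Transcript: {transcript}")
                summary = update_summary(summary, transcript)
                print(f"Summary: {summary}")
                transcript = ""
except KeyboardInterrupt:
    pass
finally:
    recorder.delete()
    porcupine.delete()
    cheetah.delete()
```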
Run the Meeting Summarization Tool
To run the meeting assistant and start generating real-time AI summaries, update the model path to match your local file and have both API keys ready:
- Picovoice `AccessKey` (copy from the Picovoice Console)
- OpenAI API key
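With both keys in place, a typical run looks like this (the script name `meeting_assistant.py` is a placeholder):

```bash
export OPENAI_API_KEY="your-openai-api-key"
python3 meeting_assistant.py
```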
You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.
Start Building