
Building a real-time meeting summarization tool requires more than connecting to a speech API. Most cloud-based AI meeting assistants process audio on remote servers, introducing network latency before live transcription even begins. Additionally, streaming transcripts that cut off mid-sentence or merge multiple speakers can create unreliable inputs for AI summarization.

This Python tutorial shows how to build a real-time meeting summarization tool where speech recognition runs locally on your device, eliminating network delays and keeping AI summaries low-latency. The speech recognition engine uses endpoint detection to identify natural pauses in the conversation, then sends complete utterances to the LLM for summarization.

What You'll Build:

  • A real-time meeting assistant that:
    • Detects a custom wake word for hands-free activation
    • Provides live transcription with endpoint detection
    • Generates real-time AI summaries as the conversation progresses

What You'll Need:

The real-time meeting summarization tool integrates Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and the OpenAI API. You'll need a Picovoice AccessKey, available through the Picovoice Console, and an OpenAI API key.

Train a Custom Wake Word for Voice Activation

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your wake phrase, such as "Hey Assistant" or "Start Meeting", and test it using the microphone button.
  3. Click "Train", select the target platform, and download the .ppn model file.

For tips on designing an effective wake word, review the choosing a wake word guide.

Set Up the Python Environment for Automated Meeting Summarization

Install all required Python SDKs and dependencies:
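The Picovoice engines and recorder ship as the pvporcupine, pvcheetah, and pvrecorder packages on PyPI, alongside the official openai client:

```
pip install pvporcupine pvcheetah pvrecorder openai
```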

Add Wake Word Detection for Hands-Free Activation

The following code captures audio from your default microphone and detects the custom wake word locally:
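Here's a minimal sketch of the detection loop. It assumes your AccessKey is exported as a PICOVOICE_ACCESS_KEY environment variable and that hey-assistant.ppn stands in for the keyword file you downloaded:

```python
import os

import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    keyword_paths=["hey-assistant.ppn"],  # placeholder: your trained .ppn file
)

# PvRecorder captures 16-bit, 16 kHz audio frames from the default microphone.
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        frame = recorder.read()
        # process() returns the detected keyword's index, or -1 for none.
        if porcupine.process(frame) >= 0:
            print("Wake word detected")
            break
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```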

Porcupine Wake Word processes each audio frame on-device, triggering real-time transcription for automated meeting notes once the keyword is detected.

Add Streaming Speech-to-Text with Endpoint Detection

Once the wake word is detected, stream audio frames through speech recognition with built-in endpoint detection:
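A sketch of the transcription loop, under the same AccessKey assumption; endpoint_duration_sec controls how long a pause must last to count as an endpoint, and one second is a tunable starting point:

```python
import os

import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    endpoint_duration_sec=1.0,  # pause length that counts as an endpoint
)

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

segment = ""
try:
    while True:
        # process() returns newly transcribed text plus an endpoint flag.
        partial, is_endpoint = cheetah.process(recorder.read())
        segment += partial
        if is_endpoint:
            # flush() finalizes any audio still buffered inside the engine.
            segment += cheetah.flush()
            print(f"Finalized segment: {segment}")
            segment = ""
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()
```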

When you pause naturally in your speech, Cheetah detects the silence as an endpoint, signaling that you've finished speaking. The flush() method then processes any remaining buffered audio, producing a finalized transcript segment ready for real-time summarization.

Send Transcript Segments to OpenAI for Summarization

After receiving a finalized transcript segment, update the running summary using an incremental approach:
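One way to sketch the incremental approach is to hold the summary in a string and ask the model to fold each new segment into it. The gpt-4o-mini model name and the prompt wording below are illustrative choices, not requirements:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def update_summary(summary: str, segment: str) -> str:
    """Fold one finalized transcript segment into the running summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative: any chat-capable model works
        messages=[
            {
                "role": "system",
                "content": "You maintain a concise running summary of a live meeting.",
            },
            {
                "role": "user",
                "content": f"Current summary:\n{summary or '(empty)'}\n\n"
                f"New transcript segment:\n{segment}\n\n"
                "Return the updated summary.",
            },
        ],
    )
    return response.choices[0].message.content
```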

This code sends only finalized text to the LLM, keeping summaries stable and coherent.

Complete Python Code for Real-Time Meeting Summarization

This solution combines Porcupine Wake Word and Cheetah Streaming Speech-to-Text with OpenAI for real-time meeting summarization:
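Below is one way to put the pieces together, under the same assumptions as the earlier sketches: both keys in environment variables, a placeholder .ppn path, and an illustrative model choice. Porcupine and Cheetah both consume 512-sample frames of 16 kHz audio, so a single recorder can feed whichever engine is active:

```python
import os

import pvcheetah
import pvporcupine
from openai import OpenAI
from pvrecorder import PvRecorder

ACCESS_KEY = os.environ["PICOVOICE_ACCESS_KEY"]  # from the Picovoice Console
KEYWORD_PATH = "hey-assistant.ppn"               # placeholder: your trained .ppn file

client = OpenAI()  # reads OPENAI_API_KEY from the environment

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keyword_paths=[KEYWORD_PATH])
cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)

# Both engines share the same frame length, so one recorder serves both.
recorder = PvRecorder(frame_length=porcupine.frame_length)


def update_summary(summary: str, segment: str) -> str:
    """Fold one finalized transcript segment into the running summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative: any chat-capable model works
        messages=[
            {
                "role": "system",
                "content": "You maintain a concise running summary of a live meeting.",
            },
            {
                "role": "user",
                "content": f"Current summary:\n{summary or '(empty)'}\n\n"
                f"New transcript segment:\n{segment}\n\n"
                "Return the updated summary.",
            },
        ],
    )
    return response.choices[0].message.content


summary = ""
segment = ""
listening = False

recorder.start()
print("Listening for wake word...")
try:
    while True:
        frame = recorder.read()
        if not listening:
            # process() returns the detected keyword's index, or -1 for none.
            if porcupine.process(frame) >= 0:
                listening = True
                print("Wake word detected. Transcribing...")
        else:
            partial, is_endpoint = cheetah.process(frame)
            segment += partial
            if is_endpoint:
                segment += cheetah.flush()  # finalize buffered audio
                if segment.strip():
                    summary = update_summary(summary, segment)
                    print(f"\n--- Updated summary ---\n{summary}\n")
                segment = ""
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()
    porcupine.delete()
```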

Run the Meeting Summarization Tool

To run the meeting assistant and start generating real-time AI summaries, update the model path to match your local file and have both API keys ready:
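For example, assuming the combined script above is saved as meeting_summarizer.py:

```
export PICOVOICE_ACCESS_KEY=...   # from the Picovoice Console
export OPENAI_API_KEY=...         # from your OpenAI account
python meeting_summarizer.py
```

Say the wake word, start talking, and an updated summary prints after each natural pause.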

You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.
