
Building a real-time meeting summarization tool requires more than connecting to a speech API. Most cloud-based AI meeting assistants process audio on remote servers, introducing network latency before live transcription even begins. Additionally, streaming transcripts that cut off mid-sentence or merge multiple speakers can create unreliable inputs for AI summarization.

This Python tutorial shows how to build a real-time meeting summarization tool where speech recognition runs locally on your device, eliminating network delays and keeping AI summaries low-latency. The speech recognition engine uses endpoint detection to identify natural pauses in the conversation, then sends complete utterances to the LLM for summarization.

What You'll Build:

  • A real-time meeting assistant that:
    • Detects a custom wake word for hands-free activation
    • Provides live transcription with endpoint detection
    • Generates real-time AI summaries as the conversation progresses

What You'll Need:

The real-time meeting summarization tool integrates Porcupine Wake Word, Cheetah Streaming Speech-to-Text, and the OpenAI API. You'll need a Picovoice AccessKey, available through the Picovoice Console, and an OpenAI API key.

Train a Custom Wake Word for Voice Activation

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your wake phrase, such as "Hey Assistant" or "Start Meeting", and test it using the microphone button.
  3. Click "Train", select the target platform, and download the .ppn model file.

For tips on designing an effective wake word, review the choosing a wake word guide.

Set Up the Python Environment for Automated Meeting Summarization

Install all required Python SDKs and dependencies:
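The Picovoice engines and recorder ship as the pvporcupine, pvcheetah, and pvrecorder packages on PyPI, alongside the official openai client:

```
pip install pvporcupine pvcheetah pvrecorder openai
```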

Add Wake Word Detection for Hands-Free Activation

The following code captures audio from your default microphone and detects the custom wake word locally:
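Here's a minimal sketch of the detection loop. It assumes your AccessKey is exported as a PICOVOICE_ACCESS_KEY environment variable and that hey-assistant.ppn stands in for the keyword file you downloaded:

```python
import os

import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    keyword_paths=["hey-assistant.ppn"],  # placeholder: your trained .ppn file
)

# PvRecorder captures 16-bit, 16 kHz audio frames from the default microphone.
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        frame = recorder.read()
        # process() returns the detected keyword's index, or -1 for none.
        if porcupine.process(frame) >= 0:
            print("Wake word detected")
            break
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```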

Porcupine Wake Word processes each audio frame on-device, triggering real-time transcription for automated meeting notes once the keyword is detected.

Add Streaming Speech-to-Text with Endpoint Detection

Once the wake word is detected, stream audio frames through speech recognition with built-in endpoint detection:
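A sketch of the transcription loop, under the same AccessKey assumption; endpoint_duration_sec controls how long a pause must last to count as an endpoint, and one second is a tunable starting point:

```python
import os

import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    endpoint_duration_sec=1.0,  # pause length that counts as an endpoint
)

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

segment = ""
try:
    while True:
        # process() returns newly transcribed text plus an endpoint flag.
        partial, is_endpoint = cheetah.process(recorder.read())
        segment += partial
        if is_endpoint:
            # flush() finalizes any audio still buffered inside the engine.
            segment += cheetah.flush()
            print(f"Finalized segment: {segment}")
            segment = ""
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()
```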

When you pause naturally in your speech, Cheetah detects the silence as an endpoint, signaling that you've finished speaking. The flush() method then processes any remaining buffered audio, producing a finalized transcript segment ready for real-time summarization.

Send Transcript Segments to OpenAI for Summarization

After receiving a finalized transcript segment, update the running summary using an incremental approach:
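One way to sketch the incremental approach is to hold the summary in a string and ask the model to fold each new segment into it. The gpt-4o-mini model name and the prompt wording below are illustrative choices, not requirements:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def update_summary(summary: str, segment: str) -> str:
    """Fold one finalized transcript segment into the running summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative: any chat-capable model works
        messages=[
            {
                "role": "system",
                "content": "You maintain a concise running summary of a live meeting.",
            },
            {
                "role": "user",
                "content": f"Current summary:\n{summary or '(empty)'}\n\n"
                f"New transcript segment:\n{segment}\n\n"
                "Return the updated summary.",
            },
        ],
    )
    return response.choices[0].message.content
```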

This code sends only finalized text to the LLM, keeping summaries stable and coherent.

Complete Python Code for Real-Time Meeting Summarization

This solution combines Porcupine Wake Word and Cheetah Streaming Speech-to-Text with OpenAI for real-time meeting summarization:
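Below is one way to put the pieces together, under the same assumptions as the earlier sketches: both keys in environment variables, a placeholder .ppn path, and an illustrative model choice. Porcupine and Cheetah both consume 512-sample frames of 16 kHz audio, so a single recorder can feed whichever engine is active:

```python
import os

import pvcheetah
import pvporcupine
from openai import OpenAI
from pvrecorder import PvRecorder

ACCESS_KEY = os.environ["PICOVOICE_ACCESS_KEY"]  # from the Picovoice Console
KEYWORD_PATH = "hey-assistant.ppn"               # placeholder: your trained .ppn file

client = OpenAI()  # reads OPENAI_API_KEY from the environment

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keyword_paths=[KEYWORD_PATH])
cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)

# Both engines share the same frame length, so one recorder serves both.
recorder = PvRecorder(frame_length=porcupine.frame_length)


def update_summary(summary: str, segment: str) -> str:
    """Fold one finalized transcript segment into the running summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative: any chat-capable model works
        messages=[
            {
                "role": "system",
                "content": "You maintain a concise running summary of a live meeting.",
            },
            {
                "role": "user",
                "content": f"Current summary:\n{summary or '(empty)'}\n\n"
                f"New transcript segment:\n{segment}\n\n"
                "Return the updated summary.",
            },
        ],
    )
    return response.choices[0].message.content


summary = ""
segment = ""
listening = False

recorder.start()
print("Listening for wake word...")
try:
    while True:
        frame = recorder.read()
        if not listening:
            # process() returns the detected keyword's index, or -1 for none.
            if porcupine.process(frame) >= 0:
                listening = True
                print("Wake word detected. Transcribing...")
        else:
            partial, is_endpoint = cheetah.process(frame)
            segment += partial
            if is_endpoint:
                segment += cheetah.flush()  # finalize buffered audio
                if segment.strip():
                    summary = update_summary(summary, segment)
                    print(f"\n--- Updated summary ---\n{summary}\n")
                segment = ""
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()
    porcupine.delete()
```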

Run the Meeting Summarization Tool

To run the meeting assistant and start generating real-time AI summaries, update the model path to match your local file and have both API keys ready:
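For example, assuming the combined script above is saved as meeting_summarizer.py:

```
export PICOVOICE_ACCESS_KEY=...   # from the Picovoice Console
export OPENAI_API_KEY=...         # from your OpenAI account
python meeting_summarizer.py
```

Say the wake word, start talking, and an updated summary prints after each natural pause.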

You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.
