TLDR: Learn how to build a fully hands-free Python voice note-taking app. This tutorial covers setting up voice commands to start and stop recording, processing audio with offline speech-to-text, and generating structured summaries using AI.
Voice note-taking applications help users transcribe interviews, capture lecture summaries, and log voice memos. However, manual interaction during these sessions can disrupt the user's focus. This tutorial demonstrates how to build a voice-activated note-taking app that uses distinct start and stop commands for completely hands-free operation.
The implementation uses Porcupine Wake Word for voice activation and Leopard Speech-to-Text for local transcription. Porcupine Wake Word manages the control flow by detecting two custom phrases: a wake word to begin recording (e.g., "Hey Notes") and a stop phrase to finish (e.g., "Done Notes"). This architecture ensures precise capture without manual interaction or premature cutoffs while the user is speaking. Once recording stops, the audio is transcribed locally with Leopard Speech-to-Text, and the text is sent to OpenAI for formatting. This keeps heavy speech processing on-device while leveraging the cloud only for final summarization. By running speech recognition on-device, the AI voice note-taking app eliminates network latency, resulting in more consistent performance.
What You'll Build:
- A voice note application that:
- Activates with a custom wake word and stops with a specific phrase
- Captures complete voice notes
- Transcribes recordings on-device
- Generates structured summaries from transcripts
- Operates hands-free
What You'll Need:
- Python 3.8+
- Microphone
- Picovoice
AccessKeyfrom the Picovoice Console - OpenAI API key from the OpenAI Platform
Looking for real-time AI summarization? Check out our guide for Meeting Summarization with real-time transcription.
Train a Custom Wake Word and Stop Phrase
- Sign up for a Picovoice Console account and navigate to the Porcupine page.
- Train your wake word (e.g., "Hey Notes" or "Start Recording"):
- Enter the phrase and test it using the microphone button
- Click "Train", select the target platform, and download the
.ppnmodel file asstart-recording.ppn
- Train your stop phrase (e.g., "Done Notes" or "Stop Recording"):
- Enter the phrase and test it using the microphone button
- Click "Train", select the target platform, and download the model file as
stop-recording.ppn
Select phrases that are phonetically distinct to minimize false positives. See the choosing a wake word guide for best practices.
Set Up the Python Environment
Install the required Python SDKs:
- Porcupine Wake Word Python SDK:
pvporcupine - Leopard Speech-to-Text Python SDK:
pvleopard - Picovoice Python Recorder library:
pvrecorder - OpenAI Python library:
openai
Implement Voice-Activated Controls
The following code captures audio from the default microphone and listens for specific start and stop commands:
This logic provides explicit control over the recording session, initiating and terminating only by user voice command.
Transcribe Audio
Leopard Speech-to-Text performs batch transcription to convert the audio into text:
Batch transcription processes the entire file in a single pass. This method generally yields higher accuracy than real-time streaming as the engine utilizes the full context of the sentence to resolve ambiguities.
Leopard Speech-to-Text can also transcribe directly from an audio file.
Generate Structured AI Powered Notes
Finally, the transcript is sent to GPT-4 to organize the raw text into a structured format:
By processing the full context only after the user explicitly stops recording, the LLM receives the complete input required for accurate summarization.
Full Python Code for AI Powered Voice Note-Taking App
Here is the complete source code, integrating Porcupine Wake Word for voice commands, Leopard Speech-to-Text for transcription, and OpenAI for AI powered summarization:
Run the Voice Note-Taking App
To run the AI note taking application, update the model paths to match your local files and ensure both API keys are available:
- Picovoice
AccessKey(from Picovoice Console) - OpenAI API key
You can start building your own commercial or non-commercial projects leveraging Picovoice's self-service Console.
Start Building






