
TL;DR: Build voice AI agents for patient triage, appointment scheduling, and medical billing using Python with on-device speech processing. This tutorial implements a HIPAA-compliant medical voice assistant with wake word detection, real-time speech-to-text optimized for clinical vocabulary, and voice synthesis.

Cloud-based voice AI creates deployment challenges for healthcare: 1-2 seconds of latency disrupts conversation flow, and transmitting patient voice data requires extensive HIPAA compliance infrastructure. The 2024 Perry Johnson & Associates breach exposed 4 million patient records despite compliance measures, demonstrating the risks of centralized cloud storage.

On-device speech processing provides a secure alternative. By handling audio entirely on the edge and transmitting only anonymized text for reasoning, this approach minimizes network latency while ensuring Protected Health Information (PHI) remains secured within the local infrastructure, mitigating cloud compliance risks.

This tutorial demonstrates this hybrid edge architecture by building a fully functional triage agent. Wake word detection, speech recognition, and voice synthesis run entirely on-device, while GPT-4 is used strictly for medical reasoning on sanitized text.

What You'll Build:

  • Wake word activation ("Hey Doctor") for hands-free operation
  • Real-time speech transcription optimized for medical vocabulary
  • GPT-4 reasoning layer
  • Natural voice synthesis for patient responses
  • Complete Python implementation deployable on edge devices

What You'll Need:

System Architecture

The medical triage agent operates on a strict privacy-first pipeline, ensuring patient audio never leaves the device.

  1. Wake Word Detection: The system remains in a passive listening state using Porcupine Wake Word, waiting for a specific medical phrase (e.g., "Hey Doctor") to trigger activation without sending audio to the cloud.
  2. Local Transcription: Once triggered, Cheetah Streaming Speech-to-Text transcribes the patient's speech in real-time. This engine is optimized with a custom vocabulary to accurately capture clinical terminology, achieving higher accuracy than generic models.
  3. Sanitization & Reasoning: A local function strips personally identifiable information (PII) such as names and dates from the transcript before sending only the anonymized text to the OpenAI API for medical assessment.
  4. Voice Response: The text-based triage advice is converted back into natural speech using Orca Streaming Text-to-Speech and played to the patient.

Block diagram: the on-device voice loop.

For deployments requiring zero cloud transmission of patient data, replace OpenAI with picoLLM to run the reasoning layer entirely on-device.

Create Custom Wake Word for Medical Assistant

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your wake phrase such as "Hey Doctor" and test it using the microphone button.
  3. Click "Train", select the target platform, and download the .ppn model file.

For tips on designing an effective wake word, review the choosing a wake word guide.

Optimize Speech Recognition for Clinical Vocabulary

  1. Sign up for a Picovoice Console account and navigate to the Leopard & Cheetah page.
  2. Click "New Model", give the model a name, choose the target language, and click "Create Model".
  3. Import the medical-dictionary.yml to add custom vocabulary to the model.

medical-dictionary.yml is a curated medical vocabulary for the real-time transcription model, built with the help of the Common Medical Words dataset. Learn how to generate your own in the Custom Speech-to-Text Model guide.

  4. Test the model using the microphone button.
  5. Download the model.

To further improve speech-to-text accuracy, you can add boost words to your .yml file. Boost words increase the likelihood that important, frequently used clinical phrases are transcribed correctly.

Install Python Dependencies

Install all required Python SDKs and dependencies with a single terminal command:
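
The package names below assume the current PyPI releases of the Picovoice SDKs and the OpenAI client:

```shell
pip install pvporcupine pvcheetah pvorca pvrecorder openai
```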

Implement Wake Word Detection

Implement wake word detection to activate the agent hands-free:
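
A minimal sketch of this step, assuming the `pvporcupine` and `pvrecorder` SDKs and the "Hey Doctor" `.ppn` file trained earlier (the function name and structure are illustrative):

```python
def listen_for_wake_word(access_key: str, keyword_path: str) -> None:
    """Block until the wake word is detected, then return."""
    # SDK imports are deferred so the sketch loads without the SDKs installed
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[keyword_path],
    )
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()
    try:
        while True:
            frame = recorder.read()            # one 512-sample PCM frame
            if porcupine.process(frame) >= 0:  # index of the detected keyword, -1 if none
                print("Wake word detected - listening for symptoms...")
                return
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()
```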

Add Real-Time Medical Speech Recognition

Once the wake word has been detected, capture audio frames and transcribe them in real-time with Cheetah Streaming Speech-to-Text:
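
A sketch of the transcription loop, assuming the `pvcheetah` SDK and the custom medical model downloaded above (the endpoint duration is a tunable assumption):

```python
def transcribe_symptoms(access_key: str, model_path: str) -> str:
    """Stream microphone audio through Cheetah until the patient stops speaking."""
    # SDK imports are deferred so the sketch loads without the SDKs installed
    import pvcheetah
    from pvrecorder import PvRecorder

    cheetah = pvcheetah.create(
        access_key=access_key,
        model_path=model_path,
        endpoint_duration_sec=1.5,  # silence that ends the utterance
    )
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    transcript = ""
    try:
        while True:
            partial, is_endpoint = cheetah.process(recorder.read())
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()  # final buffered text
                return transcript
    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()
```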

Implement AI-Powered Symptom Assessment

The triage engine takes the transcribed text from Cheetah, strips personally identifiable information to protect patient privacy, and sends only the anonymized query to the LLM for reasoning.
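
A sketch of both steps. The regex patterns are illustrative examples of PII stripping, not a complete de-identification solution, and the system prompt is an assumption:

```python
import re

# Illustrative PII patterns - production systems should use a vetted
# de-identification library rather than a handful of regexes.
PII_PATTERNS = [
    (re.compile(r"\bmy name is\s+\w+(\s+\w+)?", re.IGNORECASE), "my name is [REDACTED]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE]"),
]


def sanitize(text: str) -> str:
    """Strip PII from the transcript before it leaves the device."""
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def assess_symptoms(transcript: str, openai_key: str) -> str:
    """Send only the anonymized transcript to GPT-4 for triage guidance."""
    from openai import OpenAI  # deferred so sanitize() is usable standalone

    client = OpenAI(api_key=openai_key)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a cautious medical triage assistant. Advise "
                        "whether symptoms need emergency care, an appointment, "
                        "or self-care. Never diagnose."},
            {"role": "user", "content": sanitize(transcript)},
        ],
    )
    return response.choices[0].message.content
```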

The modular architecture allows swapping GPT-4 for different reasoning models based on your deployment requirements.

Add Text-to-Speech Voice Responses

Convert triage assessments into natural speech:
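
A sketch using the `pvorca` SDK, written to a WAV file here for portability (this assumes `synthesize` returns 16-bit PCM samples; Orca's streaming API can be used instead for lower latency):

```python
import struct
import wave


def speak(text: str, access_key: str, out_path: str = "response.wav") -> None:
    """Synthesize triage advice to a WAV file with Orca, entirely on-device."""
    import pvorca  # deferred so the sketch loads without the SDK installed

    orca = pvorca.create(access_key=access_key)
    pcm, _alignments = orca.synthesize(text)  # 16-bit PCM samples
    with wave.open(out_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit
        f.setframerate(orca.sample_rate)
        f.writeframes(struct.pack(f"{len(pcm)}h", *pcm))
    orca.delete()
```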

Orca Streaming Text-to-Speech provides high-quality voice synthesis that runs entirely on-device, keeping patient interactions private.

Complete Medical Triage Voice Agent Code

Here's the full implementation combining all components:
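
A condensed end-to-end sketch under the same assumptions as the preceding snippets; keys and model paths are read from environment variables, and helper names are illustrative:

```python
import os
import re
import struct
import wave


def sanitize(text: str) -> str:
    """Strip PII (names, dates) before any text leaves the device."""
    text = re.sub(r"\bmy name is\s+\w+(\s+\w+)?", "my name is [REDACTED]",
                  text, flags=re.IGNORECASE)
    return re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", text)


def main() -> None:
    # SDK imports are deferred so the file can be inspected without them installed
    import pvcheetah
    import pvorca
    import pvporcupine
    from openai import OpenAI
    from pvrecorder import PvRecorder

    access_key = os.environ["ACCESS_KEY"]
    porcupine = pvporcupine.create(
        access_key=access_key, keyword_paths=[os.environ["WAKE_WORD_PATH"]])
    cheetah = pvcheetah.create(
        access_key=access_key, model_path=os.environ["STT_MODEL_PATH"],
        endpoint_duration_sec=1.5)
    orca = pvorca.create(access_key=access_key)
    client = OpenAI(api_key=os.environ["OPENAI_KEY"])

    # Porcupine and Cheetah share the same 512-sample, 16 kHz frame format
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()
    print("Listening for 'Hey Doctor'...")
    try:
        while True:
            if porcupine.process(recorder.read()) < 0:
                continue  # 1. still waiting for the wake word
            transcript = ""
            while True:  # 2. transcribe until the patient stops speaking
                partial, is_endpoint = cheetah.process(recorder.read())
                transcript += partial
                if is_endpoint:
                    transcript += cheetah.flush()
                    break
            reply = client.chat.completions.create(  # 3. reason on sanitized text
                model="gpt-4",
                messages=[
                    {"role": "system",
                     "content": "You are a cautious medical triage assistant. "
                                "Never diagnose; advise on urgency of care."},
                    {"role": "user", "content": sanitize(transcript)},
                ],
            ).choices[0].message.content
            pcm, _ = orca.synthesize(reply)  # 4. on-device voice response
            with wave.open("response.wav", "wb") as f:
                f.setnchannels(1)
                f.setsampwidth(2)
                f.setframerate(orca.sample_rate)
                f.writeframes(struct.pack(f"{len(pcm)}h", *pcm))
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()
        cheetah.delete()
        orca.delete()


if __name__ == "__main__" and "ACCESS_KEY" in os.environ:
    main()
```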

Run the Medical Triage Agent

You will need the Picovoice AccessKey to use the SDKs. Copy it from the Picovoice Console.

Run the following command in your terminal, replacing the placeholder values with your own ACCESS_KEY, OPENAI_KEY, and the file paths to your models.
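
For example, assuming the agent reads its configuration from environment variables (the script name and variable names here are placeholders):

```shell
ACCESS_KEY="YOUR_PICOVOICE_ACCESS_KEY" \
OPENAI_KEY="YOUR_OPENAI_API_KEY" \
WAKE_WORD_PATH="hey-doctor.ppn" \
STT_MODEL_PATH="medical-cheetah.pv" \
python medical_triage_agent.py
```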

The medical triage agent is now ready and listening for the wake word.

Example: Emergency Symptom Detection

Medical Voice AI for Appointment Scheduling, Billing, and Prescription Refills

The voice AI architecture demonstrated in this tutorial can be adapted for various medical applications beyond triage. Here are common implementations using the same core pipeline:

Appointment Scheduling Agent

Purpose: Automates patient appointment booking, rescheduling, and cancellations through natural conversation.

Implementation approach:

  • Integrate with calendar/scheduling APIs (custom EHR systems)
  • Add slot availability checking and confirmation workflows

Example interaction:

Billing Support Agent

Purpose: Handles insurance inquiries, payment processing, and billing questions.

Implementation approach:

  • Integrate with billing systems and payment processors
  • Implement secure payment collection workflows

Example interaction:

Prescription Refill Agent

Purpose: Automates prescription refill requests and pharmacy coordination.

Implementation approach:

  • Integrate with pharmacy management systems
  • Verify patient medication lists and refill eligibility through your EHR

Example interaction:

Medical Records Request Agent

Purpose: Processes requests for medical records, test results, and documentation.

Implementation approach:

  • Integrate with EHR systems for record retrieval
  • Provide secure delivery options (patient portal, fax, mail)

Example interaction:

Each of these implementations uses the same core voice pipeline with domain-specific system prompts, custom vocabulary for their domain, and integrations with relevant healthcare systems.

Next Steps: Customization and Integration

Multi-Language Support: Picovoice supports multiple languages across all models. Use language-specific Porcupine, Cheetah, and Orca models to serve different patient populations.

Interactive Follow-Up: Add follow-up questions based on initial symptoms. For instance, if a patient mentions pain, ask about severity on a 1-10 scale. If they mention fever, ask about temperature readings.
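
A simple rule-based way to sketch this, with a hypothetical keyword-to-question map:

```python
# Hypothetical symptom keywords mapped to follow-up questions; a real
# deployment would use the practice's own triage protocols.
FOLLOW_UPS = {
    "pain": "On a scale of 1 to 10, how severe is the pain?",
    "fever": "Have you taken your temperature? What was the reading?",
    "dizzy": "Does the dizziness occur when standing up, or constantly?",
}


def follow_up_questions(transcript: str) -> list:
    """Return follow-up questions triggered by keywords in the transcript."""
    lower = transcript.lower()
    return [q for keyword, q in FOLLOW_UPS.items() if keyword in lower]
```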

EHR Integration: Connect with electronic health records to access patient history and save triage assessments. In production deployments, the system would query your EHR's API for relevant medical history and write back triage results as encounter notes.

Phone System Integration: Integrate with existing phone infrastructure using Twilio or similar services. Incoming calls trigger the triage agent, and responses are delivered through the phone system's audio interface.

Multi-Agent Systems: Combine multiple agent types into a unified voice AI system. A patient could start with triage, get routed to appointment scheduling, and finish with billing questions—all in one continuous conversation.

Frequently Asked Questions

What happens if the OpenAI API calls fail or timeout during a patient conversation?
Network timeouts, rate limits, or API outages can cause the GPT-4 request to fail. You can wrap OpenAI calls in try-except blocks and use Orca Streaming Text-to-Speech to provide voice feedback like "I'm experiencing technical difficulties, please hold" or "I apologize for the delay." Since Porcupine Wake Word and Cheetah Streaming Speech-to-Text process audio entirely on-device, the voice interface remains functional during API failures; only the triage reasoning becomes unavailable.
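
A minimal sketch of that fallback pattern (`safe_assess` is an illustrative helper, not part of any SDK):

```python
def safe_assess(client, transcript: str) -> str:
    """Call GPT-4 with a timeout; fall back to a spoken hold message on failure."""
    try:
        return client.with_options(timeout=10.0).chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": transcript}],
        ).choices[0].message.content
    except Exception:
        # Orca runs on-device, so this fallback can still be spoken offline
        return "I'm experiencing technical difficulties, please hold."
```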
How does the system maintain patient privacy and HIPAA compliance?
All audio containing Protected Health Information (PHI) is processed locally where Cheetah Streaming Speech-to-Text runs. The speech engine transcribes voice data without sending raw audio to external servers. Only the transcribed text (with personally identifiable information stripped) reaches GPT-4 for triage decisions. Audio data never leaves the local application environment, addressing a core HIPAA requirement around PHI transmission.
Can I customize the wake word and triage prompts for my specific practice?
You can train any custom wake word using Picovoice Console. Use your practice name or branding (e.g., "Hey City Medical" or "Health Partners Assistant"). For triage prompts, modify the system prompt to include your practice's specific protocols, provider names, office hours, and operational procedures. You can adjust classification thresholds and response templates to match your workflow.