Build a Voice Banking Assistant with Python SDKs

TLDR: Add privacy-first voice to your banking assistant in Python. This tutorial uses on-device voice AI (no cloud services, no data sharing) and covers:

Custom wake words for voice-activated banking
Privacy-first speech recognition (helps meet privacy regulations such as GDPR)
Low-latency text-to-speech responses (no network round trips)

By the end of this tutorial, you'll have a privacy-by-design Python script that wakes on custom phrases like "Hey Capital Assistant", recognizes spoken commands (check balance, transfer funds, pay bills), and responds with natural-sounding speech.

Why On-device AI Matters for Banking Apps

Voice banking creates an immediate privacy challenge: every balance check, fund transfer, and bill payment spoken aloud contains regulated financial data. Cloud-based voice APIs (Google Speech, AWS Transcribe, Azure Speech) send customer audio to external servers, creating three critical risks:

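GDPR exposure (customer audio is personal data that ends up processed by a third party)
Expanded compliance scope (third parties require contracts, audits, and oversight)
Data breach exposure (financial information can be intercepted or leaked while it is being transferred)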

On-device processing removes these risks at the source: audio never leaves the customer's device, which supports GDPR Article 25 (data protection by design and by default) and reduces the regulatory burden.

Who This Tutorial Is For

Fintech developers building voice-enabled banking apps
Compliance teams evaluating voice AI solutions
Mobile banking teams seeking GDPR-compliant speech recognition
Product managers exploring voice banking features

What You'll Build

This tutorial builds a voice banking assistant that processes audio entirely on-device. Porcupine Wake Word enables voice-activated banking, Rhino Speech-to-Intent recognizes voice banking commands, and Orca Streaming Text-to-Speech generates natural-sounding responses. Together they form a cloud-free voice AI pipeline suitable for fintech and mobile banking apps.

Prerequisites for Adding Privacy-first Voice to Banking Apps

Python 3.9 or higher installed on the development machine
A Picovoice AccessKey, available for free from the Picovoice Console

Train a Custom Wake Word for Voice Banking AI Agents

Sign up for a Picovoice Console account and navigate to the Porcupine page.
Enter your wake phrase such as "Hey Capital Assistant" and test it using the microphone button.
Click "Train", select the target platform, and download the .ppn model file.

For tips on designing an effective wake word, refer to "Choosing a wake word".

Define Voice Commands for Voice Banking

Voice banking requires understanding specific financial intents. This YAML context defines the commands the voice assistant banking system will recognize.

In the Rhino section of Picovoice Console, create an empty context for your voice assistant.
Click the "Import YAML" button in the top-right corner of the Console. Paste the YAML provided below to add the banking voice commands.

YAML Context for Voice Banking Assistant:

context:
  expressions:
    checkBalance:
        - "what is my ($accountType:account) balance"
        - "how much is in my ($accountType:account)"
        - "check ($accountType:account) balance"
        - "balance for ($accountType:account)"
        - "($accountType:account) account balance"
        - "(@polite) check my balance"

    transferFunds:
        - "(@polite) transfer $pv.TwoDigitInteger:amount (dollars) from $accountType:fromAccount to $destination:toAccount"
        - "(@polite) move $pv.TwoDigitInteger:amount (dollars) to $destination:toAccount"
        - "(@polite) send $pv.TwoDigitInteger:amount (dollars) from $accountType:fromAccount to $destination:toAccount"
        - "(@polite) transfer $pv.TwoDigitInteger:amount (dollars)"

    payBill:
        - "(@polite) pay my $payee:payee bill"
        - "(@polite) pay $pv.TwoDigitInteger:amount (dollars) to $payee:payee"
        - "(@polite) make a payment to $payee:payee"
        - "(@polite) pay $payee:payee"

    recentTransactions:
        - "show (me) (my) recent transactions"
        - "what did I spend on $category:category"
        - "(show) recent ($accountType:account) transactions"
        - "transactions from last week"
        - "(@polite) show my ($accountType:account) activity"

    accountInfo:
        - "what's my routing number"
        - "account number for ($accountType:account)"
        - "give me (my) account details"
        - "(@polite) get my ($accountType:account) details"

  slots:
    accountType:
        - "checking"
        - "savings"
        - "credit card"
        - "investment"

    payee:
        - "utilities"
        - "rent"
        - "credit card"
        - "mortgage"

    category:
        - "groceries"
        - "restaurants"
        - "transportation"
        - "entertainment"

    destination:
        - "checking"
        - "savings"
        - "investment"

  macros:
    polite:
        - "please"
        - "can you"
        - "could you"

This context defines five main banking intents:

checkBalance: Query account balances
transferFunds: Move money between accounts
payBill: Make bill payments
recentTransactions: View transaction history
accountInfo: Get account details

Select your target platform and download the .rhn context file.

You can refer to the Rhino Syntax Cheat Sheet for more details on how to build your custom context.
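Before importing a context, it can help to check that every custom slot type used in an expression is actually defined under slots. The following is a minimal sketch and not part of the Rhino tooling: the regular expression and the helper names (slot_references, undefined_slot_types) are assumptions made for illustration, and built-in slots are assumed to be the ones prefixed with pv.

```python
import re

# Matches slot references such as "$accountType:account" or
# "$pv.TwoDigitInteger:amount" and captures (slot type, slot name).
SLOT_REF = re.compile(r"\$([\w.]+):(\w+)")


def slot_references(expression):
    """Return the (slot_type, slot_name) pairs found in one expression."""
    return SLOT_REF.findall(expression)


def undefined_slot_types(expressions, defined_types):
    """Return custom slot types that are referenced but never defined."""
    missing = set()
    for expression in expressions:
        for slot_type, _name in slot_references(expression):
            # Assumption: types starting with "pv." are built in.
            if not slot_type.startswith("pv.") and slot_type not in defined_types:
                missing.add(slot_type)
    return missing


expressions = [
    "(@polite) move $pv.TwoDigitInteger:amount (dollars) to $destination:toAccount",
    "what did I spend on $category:category",
]
defined = {"accountType", "payee", "category", "destination"}
print(undefined_slot_types(expressions, defined))  # prints set()
```

An empty set means every referenced slot type has a definition. A typo such as $acountType would show up in the returned set before the Console rejects the import.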

Install Python SDKs for Voice Banking Assistant

Install the following Python SDKs using pip:

Porcupine Wake Word Python SDK (pvporcupine)
Rhino Speech-to-Intent Python SDK (pvrhino)
Orca Text-to-Speech Python SDK (pvorca)
Picovoice Python Recorder library (pvrecorder)
Picovoice Python Speaker library (pvspeaker)

pip install pvporcupine pvrhino pvorca pvrecorder pvspeaker

Build the Voice Assistant Banking System

The following sections demonstrate the Python code that connects all components into a working voice banking system.

Initialize the Voice AI Models and Audio I/O

Import the SDKs and initialize each engine with your AccessKey:

import pvporcupine
import pvrhino
import pvorca
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker

ACCESS_KEY = "${ACCESS_KEY}"

# Path to your Porcupine wake-word model file (.ppn)
KEYWORD_PATH = "${KEYWORD_PATH}"

# Path to your Rhino context file (.rhn)
CONTEXT_PATH = "${CONTEXT_PATH}"

porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=[KEYWORD_PATH]
)
rhino = pvrhino.create(
    access_key=ACCESS_KEY,
    context_path=CONTEXT_PATH
)
orca = pvorca.create(access_key=ACCESS_KEY)

recorder = PvRecorder(device_index=-1, frame_length=porcupine.frame_length)
speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)

Detect the Custom Wake Word

Porcupine Wake Word listens continuously for your custom phrase. When detected, it triggers the voice assistant banking system:

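recorder.start()  # begin capturing audio before the first read
while True:
    pcm = recorder.read()
    if porcupine.process(pcm) >= 0:  # index of the detected keyword, -1 if none
        print("Wake word detected.")
        break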

Process Banking Voice Commands

Rhino Speech-to-Intent converts spoken commands into structured data your banking API can use:

is_finalized = rhino.process(pcm)
if is_finalized:
    inference = rhino.get_inference()

    if inference.is_understood:
        intent = inference.intent
        slots = inference.slots
        print(f"[UNDERSTOOD] intent={intent}, slots={slots}")
        response_text = execute_banking_action(intent, slots)
    else:
        print("[NOT UNDERSTOOD]")
        response_text = "Sorry, I did not understand."
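The two snippets above each handle one phase of a turn: Porcupine owns the audio until the wake word is heard, then Rhino owns it until an inference is finalized. One way to picture that handoff is a small two-state router. This is a sketch under stated assumptions: the state names and the route_frame helper are hypothetical, and it relies only on the calls already shown in this tutorial (porcupine.process, rhino.process, rhino.get_inference, rhino.reset).

```python
LISTENING, COMMAND = "listening", "command"


def route_frame(state, frame, porcupine, rhino):
    """Feed one audio frame to the engine that owns the current state.

    Returns (next_state, inference); inference stays None until Rhino
    finalizes a decision.
    """
    if state == LISTENING:
        # Porcupine returns the index of the detected keyword, or -1.
        if porcupine.process(frame) >= 0:
            return COMMAND, None
        return LISTENING, None

    # COMMAND state: Rhino consumes frames until it reaches a decision.
    if rhino.process(frame):
        inference = rhino.get_inference()
        rhino.reset()  # mirrors the reset in the complete script
        return LISTENING, inference
    return COMMAND, None
```

Because the function only touches the engines through those four calls, it can be exercised with stand-in objects, without a microphone or an AccessKey. The complete script later in this tutorial uses two nested loops instead, which is equivalent for a single wake word.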

Send Voice Commands to Banking API

This function routes the detected voice commands to the banking API:

def execute_banking_action(intent, slots):
    """
    Executes banking commands based on the detected intent.
    In production, this is where you'd integrate with the banking API.
    """
    if intent == "checkBalance":
        # Add the banking logic here
        account = slots.get("account")
        if account:
            response_text = f"Checking your {account} balance."
        else:
            response_text = "Checking your balance."
    
    elif intent == "transferFunds":
        # Add the banking logic here
        amount = slots.get("amount")
        from_account = slots.get("fromAccount")
        to_account = slots.get("toAccount")
        if from_account and to_account:
            response_text = f"Transferring {amount} dollars from {from_account} to {to_account}."
        elif to_account:
            response_text = f"Transferring {amount} dollars to {to_account}."
        else:
            response_text = f"Transferring {amount} dollars."
        # Add more banking actions 

    else:
        response_text = "Command received."

    return response_text
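Before a transfer reaches a real banking API, the slot values should be validated rather than interpolated directly into a response. The following is a minimal sketch, assuming slot values arrive as strings (for example "50" for the amount). The helper names parse_amount and validate_transfer are hypothetical, and the default limit of 99 simply mirrors the two-digit amount slot used in the context above.

```python
def parse_amount(slots, limit=99):
    """Return the spoken amount as an int, or None if missing or out of range."""
    raw = slots.get("amount")
    if raw is None:
        return None
    try:
        amount = int(raw)
    except (TypeError, ValueError):
        return None
    return amount if 0 < amount <= limit else None


def validate_transfer(slots, limit=99):
    """Return (ok, message) describing whether the transfer may proceed."""
    amount = parse_amount(slots, limit)
    if amount is None:
        return False, "Sorry, I could not determine the amount."
    source = slots.get("fromAccount")
    target = slots.get("toAccount")
    if source is not None and source == target:
        return False, "The source and destination accounts must be different."
    return True, f"Ready to transfer {amount} dollars."


print(validate_transfer({"amount": "50", "fromAccount": "checking", "toAccount": "savings"}))
# prints (True, 'Ready to transfer 50 dollars.')
```

The transferFunds branch could call validate_transfer first and speak the returned message when the check fails, so that a misheard amount never reaches the banking backend.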

Generate Voice Responses

Orca Streaming Text-to-Speech converts text responses into natural-sounding audio:

from collections import deque

def speak_text(orca, speaker, text: str):
    pcm_out, _ = orca.synthesize(text)

    speaker.start()

    pcm_buffer = deque()
    pcm_buffer.append(pcm_out)

    while len(pcm_buffer) > 0:
        pcm = pcm_buffer.popleft()
        written = speaker.write(pcm)
        if written < len(pcm):
            pcm_buffer.appendleft(pcm[written:])

    speaker.flush()
    speaker.stop()
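The buffer loop in speak_text exists because a single write may accept fewer samples than it was given, so the unwritten tail has to be queued again. The sketch below isolates that loop and runs it against a stand-in speaker. FakeSpeaker and write_all are hypothetical names for illustration, and the sketch assumes, as speak_text does, that write returns the number of samples it accepted.

```python
from collections import deque


class FakeSpeaker:
    """Hypothetical stand-in that accepts at most `capacity` samples per write."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.played = []

    def write(self, pcm):
        accepted = pcm[: self.capacity]
        self.played.extend(accepted)
        return len(accepted)


def write_all(speaker, pcm):
    """Keep writing until every sample has been accepted by the speaker."""
    pending = deque([pcm])
    while pending:
        chunk = pending.popleft()
        written = speaker.write(chunk)
        if written < len(chunk):
            # Requeue the unwritten tail so it is sent on the next pass.
            pending.appendleft(chunk[written:])


speaker = FakeSpeaker(capacity=4)
samples = list(range(10))
write_all(speaker, samples)
print(speaker.played == samples)  # prints True
```

With a capacity of 4, the ten samples go out in three writes (4, 4, and 2), and the order is preserved because the tail is pushed back to the front of the queue.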

Complete Python Script for Voice Banking

Here's the full voice assistant banking implementation in Python:

# Banking Assistant: Porcupine + Rhino + Orca
import argparse
from collections import deque

import pvporcupine
import pvrhino
import pvorca
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker


def speak_text(orca, speaker, text: str):
    pcm_out, _ = orca.synthesize(text)

    speaker.start()

    pcm_buffer = deque()
    pcm_buffer.append(pcm_out)

    while len(pcm_buffer) > 0:
        pcm = pcm_buffer.popleft()
        written = speaker.write(pcm)
        if written < len(pcm):
            pcm_buffer.appendleft(pcm[written:])

    speaker.flush()
    speaker.stop()


def execute_banking_action(intent, slots):
    """
    Executes banking commands based on the detected intent.
    In production, this is where you'd integrate with the banking API.
    """
    if intent == "checkBalance":
        # Add the banking logic here
        account = slots.get("account")
        if account:
            response_text = f"Checking your {account} balance."
        else:
            response_text = "Checking your balance."

    elif intent == "transferFunds":
        # Add the banking logic here
        amount = slots.get("amount")
        from_account = slots.get("fromAccount")
        to_account = slots.get("toAccount")
        if from_account and to_account:
            response_text = (
                f"Transferring {amount} dollars from {from_account} "
                f"to {to_account}."
            )
        elif to_account:
            response_text = f"Transferring {amount} dollars to {to_account}."
        else:
            response_text = f"Transferring {amount} dollars."

    elif intent == "payBill":
        # Add the banking logic here
        payee = slots.get("payee")
        amount = slots.get("amount")
        if payee and amount:
            response_text = f"Paying {amount} dollars to {payee}."
        elif payee:
            response_text = f"Paying your {payee} bill."
        else:
            response_text = "Payment command received."

    elif intent == "recentTransactions":
        # Add the banking logic here
        account = slots.get("account")
        category = slots.get("category")
        if category:
            response_text = f"Showing spending on {category}."
        elif account:
            response_text = f"Showing recent {account} transactions."
        else:
            response_text = "Showing your recent transactions."

    elif intent == "accountInfo":
        # Add the banking logic here
        account = slots.get("account")
        if account:
            response_text = f"Retrieving your {account} account details."
        else:
            response_text = "Retrieving your account details."

    else:
        response_text = f"Command received. (raw intent={intent})"

    return response_text


def main():
    parser = argparse.ArgumentParser(
        description="Banking Assistant using Porcupine, Rhino, and Orca"
    )
    parser.add_argument(
        "--access_key",
        required=True,
        help="Picovoice access key"
    )
    parser.add_argument(
        "--keyword_file_path",
        required=True,
        help="Path to keyword file for wake word detection"
    )
    parser.add_argument(
        "--context_file_path",
        required=True,
        help="Path to context file for intent recognition"
    )
    parser.add_argument(
        "--audio_device_index_input",
        type=int,
        default=-1,
        help="Input audio device index (default: -1)"
    )
    parser.add_argument(
        "--audio_device_index_output",
        type=int,
        default=-1,
        help="Output audio device index (default: -1, the system default)"
    )

    args = parser.parse_args()

    print("Initializing Banking Assistant")

    # Use command line arguments
    access_key = args.access_key
    keyword_path = args.keyword_file_path
    context_path = args.context_file_path
    input_device_index = args.audio_device_index_input
    output_device_index = args.audio_device_index_output

    # Initialize engines
    porcupine = pvporcupine.create(
        access_key=access_key,
        keyword_paths=[keyword_path]
    )
    rhino = pvrhino.create(
        access_key=access_key,
        context_path=context_path
    )
    orca = pvorca.create(access_key=access_key)

    speaker = PvSpeaker(
        sample_rate=orca.sample_rate,
        bits_per_sample=16,
        device_index=output_device_index
    )

    print("[OK] Engines initialized")
    print(
        f"Using input device index={input_device_index}, "
        f"output={output_device_index}"
    )
    print("Say the wake word… (Ctrl+C to stop)")

    try:
        while True:
            # Initialize recorder for wake word detection
            recorder = PvRecorder(
                device_index=input_device_index,
                frame_length=porcupine.frame_length
            )
            recorder.start()

            # Wake word detection loop
            while True:
                pcm = recorder.read()
                if porcupine.process(pcm) == 0:
                    print("[EVENT] Wake word detected!")
                    speak_text(orca, speaker, "Yes?")
                    break

            recorder.stop()
            recorder.delete()

            # Initialize recorder for intent recognition
            recorder = PvRecorder(
                device_index=input_device_index,
                frame_length=rhino.frame_length
            )
            recorder.start()

            # Intent processing loop
            while True:
                pcm = recorder.read()
                is_finalized = rhino.process(pcm)
                if is_finalized:
                    inference = rhino.get_inference()

                    if inference.is_understood:
                        intent = inference.intent
                        slots = inference.slots
                        print(
                            f"[UNDERSTOOD] intent={intent}, "
                            f"slots={slots}"
                        )
                        response_text = execute_banking_action(intent, slots)
                    else:
                        print("[NOT UNDERSTOOD]")
                        response_text = "Sorry, I did not understand."

                    # Speak the response
                    speak_text(orca, speaker, response_text)

                    # Reset Rhino for next turn
                    rhino.reset()
                    break

            recorder.stop()
            recorder.delete()

    except KeyboardInterrupt:
        print("\n[EXIT] Stopping assistant...")
    finally:
        speaker.delete()
        porcupine.delete()
        rhino.delete()
        orca.delete()
        print("[CLEANUP] Resources released.")


if __name__ == "__main__":
    main()

Run the Voice Banking System

Save the complete script as banking_assistant.py. Get your Picovoice AccessKey from the Picovoice Console, then replace ${ACCESS_KEY} with your AccessKey and ${KEYWORD_PATH} and ${CONTEXT_PATH} with the paths to your local .ppn and .rhn files:

python banking_assistant.py \
    --access_key="${ACCESS_KEY}" \
    --keyword_file_path="${KEYWORD_PATH}" \
    --context_file_path="${CONTEXT_PATH}"

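The voice-activated banking assistant is now running. Say the wake word, wait for the "Yes?" prompt, and then test it with commands that match the context, such as "What is my checking balance?" or "Transfer fifty dollars from checking to savings."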
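Passing the AccessKey on the command line leaves it in shell history, so some teams may prefer to read it from the environment. The following is a minimal sketch of that fallback. The variable name PICOVOICE_ACCESS_KEY and the helper resolve_access_key are assumptions for illustration, not something the Picovoice SDKs read on their own.

```python
import os


def resolve_access_key(cli_value=None, env=None):
    """Prefer an explicit value, then fall back to the environment."""
    env = os.environ if env is None else env
    # PICOVOICE_ACCESS_KEY is a hypothetical variable name chosen for this sketch.
    key = cli_value or env.get("PICOVOICE_ACCESS_KEY")
    if not key:
        raise SystemExit("Set PICOVOICE_ACCESS_KEY or pass --access_key.")
    return key


print(resolve_access_key("key-from-cli", env={}))  # prints key-from-cli
```

To use it, the --access_key argument would be made optional and the result of resolve_access_key(args.access_key) passed to each engine's create call.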

Extend the Voice Banking Solution

Once the voice banking assistant is operational, consider these enhancements:

Multi-Language Support: Picovoice engines support a wide range of languages, allowing you to adapt the voice assistant for different regions.
Voice Biometric Authentication: Add speaker identification with Eagle Speaker Recognition to confirm customer identity before executing sensitive transactions.

Start building a voice banking solution with Picovoice!
