Like fingerprints, every person's voice has unique, distinctive characteristics. Voice Biometrics, a subset of Speaker Recognition, is the technology that uses these characteristics to verify individuals. In other words, Voice Biometrics uses voice data to check whether an individual is who they claim to be. Speaker Verification, Voice Authentication, and Voiceprinting are other terms for Voice Biometrics.

Traditional authentication often relies on passwords, which users frequently forget. Voice Biometrics (Voice Authentication) is a convenient alternative: users do not need to memorize lengthy passwords or carry identification cards; they simply speak to authenticate. Hence, call centers, mobile and online applications, and IoT devices offer it either as a standalone solution or as part of Multi-Factor Authentication (MFA):

  • Customer Care: Citibank uses Voiceprints to verify customers' identities when they call the bank.
  • Payment: Amazon Alexa allows users to make purchases with their voice, authenticating them using Voice Biometrics.
  • Automation: Google Nest Hub Max uses Voice Match to recognize the voices of different users and provide personalized content and access across Nest devices such as thermostats or smart locks.
  • Access: Monument Health in the Mayo Clinic network uses Voice Biometrics to authenticate healthcare providers when they access electronic health records.

How does Voice Biometrics work?

Voice Biometrics verifies a person by comparing new Voice Samples against an original Voice Template. Hence, the first step in Voice Biometrics is Enrollment, which creates that original Voice Template. Enrollment and Voiceprint Extraction are interchangeable terms for the same process.

  1. Voice Biometrics engines capture users’ speech data.
  2. Voice Biometrics engines process and analyze the data.
  3. Voice Biometrics engines create a unique Voiceprint, also known as a VoiceID or SpeakerID.
Figure: Voice Biometric Enrollment Process (a user speaks, and the Voice Biometrics engine processes the voice input to create a secure voiceprint)
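The enrollment steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular engine works: it assumes step 2 has already turned each captured utterance into a fixed-length speaker embedding (real engines typically compute these with a trained model, which is out of scope here), and it builds the voiceprint by averaging the length-normalized embeddings, which is one common choice rather than a universal one. The function names and sample vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so later comparisons depend only on direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def enroll(utterance_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Build a voiceprint from per-utterance speaker embeddings.

    Assumption (hypothetical pipeline): each captured utterance has already
    been converted into a fixed-length embedding vector. We average the
    normalized embeddings and re-normalize the result.
    """
    if not utterance_embeddings:
        raise ValueError("enrollment needs at least one utterance")
    dim = len(utterance_embeddings[0])
    if any(len(e) != dim for e in utterance_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    normalized = [l2_normalize(e) for e in utterance_embeddings]
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


# Hypothetical 3-dimensional embeddings from three enrollment utterances.
voiceprint = enroll([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.05]])
print(voiceprint)
```

Averaging several utterances smooths out variation in any single recording, which is one reason enrollment usually asks the user to speak more than once.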

The second step, Comparison, determines whether the new voice input belongs to the original speaker.

  1. Voice Biometrics engines capture users’ speech data.
  2. Voice Biometrics engines process the data and compare it against the existing voiceprint (i.e., VoiceID).
  3. Voice Biometrics engines return a score. The higher the score, the more likely it is that the speaker is who they claim to be.
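The comparison step can be sketched the same way. Assuming both the enrolled voiceprint and the new sample are fixed-length embedding vectors, cosine similarity is one common way to produce such a score; the function name and numbers below are hypothetical, and commercial engines may use other scoring methods and scales.

```python
import math
from typing import Sequence


def cosine_score(sample: Sequence[float], voiceprint: Sequence[float]) -> float:
    """Return the cosine similarity between a new sample embedding and a voiceprint.

    The result ranges from -1.0 to 1.0; higher means the sample points in a
    direction closer to the enrolled voiceprint.
    """
    if len(sample) != len(voiceprint):
        raise ValueError("dimension mismatch")
    dot = sum(a * b for a, b in zip(sample, voiceprint))
    norm = math.sqrt(sum(a * a for a in sample)) * math.sqrt(sum(b * b for b in voiceprint))
    if norm == 0.0:
        raise ValueError("cannot score a zero vector")
    return dot / norm


voiceprint = [0.6, 0.8, 0.0]  # hypothetical enrolled voiceprint
same_speaker = cosine_score([0.58, 0.81, 0.05], voiceprint)  # hypothetical genuine sample
other_speaker = cosine_score([0.1, 0.2, 0.97], voiceprint)  # hypothetical impostor sample
print(f"{same_speaker:.3f} {other_speaker:.3f}")
```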

Note that some legacy Voice Biometrics engines return a binary answer instead of a score. They let developers choose one of several predefined threshold (sensitivity) levels, and the engine then returns a positive response, such as “pass,” or a negative one, such as “fail.” The result for end users may be the same. However, this legacy approach limits developers' visibility into how close each decision was and makes them dependent on the vendor to adjust threshold levels.
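To make the difference concrete, here is a hypothetical sketch contrasting the two styles. The `Sensitivity` levels and their threshold values are invented for illustration, and no specific vendor API is implied.

```python
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical preset levels a legacy engine might expose (values are illustrative)."""
    LOW = 0.60
    MEDIUM = 0.75
    HIGH = 0.90


def legacy_verdict(score: float, level: Sensitivity) -> str:
    """Legacy style: the engine applies a preset threshold and returns only pass/fail."""
    return "pass" if score >= level.value else "fail"


def score_based_decision(score: float, threshold: float) -> bool:
    """Score style: the application sees the raw score and picks its own threshold."""
    return score >= threshold


score = 0.82  # hypothetical comparison score
print(legacy_verdict(score, Sensitivity.MEDIUM))  # caller never sees the 0.82 itself
print(legacy_verdict(score, Sensitivity.HIGH))
print(score_based_decision(score, threshold=0.85))  # e.g. a stricter bar for payments
```

With the score-based style, an application can, for example, apply a stricter threshold for payments than for personalized content, without waiting for the vendor to change a preset.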

Figure: Voice Biometric Comparison Process (a user speaks, and the Voice Biometrics engine compares the voice input with the original voiceprint; if they match, the engine verifies the identity)
