Today, Picovoice is pleased to announce the public beta release of its Speaker Recognition engine, Eagle.
Speaker Recognition analyzes distinctive voice characteristics to identify and verify speakers. It is the technology behind voice authentication, speaker-based personalization, and speaker spotting, enabling use cases across industries:
- IVR personalization to tailor messages, menus, and treatments,
- Custom settings with wake word enrollment, e.g., Alexa Voice ID,
- Fewer wake word false acceptances and false rejections, e.g., Siri Recognize Only My Voice,
- Caller authentication in telephone banking, e.g., Barclays voice authentication,
- Speaker identification in virtual and hybrid meetings.
Speaker Recognition is a challenging technology to build, given the complex nature of the human voice. Anatomic and behavioral differences, such as the shapes of our mouths and throats, pitch, tone, and speaking patterns, all affect our voice characteristics. Acoustic conditions, such as background noise, echo, and distance, along with the phonetic variability of languages and phrases, add another layer of complexity.
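To make the difference between verifying and identifying a speaker concrete, here is a minimal conceptual sketch. It assumes each voice has already been reduced to a fixed-length numeric vector (a "voiceprint") and that voices are compared with cosine similarity against a threshold. These are illustrative assumptions, not a description of Eagle's internals or its API; the function names, the toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(profile, sample, threshold=0.8):
    """Verification (1:1): does the sample match one claimed speaker profile?"""
    return cosine_similarity(profile, sample) >= threshold


def identify(profiles, sample, threshold=0.8):
    """Identification (1:N): return the best-matching enrolled speaker, or None."""
    best_name, best_score = None, float("-inf")
    for name, profile in profiles.items():
        score = cosine_similarity(profile, sample)
        if score > best_score:
            best_name, best_score = name, score
    # Reject the best match if it is still below the acceptance threshold.
    return best_name if best_score >= threshold else None


# Hypothetical toy voiceprints; real ones would have many more dimensions.
enrolled = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.5],
}
incoming = [0.85, 0.15, 0.25]

print(verify(enrolled["alice"], incoming))  # True
print(identify(enrolled, incoming))         # alice
```

Raising the threshold in a scheme like this trades false acceptances for false rejections, which is why the acoustic and phonetic variability described above makes the problem hard in practice.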
Why another Speaker Recognition engine?
Picovoice initially built Eagle Speaker Recognition for internal use. Later, we decided to share it with the public because we couldn’t find a Speaker Recognition engine that was production-ready, accurate, and cross-platform. Legacy players and Big Tech alike require developers to go through a long sales process before they reveal their SDKs. For example, Microsoft Speaker Recognition grants limited access in selected languages to “approved” customers only.
Meet Eagle Speaker Recognition
First and foremost, Eagle Speaker Recognition is available to any developer on Picovoice’s Free Plan. No need to talk to Picovoice. No vetting or approval process. No credit card required. No machine learning expertise needed. It’s ready in a few lines of code.
Eagle Speaker Recognition is:
- compact and computationally efficient, running across platforms
- private and reliable, powered by on-device voice processing
- language-agnostic, text-independent
- easy to use with a simple enrollment process