Using Speech to Text in voice assistants is the common approach. The conventional Spoken Language Understanding method transcribes speech with Speech to Text and then extracts meaning, i.e., intent, by processing the transcript with Natural Language Understanding. The accuracy of these voice assistants therefore depends on the performance of two independently trained modules: Speech to Text and Natural Language Understanding. Erroneous Speech to Text output leads to incorrect Natural Language Understanding predictions, and generic Speech to Text models have well-known limitations. Thus, finding the best Speech to Text for voice assistants is challenging.
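To make the cascade concrete, here is a toy Python sketch of the conventional pipeline. Both stages are hypothetical stand-ins, not real engines: `transcribe` fakes a generic Speech to Text model with canned transcripts, and `parse_intent` is a simple keyword-matching Natural Language Understanding step. The clip IDs, transcripts, and intent names are invented for illustration. What the sketch shows is structural: the Natural Language Understanding stage only ever sees the transcript, so a mis-transcription flows straight through to the intent.

```python
from typing import Optional

# Hypothetical canned outputs from a generic Speech to Text model.
# In clip_2 the model mishears the domain term "carburetor" as "carbonara".
FAKE_TRANSCRIPTS = {
    "clip_1": "how do I clean the carburetor",
    "clip_2": "how do I clean the carbonara",
}

# Hypothetical auto-shop NLU: maps domain keywords to intent names.
INTENT_KEYWORDS = {
    "carburetor": "cleanCarburetor",
    "spark plug": "replaceSparkPlug",
}


def transcribe(audio_id: str) -> str:
    """Stage 1: stand-in for a generic Speech to Text engine."""
    return FAKE_TRANSCRIPTS[audio_id]


def parse_intent(text: str) -> Optional[str]:
    """Stage 2: stand-in for text-based NLU; the first matching keyword wins."""
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in text:
            return intent
    return None  # no known keyword in the transcript


def understand(audio_id: str) -> Optional[str]:
    # The NLU stage receives only the transcript, never the audio,
    # so any Speech to Text error is passed along unchanged.
    return parse_intent(transcribe(audio_id))


if __name__ == "__main__":
    for clip in ("clip_1", "clip_2"):
        print(clip, "->", understand(clip))
```

Tuning the keyword list would not rescue `clip_2`, because the error happened before the Natural Language Understanding stage ran. That is the dependency on independently trained modules described above.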
Open-domain voice assistants, such as Alexa, Siri, and Google Assistant, use the conventional Spoken Language Understanding approach. One reason is dataset availability. Open-domain voice assistants multitask, from retrieving information about weather, nutrition, or history to taking actions such as setting a timer or playing fun sounds and songs. Text-based Natural Language Understanding has been around longer than speech-based Natural Language Understanding and hence has richer datasets, making it a more suitable solution. However, not every voice assistant operates in the open domain. Technicians at an auto shop don’t need a voice assistant that can help them fix an airplane or a ship, let alone play their favorite song.
We expect technicians to fix a “carburetor” and chefs to fix “carbonara,” so they are trained accordingly. Voice assistants should likewise be trained for their domain to improve productivity and minimize errors. Otherwise, their added value, and hence their adoption, will be limited. Yet the go-to speech technology for many developers, generic Speech to Text, lacks this specialization, i.e., context awareness. Alexa can explain how to fix both a “carburetor” and “carbonara,” but it can sometimes mix up the terms because they sound similar, and it may pull information from unreliable sources. This might not matter to someone asking questions for fun at home. However, time, accuracy, and precision are critical in auto shops and commercial kitchens.
After hearing about the challenges the market faces in finding the best Speech to Text for voice assistants, we decided to take a different path and build a solution that addresses the need more directly: Speech-to-Intent. Speech-to-Intent is a context-aware alternative to Speech to Text. It combines Speech to Text and Natural Language Understanding, resulting in voice assistants that are more accurate and faster within their domain. The downside? It’s not suitable for open-domain voice assistants. A voice assistant specializing in auto repairs is a “professional” technician helper only. It’s not a technician helper, singer, nutritionist, timer, and door opener all at once.
If you need a more accurate and responsive alternative to Speech to Text for your domain-specific voice assistant, try Rhino Speech-to-Intent. Rhino Speech-to-Intent is six times more accurate than Big Tech alternatives (Google Dialogflow, Amazon Lex, Microsoft LUIS, and IBM Watson), as measured by an open-source benchmark. If you need to discuss your specific use case with an expert, reach out to Picovoice’s Consulting Services.