Search engines, especially Google, have fundamentally changed how we access information, so much so that "Google" has been added to prominent dictionaries as a word. However, Google still indexes text, not audio, for search. Thus, many developers look for workarounds to “Google search audio files.”
The common workaround for the “Google search audio files” problem is to transcribe an audio file with Speech to Text and then search the transcript for a typed keyword. Google News Initiative also takes this approach to make news searchable, despite the known limitations of generic Speech to Text models, such as out-of-vocabulary words and competing hypotheses (homophones). Customizing a generic Speech to Text model to the domain is the best way to overcome these challenges. However, news is not domain-specific and includes many proper nouns, which makes it impossible to train a domain-specific Speech to Text model. Finding the best Speech to Text engine for use cases with proper nouns is even more challenging. Hence, this problem requires a different approach than an optimized, custom Speech to Text model.
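To make the failure modes concrete, here is a minimal sketch of the transcript-then-keyword workaround. The transcript below is hypothetical: it assumes a generic Speech to Text engine heard "reporters in Kyiv write daily" and produced an out-of-vocabulary substitution ("keith") and a homophone ("right"). The `find_keyword` helper is an illustrative name, not part of any real API.

```python
from typing import List, Tuple

# Hypothetical transcript as (word, start_sec) pairs.
# Assumed spoken sentence: "reporters in Kyiv write daily".
TRANSCRIPT: List[Tuple[str, float]] = [
    ("reporters", 0.0),
    ("in", 0.6),
    ("keith", 0.8),   # assumed out-of-vocabulary error for "Kyiv"
    ("right", 1.3),   # assumed homophone error for "write"
    ("daily", 1.9),
]


def find_keyword(transcript: List[Tuple[str, float]], keyword: str) -> List[float]:
    """Return the start times of words that exactly match the keyword."""
    key = keyword.lower()
    return [start for word, start in transcript if word.lower() == key]


print(find_keyword(TRANSCRIPT, "daily"))  # found: the word was transcribed correctly
print(find_keyword(TRANSCRIPT, "Kyiv"))   # missed: proper noun was mis-transcribed
print(find_keyword(TRANSCRIPT, "write"))  # missed: homophone "right" was chosen
```

Text search can only find what the transcript spells correctly, so every recognition error becomes a silent search miss.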
Speech-to-Index is an alternative to Speech to Text that approaches the search problem differently. Like Speech to Text, it breaks the audio input into sounds (phonemes). However, Speech-to-Index keeps the phonemes as they are and builds a phonetic index instead of converting them into text. Thus, Speech-to-Index removes the limitations of a text-based approach, such as unrecognized words, homophones, and spelling errors, making it a better choice for audio search engines. For example, an open-source benchmark shows that an audio search engine built with Picovoice’s Octopus Speech-to-Index finds keywords in audio files ten times more accurately than one built with Google Speech-to-Text.
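The idea of a phonetic index can be illustrated with a toy sketch. This is not how Octopus is implemented; it only shows why searching over phonemes sidesteps spelling. The phoneme stream, the ARPAbet-style symbols, and the small `LEXICON` are all assumptions made up for illustration, and a real system would match approximately and with probabilities rather than exactly.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical phoneme stream as (phoneme, start_sec) pairs.
# Assumed spoken phrase: "in Kyiv write daily".
STREAM: List[Tuple[str, float]] = [
    ("IH", 0.0), ("N", 0.1),
    ("K", 0.3), ("IY", 0.4), ("V", 0.5),
    ("R", 0.7), ("AY", 0.8), ("T", 0.9),
    ("D", 1.1), ("EY", 1.2), ("L", 1.3), ("IY", 1.4),
]

# Illustrative pronunciations only; a real system would use a full lexicon
# or a grapheme-to-phoneme model.
LEXICON: Dict[str, List[str]] = {
    "kyiv": ["K", "IY", "V"],
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "daily": ["D", "EY", "L", "IY"],
}


def build_index(stream: List[Tuple[str, float]]) -> Dict[str, List[int]]:
    """Map each phoneme to the positions where it occurs in the stream."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, (phoneme, _) in enumerate(stream):
        index[phoneme].append(pos)
    return index


def search(stream: List[Tuple[str, float]],
           index: Dict[str, List[int]],
           query: List[str]) -> List[float]:
    """Return start times where the query phoneme sequence occurs exactly."""
    if not query:
        return []
    hits = []
    # Only positions holding the first query phoneme can start a match.
    for pos in index.get(query[0], []):
        window = [p for p, _ in stream[pos:pos + len(query)]]
        if window == query:
            hits.append(stream[pos][1])
    return hits


INDEX = build_index(STREAM)
print(search(STREAM, INDEX, LEXICON["kyiv"]))   # [0.3]
print(search(STREAM, INDEX, LEXICON["write"]))  # [0.7]
print(search(STREAM, INDEX, LEXICON["right"]))  # [0.7], same sounds as "write"
```

Because the query is converted to sounds before matching, the proper noun and both homophones are found without the engine ever having to choose a spelling.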
Speech-to-Index is not a perfect substitute for Speech to Text but a complement to it. Speech to Text is the only option for transcription, just as Speech-to-Index is for search. Use cases like Dialogue Search, Social Listening, Legal E-Discovery, or Media Asset Management benefit from both Speech to Text and Speech-to-Index. If you’re unsure how Speech-to-Index can improve the performance of your product, start building with Picovoice’s Free Plan. Integrating Octopus Speech-to-Index into your product takes only a few lines of code and a few minutes! If you’re unsure where and how to start, you can work with experts through Picovoice’s Consulting Services!