Search engines, especially Google, have fundamentally changed how we access information, so much so that "Google" has been added to prominent dictionaries as a word. However, Google still indexes text for search. Thus, many developers look for workarounds to "Google search audio files."
Transcribing an audio file to text using Speech-to-Text and then searching the transcript for a keyword is the common workaround for the "Google search audio files" problem. Google News Initiative also takes this approach to make news searchable, despite the known limitations of generic Speech-to-Text models, such as out-of-vocabulary words and competing hypotheses (homophones). Customizing a generic Speech-to-Text model to the domain is the best way to overcome these challenges. However, news is not domain-specific and includes many proper nouns, making it impractical to train a domain-specific Speech-to-Text model. Finding the best Speech-to-Text engine for use cases with proper nouns is even more challenging. Hence, this problem requires a different approach than an optimized, custom Speech-to-Text model.
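To make the limitation concrete, here is a minimal sketch of the transcript-based workaround. The transcripts are made up for illustration, and `find_keyword` is a hypothetical helper rather than part of any Speech-to-Text product. It assumes the engine split an unfamiliar proper noun into two common words, which is one plausible out-of-vocabulary failure.

```python
import re


def find_keyword(transcript: str, keyword: str) -> list[int]:
    """Return character offsets where keyword occurs as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [match.start() for match in pattern.finditer(transcript)]


# Hypothetical data: what was said vs. what a generic model might output.
spoken = "Picovoice released a new engine"
transcribed = "Pico voice released a new engine"  # assumed mis-transcription

print(find_keyword(spoken, "Picovoice"))       # [0]
print(find_keyword(transcribed, "Picovoice"))  # []
```

The keyword was spoken in the audio, yet the search returns nothing, because the text index only knows what the transcript says.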
Speech-to-Index is an alternative to Speech-to-Text that approaches the search problem differently. Like Speech-to-Text, it breaks the audio input into sounds (phonemes). However, instead of converting them into text, Speech-to-Index keeps them as they are and creates a phonetic index. Thus, Speech-to-Index removes the limitations of a text-based approach, such as unrecognized words, homophones, and spelling errors, making it a better choice for audio search engines. For example, an open-source benchmark shows that an audio search engine built with Picovoice’s Octopus Speech-to-Index finds keywords in audio files ten times more accurately than one built with Google Speech-to-Text.
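The idea of searching phonemes rather than text can be illustrated with a toy sketch. This is not how Octopus is implemented; the phoneme stream, the tiny pronunciation lexicon, and the `build_index` and `search` helpers are all assumptions made for illustration, and matching here is exact, whereas a real engine would likely score approximate matches.

```python
from collections import defaultdict

# Hypothetical acoustic-model output: (phoneme, start time in seconds).
PHONEME_STREAM = [
    ("DH", 0.00), ("AH", 0.05),              # "the"
    ("R", 0.20), ("AY", 0.28), ("T", 0.40),  # "right" or "write"
    ("W", 0.60), ("ER", 0.68), ("D", 0.80),  # "word"
]

# Toy pronunciation lexicon (assumed); homophones share one phoneme sequence.
LEXICON = {
    "right": ["R", "AY", "T"],
    "write": ["R", "AY", "T"],
    "word": ["W", "ER", "D"],
}


def build_index(stream):
    """Map each phoneme to the stream positions where it occurs."""
    index = defaultdict(list)
    for position, (phoneme, _start) in enumerate(stream):
        index[phoneme].append(position)
    return index


def search(stream, index, phonemes):
    """Return start times where the phoneme sequence occurs contiguously."""
    if not phonemes:
        return []
    hits = []
    # Only positions holding the first phoneme can start a match.
    for position in index.get(phonemes[0], []):
        window = [p for p, _start in stream[position:position + len(phonemes)]]
        if window == phonemes:
            hits.append(stream[position][1])
    return hits


index = build_index(PHONEME_STREAM)
print(search(PHONEME_STREAM, index, LEXICON["right"]))  # [0.2]
print(search(PHONEME_STREAM, index, LEXICON["write"]))  # [0.2]
```

Because the index stores sounds, "right" and "write" resolve to the same location, and no spelling decision is ever made that a later text query could miss.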

Figure: A Speech-to-Text alternative for creating a content-based audio search engine
Speech-to-Index is not a replacement for Speech-to-Text but a complement to it. Speech-to-Text is the only option for transcription, just as Speech-to-Index is for search. Use cases like Dialogue Search, Social Listening, Legal E-Discovery, or Media Asset Management benefit from both Speech-to-Text and Speech-to-Index.
Octopus Speech-to-Index is now available to selected Enterprise Customers only. If you have a use case that requires audio search, contact Picovoice's Consulting Services!