Search engines, especially Google, have fundamentally changed how we access information, so much so that "google" has been added to prominent dictionaries as a verb. However, Google still indexes text, not audio, for search. Thus, many developers look for workarounds to "Google search audio files."

The common workaround for the "Google search audio files" problem is to transcribe an audio file with Speech-to-Text and then search the transcript for the keyword. Google News Initiative also takes this approach to make news searchable, despite the known limitations of generic Speech-to-Text models, such as out-of-vocabulary words and competing hypotheses (homophones). Customizing a generic Speech-to-Text model to the domain is the best way to overcome these challenges. However, news is not domain-specific and includes many proper nouns, which makes training a domain-specific Speech-to-Text model impossible. Finding the best Speech-to-Text for use cases with proper nouns is even more challenging. Hence, this problem requires a different approach than an optimized, custom Speech-to-Text model.
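To make the homophone limitation concrete, here is a minimal sketch of the transcript-then-search workaround. The transcript data and the `search_transcript` helper are hypothetical and written only for illustration; they do not come from any real Speech-to-Text engine. The sketch assumes the speaker said "Senator Reid" and the generic model produced the homophone "read".

```python
from typing import List, Tuple

# Hypothetical Speech-to-Text output: (word, start time in seconds).
# Assumption: the speaker said "Senator Reid", but the generic model
# chose the homophone "read" because it did not know the proper noun.
TRANSCRIPT: List[Tuple[str, float]] = [
    ("senator", 0.0),
    ("read", 0.5),
    ("spoke", 0.9),
    ("today", 1.3),
]


def search_transcript(transcript: List[Tuple[str, float]], keyword: str) -> List[float]:
    """Return the start time of every exact text match of `keyword`."""
    needle = keyword.lower().split()
    words = [word.lower() for word, _ in transcript]
    hits = []
    for i in range(len(words) - len(needle) + 1):
        if words[i:i + len(needle)] == needle:
            hits.append(transcript[i][1])
    return hits


# Ordinary words are found, but the proper noun is missed because
# its spelling never appears in the transcript.
found_senator = search_transcript(TRANSCRIPT, "senator")
found_reid = search_transcript(TRANSCRIPT, "Reid")
```

In this toy case, the search for "senator" succeeds while the search for "Reid" returns nothing, even though the name was spoken. The miss comes from the text representation, not from the audio itself.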

Speech-to-Index is an alternative to Speech-to-Text that approaches the search problem differently. It breaks the audio input into sounds (phonemes), just like Speech-to-Text. However, Speech-to-Index keeps them as they are and creates a phonetic-based index instead of converting them into text. Thus, Speech-to-Index removes the limitations of a text-based approach, such as unrecognized words, homophones, and spelling errors, making it a better choice for audio search engines. For example, an open-source benchmark shows that an audio search engine built with Picovoice’s Octopus Speech-to-Index finds keywords in audio files ten times more accurately than the one with Google Speech-to-Text.
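The idea of indexing sounds instead of words can be illustrated with a toy example. This is only a sketch under stated assumptions, not how Octopus is implemented: the pronunciation lexicon, the phoneme stream, and the `build_index` and `search` helpers are all hypothetical, and the matching is exact, whereas a production engine would likely score uncertain phoneme hypotheses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon using ARPAbet-style symbols.
# "reid" and "read" are spelled differently but sound the same.
LEXICON: Dict[str, List[str]] = {
    "reid": ["R", "IY", "D"],
    "read": ["R", "IY", "D"],
    "senator": ["S", "EH", "N", "AH", "T", "ER"],
}

# Hypothetical acoustic-model output: (phoneme, start time in seconds).
PHONEMES: List[Tuple[str, float]] = [
    ("S", 0.00), ("EH", 0.08), ("N", 0.15), ("AH", 0.21), ("T", 0.27),
    ("ER", 0.33), ("R", 0.50), ("IY", 0.58), ("D", 0.70),
]

N = 2  # length of the phoneme n-grams used as index keys


def build_index(phonemes: List[Tuple[str, float]], n: int = N) -> Dict[tuple, List[int]]:
    """Map every phoneme n-gram to the positions where it starts."""
    index: Dict[tuple, List[int]] = defaultdict(list)
    symbols = [p for p, _ in phonemes]
    for i in range(len(symbols) - n + 1):
        index[tuple(symbols[i:i + n])].append(i)
    return index


def search(index: Dict[tuple, List[int]], phonemes: List[Tuple[str, float]],
           query: str, n: int = N) -> List[float]:
    """Return start times where the query's phoneme sequence occurs.

    Assumes every query word is in LEXICON and has at least n phonemes.
    """
    target = [p for word in query.lower().split() for p in LEXICON[word]]
    symbols = [p for p, _ in phonemes]
    hits = []
    # Use the first n-gram to find candidates, then verify the full sequence.
    for i in index.get(tuple(target[:n]), []):
        if symbols[i:i + len(target)] == target:
            hits.append(phonemes[i][1])
    return hits


index = build_index(PHONEMES)
reid_hits = search(index, PHONEMES, "Reid")
read_hits = search(index, PHONEMES, "read")
```

Because the query is converted to sounds before the lookup, "Reid" and "read" land on the same position in the audio, so the spelling a transcriber would have chosen no longer matters.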

Figure: Speech-to-Index as a Speech-to-Text alternative for building a content-based audio search engine. Speech recognition for audio search breaks the audio into sounds (phonemes) and creates a phonetic-based index.

Speech-to-Index is not a substitute for Speech-to-Text but a complement to it. Speech-to-Text is the right tool for transcription, just as Speech-to-Index is for search. Use cases like Dialogue Search, Social Listening, Legal E-Discovery, or Media Asset Management benefit from both Speech-to-Text and Speech-to-Index.

Octopus Speech-to-Index is currently available to selected Enterprise Customers only. If your use case requires audio search, contact Picovoice's Consulting Services!
