Voice Search: How to Find Spoken Keywords and Phrases in Audio Files

June 15, 2022

If you’re looking for a voice-enabled search to retrieve information, in other words, search by voice, check out this article.

“I have hundreds, even thousands, of audio files from meetings (lectures, news, call centre recordings, podcasts). Is there software to search for particular words in these audio files?” The common answer to this question is to transcribe the audio files with a Speech-to-Text engine and then search for words or phrases in the text output. However, readers who have tried this approach may have run into its drawbacks. Speech-to-Text engines struggle with out-of-vocabulary queries, such as technical jargon and proper nouns like brand, product, or individual names. For example, a Speech-to-Text engine may transcribe Toronto’s famous Yonge Street as Young Street.
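To make the drawback concrete, here is a minimal Python sketch of the transcribe-then-search approach. It assumes the transcripts already exist as plain text. The file names, the transcript sentences, and the `find_phrase` helper are all hypothetical, made up for illustration, and do not come from any particular Speech-to-Text engine.

```python
import re


def find_phrase(transcripts, phrase):
    """Return (file name, character offset) for each case-insensitive hit."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in transcripts.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start()))
    return hits


# Hypothetical transcripts. We assume the engine wrote "Young Street"
# where the speaker actually said "Yonge Street".
transcripts = {
    "meeting_01.wav": "The new office is on Young Street near the subway.",
    "meeting_02.wav": "We will review the budget on Friday.",
}

# Searching for the correct spelling misses the mention entirely.
yonge_hits = find_phrase(transcripts, "Yonge Street")

# The mention only surfaces if you guess the engine's misspelling.
young_hits = find_phrase(transcripts, "Young Street")

print(yonge_hits)
print(young_hits)
```

The search itself is correct. The miss happens because the proper noun was lost at transcription time, so no amount of tuning on the text side can bring it back.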

If you’re looking for technology similar to the Google search engine that enables keyword search by crawling audio files instead of websites, then meet Octopus, Picovoice’s Speech-to-Index Engine that enables voice search.


[Infographic: new data created globally in 2020 (ZB); new data to be created in 2025 (ZB); percentage of unstructured data (%). Source: IDC]

Structuring unstructured audio and video data for monitoring, compliance and analysis will help enterprises minimize their risks and monetize this large volume of data. Below are three use cases where acoustic-only voice search outperforms text-based search for audio files.

1. Social Media Listening: Today, people talk about brands in TikTok, Instagram, and YouTube videos more than they write about them. However, enterprises still focus mainly on tracking written posts to protect their reputation and keep their competitive edge.

Check out OctoTube, Voice Search for YouTube Demo

Speech-to-Text engines that struggle with proper nouns (e.g. brand mentions) cannot serve this use case adequately. Social media management platforms or brands can try to customize Speech-to-Text models to capture missing names and improve accuracy. However, after every customization, they may need to re-transcribe everything in the archive, which makes this approach expensive and unsustainable.


2. Media and Entertainment: Audio and video content platforms, such as streaming services and podcast or audiobook publishers, rely on limited text-based search over descriptions instead of the rich audio and voice content they offer. For example, a user may want to find the show that has the quote “may the force be with you” or go to the moment Judy Garland says “there’s no place like home” in The Wizard of Oz. These are only achievable with audio-based voice search.

Build your voice search engine with Octopus, for free

3. Archiving: Some things are meant to be preserved in audio and video formats, such as memories from company events, family dinners or birthday celebrations. There is no need to convert these files into text, but indexing them to make the audio searchable saves time and even memories.
