Can you imagine modern life without Google search? In the mid-1990s, the World Wide Web's popularity created a deluge of novel text data. Google unlocked the power of this data with the first truly effective web search engine. In the 2010s, multimedia content saw a similar growth pattern, driven by services like YouTube, SoundCloud, and Zoom. The ability to effectively index this new wave of Internet data unlocks opportunities in search, media creation, compliance, and real-time sentiment analysis.
The naive solution to the voice search problem is to combine a speech-to-text (STT) engine with classic text indexing techniques. This approach has subtle but significant drawbacks.
Speech-to-Text Approach
This approach suffers from the STT engine's reliance on a language model. The language model defines the set of valid words and how they combine to form sentences. This limits the usability of STT engines for voice search, because they struggle with queries that contain out-of-vocabulary terms such as technical jargon and proper nouns. Furthermore, transcription mistakes caused by competing hypotheses result in search misses. Homophones like "wear" and "where" (or "two", "to", and "too") are classic examples.
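The failure mode described above can be illustrated with a toy sketch. It assumes a hypothetical STT transcript (the misrecognized text is made up for illustration, not the output of any real engine) and a minimal word-level inverted index. Once a homophone or an out-of-vocabulary proper noun is transcribed incorrectly, a text search for the spoken word finds nothing.

```python
from collections import defaultdict


def build_text_index(transcript):
    """Map each lowercase word to the positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(transcript.lower().split()):
        index[word.strip(".,?!")].append(position)
    return index


def search_text_index(index, query):
    """Return the positions of an exact word match, or an empty list."""
    return index.get(query.lower(), [])


# Spoken audio (ground truth): "where can I buy Picovoice hardware"
# Hypothetical STT output: the homophone is confused, and the
# out-of-vocabulary proper noun is split into in-vocabulary words.
stt_transcript = "wear can I buy pick a voice hardware"

index = build_text_index(stt_transcript)

hits_where = search_text_index(index, "where")          # missed: homophone
hits_picovoice = search_text_index(index, "Picovoice")  # missed: out of vocabulary
hits_hardware = search_text_index(index, "hardware")    # found

print(hits_where, hits_picovoice, hits_hardware)
```

Only the word that survived transcription unchanged is searchable; the other two queries miss even though both words were spoken.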
Picovoice Approach
Picovoice Octopus Speech-to-Index takes a unique approach: indexing speech directly without relying on a text representation. Octopus's acoustic-only indexing boosts accuracy by removing the out-of-vocabulary limitation and eliminating the problem of competing hypotheses (e.g. homophones).
Octopus can index massive datasets multiple orders of magnitude faster than alternative solutions. Once the voice data is indexed, the search is lightning fast.
Benchmark
Picovoice provides an open-source framework for benchmarking Octopus against Google Speech-to-Text and Mozilla DeepSpeech. The Octopus Speech-to-Index benchmark page contains the details of the benchmark and a link to the open-source repository.
Start Building
Start building with Octopus Speech-to-Index for free.
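import pvoctopus

o = pvoctopus.create(access_key)  # access_key: your Picovoice AccessKey
metadata = o.index_audio_file(path)  # index the audio file at `path`
matches = o.search(metadata, phrases)  # look up each phrase in the index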
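As a rough sketch of what could come next, the helper below turns search results into readable lines. It assumes that `matches` maps each phrase to a list of match objects with `start_sec`, `end_sec`, and `probability` fields; check the Octopus Python documentation for the exact return type. The `Match` stand-in, the `summarize_matches` helper, and the 0.5 threshold are hypothetical and exist only so the example runs without the SDK.

```python
from collections import namedtuple

# Stand-in for the SDK's match objects (assumed field names).
Match = namedtuple("Match", ["start_sec", "end_sec", "probability"])


def summarize_matches(matches, min_probability=0.5):
    """Format each sufficiently confident match as 'phrase: start-end (p=...)'."""
    lines = []
    for phrase, hits in matches.items():
        for hit in hits:
            if hit.probability >= min_probability:
                lines.append(
                    f"{phrase}: {hit.start_sec:.2f}s-{hit.end_sec:.2f}s "
                    f"(p={hit.probability:.2f})"
                )
    return lines


# Made-up results, shaped like the assumed return value of o.search(...).
example_matches = {
    "picovoice": [Match(1.2, 1.9, 0.93), Match(7.0, 7.6, 0.31)],
    "octopus": [],
}

summary = summarize_matches(example_matches)
for line in summary:
    print(line)
```

Filtering on a probability threshold is a design choice for this sketch: a lower threshold surfaces more candidate hits at the cost of more false positives.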