Can you imagine modern life without Google search? In the mid-1990s, the World Wide Web's popularity created a deluge of novel text data. Google unlocked the power of this data with the first truly effective web search engine. In the 2010s, multimedia content saw a similar growth pattern, driven by services like YouTube, SoundCloud, and Zoom. The ability to effectively index this new wave of Internet data unlocks opportunities in search, media creation, compliance, and real-time sentiment analysis.

The naive solution to the voice search problem is to combine a speech-to-text (STT) engine with classic text indexing techniques. This approach has subtle but significant drawbacks.
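A minimal sketch of that naive pipeline, assuming an STT engine has already turned each recording into a transcript. The helper names `build_index` and `search` and the sample transcripts are hypothetical: the index maps each transcript word to the recordings containing it, and a query returns the recordings that contain every query word. Production text search adds ranking, stemming, and timestamps; this only shows the shape of the approach.

```python
from collections import defaultdict


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> set of recording IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for recording_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(recording_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the recordings whose transcript contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        # Intersect posting sets so all query words must be present.
        result &= index.get(word, set())
    return result


# Hypothetical STT output for two recordings.
transcripts = {
    "call-01": "please review the quarterly report",
    "call-02": "the report is due on friday",
}
index = build_index(transcripts)
print(sorted(search(index, "report")))   # ['call-01', 'call-02']
print(search(index, "quarterly report"))  # {'call-01'}
```

Everything downstream of the STT engine only ever sees the transcript text, so any word the engine gets wrong is simply absent from the index.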

Speech-to-Text Approach

This approach suffers from its reliance on the STT engine's language model. The language model defines the set of valid words and how they combine into sentences. This limits the usefulness of STT engines for voice search: they struggle to find out-of-vocabulary queries such as technical jargon and proper nouns. Furthermore, transcription mistakes caused by competing hypotheses result in search misses. Homophones like "wear" and "where" (or "two," "to," and "too") are classic examples.
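A toy illustration of both failure modes, assuming a plain word-level lookup over STT output. The sentences are hypothetical: the speaker said "wear" but the engine chose the homophone "where", and the proper noun "picovoice" is out of vocabulary, so the engine substituted in-vocabulary words. The lookup finds both queries in what was spoken but misses them in the transcript that actually gets indexed.

```python
def transcript_contains(transcript: str, query: str) -> bool:
    """Exact word-sequence lookup, as a text index over STT output would do."""
    words = transcript.lower().split()
    query_words = query.lower().split()
    n = len(query_words)
    # Slide a window over the transcript and compare word sequences.
    return any(words[i:i + n] == query_words for i in range(len(words) - n + 1))


# What was actually spoken (hypothetical).
spoken = "i will wear the picovoice shirt"
# What a hypothetical STT engine returned: a homophone error ("where") and an
# out-of-vocabulary proper noun split into in-vocabulary words ("pico voice").
transcript = "i will where the pico voice shirt"

for query in ["wear", "picovoice", "shirt"]:
    print(query, transcript_contains(spoken, query), transcript_contains(transcript, query))
```

Only "shirt" survives transcription; the other two queries return search misses even though the audio contains them.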

Picovoice Approach

Picovoice Octopus Speech-to-Index takes a unique approach: it indexes speech directly, without relying on a text representation. Octopus's acoustic-only indexing boosts accuracy by removing the out-of-vocabulary limitation and eliminating the problem of competing hypotheses (e.g., homophones).

Octopus can index massive datasets multiple orders of magnitude faster than alternative solutions. Once the voice data is indexed, the search is lightning fast.

Live Demo

The demo below lets you index and search your voice or prerecorded audio files with Octopus.



Picovoice provides an open-source framework for benchmarking Octopus against Google Speech-to-Text and Mozilla DeepSpeech. The Octopus Speech-to-Index benchmark page contains the details of the benchmark and a link to the open-source repository.

Figure: Comparison with Google Speech-to-Text and Mozilla DeepSpeech

Start Building

Start building with Octopus Speech-to-Index for free.

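import pvoctopus

access_key = "${ACCESS_KEY}"  # placeholder: AccessKey from Picovoice Console
o = pvoctopus.create(access_key)
metadata = o.index_audio_file("${AUDIO_FILE_PATH}")  # placeholder path; index once
matches = o.search(metadata, ["${PHRASE}"])  # search the indexed audio for phrases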