Octopus: Picovoice's Voice Search Engine

  • Speech-to-Index
  • Voice Search
  • Speech-to-Text
  • Speech Recognition
November 03, 2021

Can you imagine modern life without Google search? In the mid-1990s, the World Wide Web’s popularity created a deluge of novel text data. Google unlocked the power of this data with the first truly effective web search engine. In the 2010s, multimedia content saw a similar growth pattern, driven by services like YouTube, SoundCloud, and Zoom. The ability to index this new wave of Internet data effectively unlocks opportunities in search, media creation, compliance, and real-time sentiment analysis.

The naive solution to the voice search problem is to combine a speech-to-text (STT) engine with classic text indexing techniques. This approach has subtle but significant drawbacks.

Speech-to-Text Approach

This approach suffers from the STT engine's reliance on a language model. The language model defines the set of valid words and how they combine into sentences. This limits the usability of STT engines for voice search, as they struggle with out-of-vocabulary queries such as technical jargon and proper nouns. Furthermore, transcription mistakes caused by competing hypotheses result in search misses. Homophones like “wear” and “where” are classic examples.
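Both failure modes can be reproduced with a toy text index. The sketch below is only an illustration: the transcript is hard-coded and hypothetical (no real STT engine is called), and `build_inverted_index` and `search` are names made up for this example.

```python
from collections import defaultdict


def build_inverted_index(transcript_words):
    """Map each transcribed word to the positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(transcript_words):
        index[word.lower()].append(position)
    return index


def search(index, query):
    """Return the positions of exact, case-insensitive matches for one word."""
    return index.get(query.lower(), [])


# What the speaker actually said.
spoken = "where is the picovoice demo".split()

# Hypothetical STT output for that audio: the engine picked the homophone
# "wear", and split the out-of-vocabulary name into two in-vocabulary words.
transcribed = "wear is the pico voice demo".split()

index = build_inverted_index(transcribed)

homophone_hits = search(index, "where")      # miss: the index only holds "wear"
proper_noun_hits = search(index, "picovoice")  # miss: the index holds "pico", "voice"
ordinary_hits = search(index, "demo")        # hit: transcribed correctly
```

Words the engine transcribes correctly are found, but the two queries a user is most likely to care about return nothing, even though both were spoken in the audio.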

Picovoice Approach

Octopus, Picovoice's Speech-to-Index engine, takes a unique approach: it indexes speech directly, without relying on a text representation. Octopus's acoustic-only approach boosts accuracy by removing the out-of-vocabulary limitation and eliminating the problem of competing hypotheses (e.g., homophones).
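Why matching on sounds rather than words sidesteps both problems can be shown with a toy example. This is not Octopus's algorithm, whose internals are not described here. It assumes the audio has already been reduced to a phoneme sequence, and it uses a small hand-written pronunciation table (ARPAbet-like symbols, with a guessed pronunciation for "picovoice") as a stand-in for converting a query into sounds.

```python
# Hypothetical, hand-written pronunciations for illustration only.
PRONUNCIATIONS = {
    "wear": ["W", "EH", "R"],
    "where": ["W", "EH", "R"],  # same sounds as "wear"
    "is": ["IH", "Z"],
    "the": ["DH", "AH"],
    "demo": ["D", "EH", "M", "OW"],
    "picovoice": ["P", "IY", "K", "OW", "V", "OY", "S"],
}


def to_phonemes(words):
    """Flatten a list of words into a single phoneme sequence."""
    phonemes = []
    for word in words:
        phonemes.extend(PRONUNCIATIONS[word.lower()])
    return phonemes


def find_phrase(audio_phonemes, query_phonemes):
    """Return every start offset where the query's sounds occur in the audio."""
    n = len(query_phonemes)
    return [
        start
        for start in range(len(audio_phonemes) - n + 1)
        if audio_phonemes[start:start + n] == query_phonemes
    ]


# Stand-in for the acoustic content of the recording: no words, only sounds.
audio = to_phonemes("where is the picovoice demo".split())

where_hits = find_phrase(audio, to_phonemes(["where"]))
wear_hits = find_phrase(audio, to_phonemes(["wear"]))
name_hits = find_phrase(audio, to_phonemes(["picovoice"]))
```

Because no word has to be chosen at indexing time, "wear" and "where" match the same stretch of audio, and a proper noun is found as long as its sounds can be produced for the query. A real engine would work with uncertain acoustic evidence rather than exact symbol equality.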

Because Picovoice implements acoustic indexing efficiently and needs no language model, Octopus can index massive audio sets multiple orders of magnitude faster than alternative solutions. Once the voice data is indexed, search is lightning fast.

Benchmark

An open-source framework for benchmarking different engines is available on GitHub. The benchmark covers three technologies: Google Speech-to-Text, Mozilla DeepSpeech, and Octopus. The figure below summarizes the comparison.

[Figure: Comparison with Google Speech-to-Text and Mozilla DeepSpeech]

Start Building

Go to GitHub and start building with Octopus Speech-to-Index for free.