Voice recognition has flourished with the growth of cloud-based speech services. Despite the ubiquity of voice-enabled products, processing speech in the cloud raises privacy concerns around the uploading and handling of personal voice data. Cloud speech recognition also has fundamental limitations in cost-effectiveness, latency, and reliability.

Offline speech recognition has the potential to address these drawbacks by eliminating the need for connectivity and tapping into the readily available compute resources on billions of devices. Historically, however, the computational cost of speech recognition algorithms made it impossible to achieve comparable accuracy on commodity edge devices.

Picovoice has developed deep learning technology specifically designed to perform large-vocabulary speech-to-text efficiently on edge devices. Picovoice software runs on commodity hardware with constrained compute resources. This bespoke voice AI technology enables speech-to-text even on a Raspberry Pi, recognizing more than 300,000 words in real time. The Picovoice offline option lowers cost and latency while matching the accuracy of cloud voice services.

Accuracy

Picovoice has benchmarked the accuracy of its speech-to-text engine against widely used speech-to-text APIs: Google Speech-to-Text, Amazon Transcribe, Azure Speech-to-Text, and IBM Watson Speech-to-Text. The details of the benchmark, with a link to the open-source repository, are available on the Leopard Speech-to-Text benchmark page. The figure below shows that Picovoice achieves accuracy comparable to cloud-based services.

Comparison of Word Error Rate of Speech-to-Text Engines
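Word error rate (WER), the metric compared above, is the number of word-level substitutions, insertions, and deletions needed to turn a transcript into the reference, divided by the number of reference words. A minimal illustrative Python sketch (not part of the benchmark code, which is linked from the benchmark page):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER via word-level Levenshtein distance.

    Assumes a non-empty reference; real benchmarks also apply
    text normalization before scoring.
    """
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```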

Cost-Effectiveness

The figure below compares the operational cost of voice recognition engines. Picovoice's offering is not a fractional cost saving: it is an order of magnitude more cost-efficient than API-based offerings.

Comparison of Offline Speech-to-Text Engines

Start Building

Start building with Leopard speech-to-text for free.

import pvleopard

leopard = pvleopard.create(access_key)
transcript, words = leopard.process_file(path)
Build with Python
const { Leopard } = require("@picovoice/leopard-node")

const leopard = new Leopard(accessKey)
const { transcript, words } = leopard.processFile(path)
Build with NodeJS
import ai.picovoice.leopard.*;

Leopard leopard = new Leopard.Builder()
    .setAccessKey(accessKey)
    .setModelPath(modelPath)
    .build(appContext);
LeopardTranscript result = leopard.processFile(path);
Build with Android
import Leopard

let leopard = try Leopard(
    accessKey: accessKey,
    modelPath: modelPath)
let result = try leopard.processFile(path)
Build with iOS
leopard := NewLeopard(accessKey)
err := leopard.Init()
transcript, words, err := leopard.ProcessFile(path)
Build with Go
import ai.picovoice.leopard.*;

Leopard leopard = new Leopard.Builder()
    .setAccessKey(accessKey)
    .build();
LeopardTranscript result = leopard.processFile(path);
Build with Java
using Pv;

Leopard leopard = Leopard.Create(accessKey);
LeopardTranscript result = leopard.ProcessFile(path);
Build with .NET
let leopard: Leopard = LeopardBuilder::new()
    .access_key(access_key)
    .init()
    .expect("failed to initialize Leopard");
if let Ok(result) = leopard.process_file(path) {
    // handle result
}
Build with Rust
Leopard leopard = await Leopard.create(
    accessKey,
    modelPath);
LeopardTranscript result = await leopard.processFile(path);
Build with Flutter
const leopard = await Leopard.create(
    accessKey,
    modelPath)
const { transcript, words } = await leopard.processFile(path)
Build with React Native
pv_leopard_t *leopard = NULL;
pv_leopard_init(
    access_key,
    model_path,
    enable_automatic_punctuation,
    &leopard);

char *transcript = NULL;
int32_t num_words = 0;
pv_word_t *words = NULL;
pv_leopard_process_file(
    leopard,
    path,
    &transcript,
    &num_words,
    &words);
Build with C
const leopard = await LeopardWorker.fromPublicDirectory(
    accessKey,
    modelPath
);
const { transcript, words } = await leopard.process(pcm);
Build with Web