
Cheetah Speech-to-Text: Real-time Transcription FAQ


How do I convert audio to text in real-time?

The Cheetah Speech-to-Text engine converts audio to text in real time with high accuracy. It takes only a few lines of code, and you can start for free. Check out the Picovoice Cheetah Speech-to-Text SDKs to get started.

What does WER stand for in automatic speech recognition?

WER stands for Word Error Rate. It’s a common metric for measuring the accuracy of automatic speech recognition engines.

How do I measure the accuracy of automatic speech recognition engines?

WER is a common metric for measuring the accuracy of automatic speech recognition engines. To compare engines fairly, evaluate them on the same dataset. The methodology for WER is explained in the Picovoice docs glossary. If you do not have a dataset yet, you can use open datasets such as LibriSpeech test-clean, LibriSpeech test-other, Common Voice test, and TED-LIUM test, as Picovoice does for its open-source benchmarks.
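As a rough illustration of the metric, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. This is the standard textbook definition, not code taken from the Picovoice benchmark; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example, and a real benchmark may normalize punctuation and numbers differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration only. Normalization here is
    just lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

To compare two engines, run both on the same recordings, compute the WER of each engine's output against the same reference transcripts, and compare the results. Lower is better.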

What’s the accuracy of Cheetah Speech-to-Text?

Check out the open-source speech-to-text benchmark to compare Cheetah against major cloud providers’ automatic speech recognition APIs. In that benchmark, Cheetah is more accurate than the Google and IBM Watson ASRs.

How do I improve automatic speech recognition (ASR) accuracy?

No automatic speech recognition (ASR) solution on the market is 100% accurate yet; even human transcribers make mistakes. Although every engine has to be evaluated individually, ASR engines mostly struggle with proper names and homophones. The most common and easiest way to improve accuracy is to add custom words or boost existing ones. If the lexicon of an ASR solution doesn't include a specific word, such as a brand name, add it as a custom word. If the word is in the lexicon but is not always returned because of competing hypotheses, such as "calluses" and "calculus", boost one over the other depending on the use case.
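The idea of boosting can be pictured with a toy example: an engine scores competing hypotheses, and a boost adds a bonus to the word you prefer before the best one is picked. The sketch below is purely conceptual. The function `pick_hypothesis`, the score values, and the additive boost are all invented for illustration and do not describe how Cheetah or any other engine implements boosting internally.

```python
from typing import Dict, Optional


def pick_hypothesis(
    scores: Dict[str, float],
    boosts: Optional[Dict[str, float]] = None,
) -> str:
    """Return the hypothesis with the highest score after applying boosts.

    `scores` maps candidate words to made-up log-likelihood-style scores.
    `boosts` maps words to a hypothetical additive bonus.
    """
    boosts = boosts or {}
    return max(scores, key=lambda word: scores[word] + boosts.get(word, 0.0))


# Made-up scores: without a boost, "calluses" narrowly wins.
candidates = {"calluses": -1.2, "calculus": -1.5}

print(pick_hypothesis(candidates))
print(pick_hypothesis(candidates, boosts={"calculus": 0.5}))
```

In a math tutoring application you would boost "calculus"; in a medical application you might boost "calluses" instead.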

How fast does Cheetah convert audio to text?

Cheetah Speech-to-Text processes voice data locally on-device, unlike cloud automatic speech recognition APIs. Hence, it offers a true real-time experience and converts audio to text without network latency.

Does Cheetah Speech-to-Text perform end-pointing?

Yes, it performs end-pointing automatically. You can also set the endpoint duration manually. Check out the API of your choice to learn how to do it.
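A minimal sketch of how an application might consume end-pointing is shown below. It assumes a streaming engine object whose `process(frame)` call returns a partial transcript plus an endpoint flag, and whose `flush()` call returns any remaining text. The Cheetah Python SDK is believed to follow this shape, with the endpoint duration assumed to be set when the engine is created, but check the API reference for your SDK before relying on it. The `StreamingEngine` protocol and the `transcribe_utterances` helper are hypothetical names introduced for this example.

```python
from typing import Iterable, Iterator, List, Protocol, Sequence, Tuple


class StreamingEngine(Protocol):
    """Assumed interface of a streaming speech-to-text engine."""

    def process(self, pcm: Sequence[int]) -> Tuple[str, bool]:
        """Return (partial_transcript, is_endpoint) for one audio frame."""
        ...

    def flush(self) -> str:
        """Return any text still buffered inside the engine."""
        ...


def transcribe_utterances(
    engine: StreamingEngine,
    frames: Iterable[Sequence[int]],
) -> Iterator[str]:
    """Yield one transcript per utterance, split at detected endpoints."""
    parts: List[str] = []
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        parts.append(partial)
        if is_endpoint:
            # The speaker paused long enough: finalize this utterance.
            parts.append(engine.flush())
            text = "".join(parts).strip()
            parts = []
            if text:
                yield text

    # End of audio: flush whatever is left as a final utterance.
    parts.append(engine.flush())
    text = "".join(parts).strip()
    if text:
        yield text
```

In a real application, `frames` would come from a microphone reader that delivers fixed-size frames of 16-bit PCM audio at the sample rate the engine expects, and each yielded string could be typed into the active window or sent to downstream logic.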

Can I voice type in Ubuntu with Cheetah?

Yes, Cheetah supports Linux, including distributions such as Ubuntu, and can transcribe voice in real time. Check out the Cheetah docs to get started.

How do I run dictation on macOS?

Select your favourite Cheetah SDK and start with the Free Tier immediately.

How can I use Cheetah Speech-to-Text for hands-free typing on Windows?

Check out Cheetah SDKs to build a hands-free typing application for Windows with continuous speech recognition.

Can I use Cheetah instead of Web Speech API?

Yes! Check out Cheetah SDKs for real-time transcription that runs within modern web browsers including Chrome, Safari and Firefox.

Do you have a continuous speech recognition example for Android?

Check out Cheetah Android SDK for more information.

Can I use Cheetah for on-device speech recognition on iOS?

Yes, Cheetah enables on-device automatic speech recognition. Leopard can also be used for on-device speech recognition on iOS, depending on the use case.

Can I build a continuous speech recognition system on a Raspberry Pi with Cheetah?

Yes, Cheetah runs on Raspberry Pi 3 and 4 and converts voice data to text in real time.

Can I use Cheetah to implement speech recognition on NVIDIA Jetson Nano?

Yes, Cheetah can be used for real-time transcription on NVIDIA Jetson Nano. If you’re looking for other applications, check out our strategy guide to learn more.

Can I use Cheetah to convert speech to text for free?

Yes. Under the Free Tier, Cheetah can be used to convert speech to text in real time, and Leopard to transcribe audio files, for both commercial and non-commercial projects.

How do I evaluate streaming automatic speech recognition models?

Every use case has different requirements and levels of support. Check out our blog post on how to evaluate audio transcription engines.

Which languages does Cheetah support?

Cheetah Speech-to-Text supports only English for now. If you require support for additional languages, reach out to Picovoice Sales to tell us about your commercial endeavour. Include the use case, business requirements, and project details, and the Picovoice team will respond to you.

Can I use Cheetah for telephone applications (in telephony)?

Yes. Cheetah can be used for telephone applications just like any other automatic speech recognition engine. Please note that Picovoice software only supports 16 kHz audio. If your application requires 8 kHz audio, contact Picovoice Sales.

Does Cheetah convert audio recordings to text?

Cheetah doesn’t, but Leopard does.

© 2019-2022 Picovoice Inc.