Cheetah Streaming Speech-to-Text: Real-time Transcription FAQ

How do I convert audio to text in real time?

The Cheetah Streaming Speech-to-Text engine converts audio to text in real time with high accuracy. It takes only a few lines of code to start for free. Check out the Picovoice Cheetah Streaming Speech-to-Text SDKs to get started.

What does WER stand for in terms of automatic speech recognition engines?

Word Error Rate (WER) is the ratio of errors in a transcript (substituted, deleted, and inserted words) to the total number of words actually spoken. Despite its limitations, WER is the most commonly used metric for speech-to-text accuracy. A lower WER means fewer errors and therefore better recognition accuracy. Picovoice’s open-source speech-to-text benchmark shows how WER is applied to compare the accuracy of speech-to-text engines.
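As a rough illustration of the ratio described above, the sketch below computes WER as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the text normalization (lowercasing and splitting on whitespace) is a simplifying assumption; a real benchmark may also normalize punctuation, numbers, and spelling variants.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, comparing the reference "turn on the lights" with the transcript "turn on the light" gives one substitution over four reference words. Because insertions count as errors, WER can exceed 1.0 when the transcript is much longer than the reference.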

How do I measure the accuracy of automatic speech recognition engines?

WER is the most common method for measuring the accuracy of automatic speech recognition engines. To compare engines fairly, evaluate all of them on the same data set. The methodology for WER is explained in the Picovoice docs glossary. If you do not have a data set yet, you can use open data sets such as LibriSpeech test-clean, LibriSpeech test-other, Common Voice test, and TED-LIUM test, as Picovoice does for its open-source benchmarks.

What’s the accuracy of Cheetah Streaming Speech-to-Text?

Picovoice built and open-sourced a speech-to-text benchmark that measures the accuracy of Cheetah Streaming Speech-to-Text and compares it with the major non-streaming automatic speech recognition engines on the market. The comparison uses non-streaming engines because there is no competitive on-device streaming automatic speech recognition engine to compare against.

How do I improve automatic speech recognition (ASR) accuracy?

There are multiple ways to improve automatic speech recognition (ASR) accuracy, ranging from Adding Custom Words and Boosting Phrases to Language Model Adaptation and Acoustic Model Adaptation. The level of investment depends on the strategy and requirements. Adding Custom Words and Boosting Phrases on the self-service Picovoice Console does not require any coding experience.

Does Picovoice Cheetah Streaming Speech-to-Text perform end-pointing?

Yes, it performs end-pointing automatically, detecting when the speaker has finished an utterance. You can also set the endpoint duration manually. Check out the streaming speech-to-text API of your choice to learn how.
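To show how an application might consume end-pointing, here is a minimal sketch that groups partial transcripts into utterances. It assumes an engine object whose `process` call returns a partial transcript together with an endpoint flag, and whose `flush` call returns any text still buffered. The `StreamingEngine` protocol and the `utterances` helper are hypothetical names introduced for this example; check the API reference of your SDK for the exact calls and return types.

```python
from typing import Iterable, Iterator, List, Protocol, Sequence, Tuple


class StreamingEngine(Protocol):
    """Hypothetical interface; the real SDK's names and types may differ."""

    def process(self, pcm: Sequence[int]) -> Tuple[str, bool]:
        """Return (partial_transcript, is_endpoint) for one audio frame."""
        ...

    def flush(self) -> str:
        """Return any transcript text still held by the engine."""
        ...


def utterances(
    engine: StreamingEngine, frames: Iterable[Sequence[int]]
) -> Iterator[str]:
    """Yield one transcript per utterance, split at detected endpoints."""
    parts: List[str] = []
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        parts.append(partial)
        if is_endpoint:
            # The speaker paused long enough: collect the remaining text
            # and emit the utterance.
            parts.append(engine.flush())
            text = "".join(parts).strip()
            parts.clear()
            if text:
                yield text
    # End of stream: emit whatever is left, even without an endpoint.
    parts.append(engine.flush())
    text = "".join(parts).strip()
    if text:
        yield text
```

In a voice typing application, each yielded string could be committed to the text field as a finished sentence, while the partial transcripts are shown as a live preview.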

Can I enable voice typing on Ubuntu with Cheetah Streaming Speech-to-Text?

Yes, Cheetah Streaming Speech-to-Text enables voice typing on Linux distributions such as Ubuntu and transcribes voice in real time in a few lines of code. Check out the Linux speech-to-text tutorial for details.

Can I run dictation on macOS?

Yes, Cheetah Streaming Speech-to-Text enables dictation on macOS and transcribes voice in real time in a few lines of code.

Can I use Cheetah Streaming Speech-to-Text for hands-free typing on Windows?

Yes, Cheetah Streaming Speech-to-Text enables hands-free typing on Windows and transcribes voice in real time in a few lines of code.

Can I use Cheetah Streaming Speech-to-Text instead of Web Speech API?

Yes. Check out the Cheetah Streaming Speech-to-Text Web SDK to replace the Web Speech API in your application and run real-time transcription within modern web browsers, including Chrome, Safari, and Firefox.

Do you have a continuous speech recognition example for Android?

Yes. Check out the Cheetah Streaming Speech-to-Text Android SDK or the Cheetah Streaming Speech-to-Text Flutter SDK to start running continuous speech recognition on Android.

Can I use Cheetah for on-device speech recognition on iOS?

Yes. Check out the Cheetah Streaming Speech-to-Text iOS SDK or the Cheetah Streaming Speech-to-Text Flutter SDK to start running continuous, on-device speech recognition on iOS.

Can I build a continuous speech recognition system on a Raspberry Pi with Cheetah?

Yes. Cheetah Streaming Speech-to-Text can be used to build a continuous speech recognition system that runs on a Raspberry Pi. Developers can use multiple SDKs, including .NET, C, and Python, to add on-device speech recognition.

How do I evaluate streaming automatic speech recognition models?

There are several things to consider when selecting a transcription engine, and their importance varies by use case. Although all audio transcription software fundamentally converts speech to text, each transcription engine has certain competitive advantages over others. For example, on-device transcription is a better fit than cloud-dependent APIs for applications that are concerned about privacy, security, and compliance.

Which languages does Cheetah support?

Cheetah Streaming Speech-to-Text currently supports English, French, German, Italian, Portuguese, and Spanish. If you require support for additional languages, reach out to Picovoice Consulting and tell us about your commercial endeavour. Include the use case, business requirements, and project details, and the Picovoice team will respond to you.

Can I use Cheetah for telephone applications (in telephony)?

Yes. Cheetah Streaming Speech-to-Text can be used for telephone applications just like any other automatic speech recognition engine. Please note that publicly available Picovoice SDKs only support 16 kHz audio. You can get access to a model that processes 8 kHz audio when you become an Enterprise Plan customer.

Does Cheetah Streaming Speech-to-Text convert audio files to text?

Cheetah Streaming Speech-to-Text is designed for real-time transcription, while Leopard Speech-to-Text is a better fit for asynchronous transcription, i.e., converting audio files to text.
