Cheetah Speech-to-Text: Real-time Transcription FAQ
How do I convert audio to text in real-time?
The Cheetah Speech-to-Text engine converts audio to text in real time with high accuracy. It takes only a few lines of code, and you can start for free. Check out the Picovoice Cheetah Speech-to-Text SDKs to get started.
What does WER stand for in automatic speech recognition?
WER stands for Word Error Rate. It’s a common metric for measuring the accuracy of automatic speech recognition engines: the number of word substitutions, deletions, and insertions needed to turn the engine’s output into the reference transcript, divided by the number of words in the reference.
How do I measure the accuracy of automatic speech recognition engines?
WER is the most common way to measure the accuracy of automatic speech recognition engines. To compare engines fairly, evaluate them on the same data set. The methodology for WER is explained in the Picovoice docs glossary. If you do not have a data set yet, you can use open data sets such as LibriSpeech test-clean, LibriSpeech test-other, Common Voice test, and TED-LIUM test, as Picovoice does for its open-source benchmarks.
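As a rough illustration of the metric, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an engine's output, divided by the number of reference words. The function name `word_error_rate` and the text normalization (lowercasing and splitting on whitespace) are assumptions made for this example. Published benchmarks, including Picovoice's, may normalize text differently, so scores from this sketch are not directly comparable with theirs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Assumption: words are compared case-insensitively after splitting on
    whitespace. Punctuation is NOT stripped in this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    return prev[-1] / len(ref)
```

For example, comparing the reference "calculus is hard" with the output "calluses is hard" gives one substitution over three reference words, a WER of about 0.33. Note that WER can exceed 1.0 when the output contains many inserted words.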
What’s the accuracy of Cheetah Speech-to-Text?
Check out the open-source speech-to-text benchmark to compare Cheetah against major cloud providers’ automatic speech recognition APIs. Cheetah is more accurate than the Google and IBM Watson ASRs.
How do I improve automatic speech recognition (ASR) accuracy?
No automatic speech recognition (ASR) solution on the market is 100% accurate yet; even human transcribers make mistakes. Although every engine has to be evaluated individually, automatic speech recognition engines mostly struggle with proper names and homophones. The most common and easiest way to improve accuracy is to add custom words or boost words. If the lexicon of an automatic speech recognition solution doesn't include a specific word, such as a brand name, add it as a custom word. If the word is in the lexicon but is not always returned because of competing hypotheses, such as "calluses" and "calculus", boost one over the other depending on the use case.
How fast does Cheetah convert audio to text?
Cheetah Speech-to-Text processes voice data locally on-device, unlike cloud automatic speech recognition APIs. Hence it offers a truly real-time experience and converts audio to text with no network latency.
Does Cheetah Speech-to-Text perform end-pointing?
Yes, it performs end-pointing automatically. You can also set the endpoint duration manually. Check out the API documentation for the SDK of your choice to learn how to do it.
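To show how an application might consume end-pointing, the sketch below splits a stream of audio frames into utterances. It assumes an engine object with a `process(frame)` method that returns a partial transcript plus an endpoint flag, and a `flush()` method that returns any remaining buffered text. This mirrors the general shape of streaming speech-to-text SDKs, but the exact Cheetah names and signatures should be confirmed in the API documentation of your SDK. `StreamingEngine`, `ScriptedEngine`, and `transcribe_utterances` are hypothetical names, and `ScriptedEngine` is only a stand-in that replays canned results so the example runs without a microphone or an SDK install.

```python
from typing import Iterable, List, Protocol, Sequence, Tuple


class StreamingEngine(Protocol):
    """Assumed engine interface; verify against the real SDK before use."""

    def process(self, frame: Sequence[int]) -> Tuple[str, bool]: ...
    def flush(self) -> str: ...


class ScriptedEngine:
    """Hypothetical stand-in that replays (partial, buffered, is_endpoint) tuples."""

    def __init__(self, script):
        self._script = iter(script)
        self._buffered = ""

    def process(self, frame):
        # Ignore the audio and return the next scripted result.
        partial, self._buffered, is_endpoint = next(self._script)
        return partial, is_endpoint

    def flush(self):
        # Hand back whatever text is still buffered, then clear it.
        text, self._buffered = self._buffered, ""
        return text


def transcribe_utterances(
    engine: StreamingEngine, frames: Iterable[Sequence[int]]
) -> List[str]:
    """Collect one transcript per utterance, using endpoints as boundaries."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial
        if is_endpoint:
            # The speaker paused: flush buffered text and close the utterance.
            current += engine.flush()
            if current:
                utterances.append(current)
            current = ""
    # End of stream: flush whatever the engine is still holding.
    current += engine.flush()
    if current:
        utterances.append(current)
    return utterances
```

With a real engine, `frames` would come from a microphone or audio stream in the frame size and sample rate the SDK requires. Treating each endpoint as an utterance boundary is one design choice; a dictation app might instead insert a line break or trigger a command at that point.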
Can I voice type in Ubuntu with Cheetah?
Yes, Cheetah runs on Linux distributions such as Ubuntu and transcribes voice in real time. Check out the Cheetah docs to get started.
How do I run dictation on macOS?
Select your favourite Cheetah SDK and start with the Free Plan immediately.
How can I use Cheetah Speech-to-Text for hands-free typing on Windows?
Check out Cheetah SDKs to build a hands-free typing application for Windows with continuous speech recognition.
Can I use Cheetah instead of Web Speech API?
Yes! Check out Cheetah SDKs for real-time transcription that runs within modern web browsers including Chrome, Safari and Firefox.
Do you have a continuous speech recognition example for Android?
Check out the Cheetah Android SDK for more information.
Can I use Cheetah for on-device speech recognition on iOS?
Yes, Cheetah enables on-device automatic speech recognition. Leopard can also be used for on-device speech recognition on iOS, depending on the use case.
Can I build a continuous speech recognition system on a Raspberry Pi with Cheetah?
Yes, Cheetah runs on Raspberry Pi 3 and 4 and converts voice data to text in real time.
Can I use Cheetah to implement speech recognition on NVIDIA Jetson Nano?
Yes, Cheetah can be used for real-time transcription on NVIDIA Jetson Nano. If you’re looking for other applications, check out our strategy guide to learn more.
Can I use Cheetah to convert speech to text for free?
Yes. Under the Free Plan, you can use Cheetah to convert speech to text in real time, and Leopard to transcribe audio files, for both commercial and non-commercial projects.
How do I evaluate streaming automatic speech recognition models?
Every use case has different requirements and levels of support. Check out our blog post on how to evaluate audio transcription engines.
Which languages does Cheetah support?
Cheetah Speech-to-Text supports only English for now. If you require support for additional languages, reach out to Picovoice Sales to tell us about your commercial endeavour. Include the use case, business requirements, and project details, and the Picovoice team will respond to you.
Can I use Cheetah for telephone applications (in telephony)?
Yes. Cheetah can be used for telephone applications just like any other automatic speech recognition engine. Please note that Picovoice software supports only 16 kHz audio. If your application requires 8 kHz audio, contact Picovoice Sales.
Does Cheetah convert audio recordings to text?
Cheetah doesn’t, but Leopard does.