On-device streaming speech-to-text with cloud-level accuracy without cloud latency
It felt like we tried every available solution on the market, and only Picovoice provided the stability, processing speed, excellent accuracy out of the box, and flexible training capabilities that we required. They are truly on the cutting edge of voice technology.
Cheetah Streaming Speech-to-Text is software that automatically transcribes voice data in real time without network delay or accuracy compromises.
Cheetah Streaming Speech-to-Text processes voice data locally, enabling live transcription on devices, in mobile apps and web browsers, on-premises, or in the cloud.
Build with Python

    o = pvcheetah.create(access_key)
    partial_transcript, is_endpoint = o.process(get_next_audio_frame())

Build with NodeJS

    const o = new Cheetah(accessKey);
    const [partialTranscript, isEndpoint] = o.process(audioFrame);

Build with Android

    Cheetah o = new Cheetah.Builder()
        .setAccessKey(accessKey)
        .setModelPath(modelPath)
        .build(appContext);
    CheetahTranscript partialResult = o.process(getNextAudioFrame());
Build with iOS

    let cheetah = Cheetah(accessKey: accessKey, modelPath: modelPath)
    let (partialTranscript, isEndpoint) = try cheetah.process(getNextAudioFrame())
Build with Go

    o := NewCheetah(accessKey)
    err := o.Init()
    partialTranscript, isEndpoint, err := o.Process(getNextFrameAudio())
Build with Java

    Cheetah o = new Cheetah.Builder()
        .setAccessKey(accessKey)
        .build();
    CheetahTranscript r = o.process(getNextAudioFrame());

Build with .NET

    Cheetah o = Cheetah.Create(accessKey);
    CheetahTranscript partialResult = o.Process(GetNextAudioFrame());
Build with Rust

    let o: Cheetah = CheetahBuilder::new()
        .access_key(access_key)
        .init()
        .unwrap();
    let cheetah_transcript = o.process(&next_audio_frame()).unwrap();
Build with Flutter

    _cheetah = await Cheetah.create(accessKey, modelPath);
    CheetahTranscript partialResult = await _cheetah.process(getAudioFrame());

Build with React Native

    const cheetah = await Cheetah.create(accessKey, modelPath)
    const partialResult = await cheetah.process(getAudioFrame())

Build with C

    pv_cheetah_t *cheetah = NULL;
    pv_cheetah_init(
        access_key,
        model_file_path,
        endpoint_duration_sec,
        enable_automatic_punctuation,
        &cheetah);

    const int16_t *pcm = get_next_audio_frame();
    char *partial_transcript = NULL;
    bool is_endpoint = false;
    const pv_status_t status = pv_cheetah_process(
        cheetah,
        pcm,
        &partial_transcript,
        &is_endpoint);

Build with Web

    const cheetah = await CheetahWorker.create(
      accessKey,
      (cheetahTranscript) => {
        // callback
      },
      {
        base64: cheetahParams,
        // or
        publicPath: modelPath,
      }
    );
    WebVoiceProcessor.subscribe(cheetah);
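The snippets above each show a single `process` call. In an application that call usually sits inside a loop that appends every partial transcript and closes the utterance when an endpoint is reported. The Python sketch below illustrates one way to write that loop. It assumes an engine object whose `process(frame)` returns `(partial_transcript, is_endpoint)`, as in the Python snippet above, and which also offers a `flush()` method returning any text still buffered. `FakeEngine` and `transcribe` are hypothetical names introduced here so the example runs without an access key or microphone; they are not part of the Picovoice SDK.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


class FakeEngine:
    """Hypothetical stand-in for a streaming engine (NOT the Picovoice SDK).

    Each call to process() consumes one scripted item: a word is returned
    as a partial transcript, and None simulates the silence that ends an
    utterance (an endpoint).
    """

    def __init__(self, script: Sequence[Optional[str]]):
        self._script = list(script)

    def process(self, frame) -> Tuple[str, bool]:
        word = self._script.pop(0)
        if word is None:
            return "", True
        return word + " ", False

    def flush(self) -> str:
        # This stand-in never buffers text; a real engine might.
        return ""


def transcribe(engine, frames: Iterable) -> List[str]:
    """Feed frames to the engine and return one string per utterance."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial
        if is_endpoint:
            current += engine.flush()
            if current.strip():
                utterances.append(current.strip())
            current = ""
    # Collect whatever is left once the audio stream ends.
    current += engine.flush()
    if current.strip():
        utterances.append(current.strip())
    return utterances


script = ["turn", "on", "the", "lights", None, "thank", "you"]
result = transcribe(FakeEngine(script), range(len(script)))
print(result)  # ['turn on the lights', 'thank you']
```

With a real engine, `frames` would come from a microphone or file reader, and each partial transcript could be shown to the user as soon as it arrives instead of being collected.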
Real-time transcription APIs send voice data to the vendor’s cloud, making them vulnerable to latency, congestion, outages, and throttling.
Cheetah Streaming Speech-to-Text processes voice data when and where received, resulting in a guaranteed real-time transcription experience without unpredictable delays.
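One way to check whether local processing keeps up with incoming audio is the real-time factor: processing time divided by the duration of the audio processed. The Python sketch below measures it for any per-frame callable. The function name `real_time_factor`, the frame length of 512 samples, and the 16 kHz sample rate are assumptions chosen for illustration, and the `sum` call stands in for an actual engine.

```python
import time
from typing import Callable, Iterable


def real_time_factor(process: Callable, frames: Iterable,
                     frame_length: int, sample_rate: int) -> float:
    """Return processing time divided by audio duration.

    A value below 1.0 means frames are processed faster than they arrive.
    """
    frame_seconds = frame_length / sample_rate
    audio_seconds = 0.0
    start = time.perf_counter()
    for frame in frames:
        process(frame)
        audio_seconds += frame_seconds
    elapsed = time.perf_counter() - start
    if audio_seconds == 0.0:
        raise ValueError("no frames were processed")
    return elapsed / audio_seconds


# Assumed values for illustration: 100 frames of 512 samples at 16 kHz.
frames = [[0] * 512 for _ in range(100)]
rtf = real_time_factor(sum, frames, frame_length=512, sample_rate=16000)
print(f"real-time factor: {rtf:.6f}")
```

This measures local compute only. For a cloud API, the network round trip would have to be added on top, and that is the part that varies with congestion and connectivity.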
Record, upload, and process voice data, then download text and show the transcript.
Cheetah Streaming Speech-to-Text brings cloud transcription API accuracy to any platform…
…in “real” real time by overcoming inherent cloud limitations with on-device voice recognition.
Compare the accuracy of transcription engines transparently. The open-source speech-to-text benchmark shows how Cheetah Streaming Speech-to-Text performs against the most popular transcription engines.
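This page does not say which metric the benchmark reports. Word error rate (WER) is a common choice for comparing transcription engines, so the Python sketch below assumes it: the word-level edit distance between a reference transcript and the engine output, divided by the number of reference words. The function name and the lowercase, whitespace-split normalization are illustrative choices, not the benchmark's own code.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("turn on the lights", "turn on the light"))  # 0.25
```

Lower is better. A real benchmark would typically normalize punctuation and numerals before scoring, which this sketch leaves out.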
Improve the Cheetah Streaming Speech-to-Text accuracy further by adding application-specific vocabulary and boosting keywords on the no-code Picovoice Console platform.
Offer seamless real-time transcription experiences across platforms without worrying about future expansions. Cheetah Streaming Speech-to-Text processes voice data within web browsers, on devices, mobile apps, on-prem, and even in the public cloud.
Let your product reach its full potential without delay. Real-time transcription APIs send voice data to the vendor cloud, making it technically impossible to achieve on-device performance.
Better safe than sorry. Sharing users’ data with real-time transcription API providers puts their privacy and trust at risk. The easiest way to comply with GDPR, CCPA, HIPAA, or any other existing or upcoming regulations and earn users’ trust is not to share.
Does Cheetah Streaming Speech-to-Text sound too good to be true? See for yourself!
Real-time transcription, also known as real-time speech-to-text, streaming transcription, streaming speech-to-text, live transcription, or live speech-to-text, refers to the technology and tools that convert audio streams to text synchronously with audio generation.
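Converting audio "synchronously with audio generation" generally means handing the engine small fixed-size frames as they arrive, not a finished recording. A minimal Python sketch of that framing step follows. The helper name `iter_frames` and the frame length used in the example are assumptions for illustration; a real engine dictates its own frame length.

```python
from typing import Iterator, List, Sequence


def iter_frames(samples: Sequence[int], frame_length: int) -> Iterator[List[int]]:
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    last_start = len(samples) - frame_length
    for start in range(0, last_start + 1, frame_length):
        yield list(samples[start:start + frame_length])


# Ten dummy samples split into frames of four: the last two are left over.
print(list(iter_frames(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In a live setting the leftover samples would be kept and prepended to the next chunk from the microphone instead of being discarded.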
Real-time transcription APIs record voice data and send it to vendor servers, where the transcription engine converts voice into text. On-device real-time transcription brings the transcription engine to where the voice data is, offering a guaranteed real-time experience by eliminating unpredictable delays.
Cloud-based real-time transcription converts voice data into text with delays caused by network latency and connectivity issues. On-device real-time transcription eliminates these inherent latency and reliability limitations by processing voice data on the device without sending it to a third-party cloud. For time-sensitive applications, such as agent assistance, medical dictation, or meeting transcription, delays affect the experience and productivity. A recent study on delays in virtual communication depicts internet lag as a wrench in mental gears.
Yes. You can run Cheetah Streaming Speech-to-Text in the cloud, whether private, public, or hybrid. Picovoice on-device voice recognition technology allows enterprises to decide where to run the transcription engine instead of making the Picovoice cloud mandatory for voice processing.
Key metrics for evaluating real-time transcription engines are latency, reliability & resiliency, accuracy, availability of features, the total cost of ownership, and data privacy and governance. Each metric may have different weights in different projects of the same company.
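Because the weights differ from project to project, one simple way to compare candidate engines is a weighted average of per-metric scores. The Python sketch below shows the arithmetic. The function name, the metric names, and every number in the example are hypothetical placeholders, not measurements of any engine.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-metric scores (higher is better)."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[m] * w for m, w in weights.items()) / total_weight


# Hypothetical scores on a 0-1 scale for one candidate engine.
scores = {"latency": 1.0, "accuracy": 0.5}

# The same engine ranks differently under different project priorities.
print(weighted_score(scores, {"latency": 1, "accuracy": 1}))  # 0.75
print(weighted_score(scores, {"latency": 3, "accuracy": 1}))  # 0.875
```

The remaining metrics from the list above (reliability, features, total cost of ownership, data privacy and governance) can be added as further keys in both dictionaries.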
Cheetah Streaming Speech-to-Text only supports English for now.
Reach out to Picovoice Sales to tell us about your commercial endeavor.
Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to enhance speech quality. Picovoice also offers GitHub community support to all Free Plan users.
Version changes appear in the Picovoice Newsletter, LinkedIn, and Twitter. Subscribing to GitHub is the best way to get notified of patch releases. If you enjoy building with Cheetah Streaming Speech-to-Text, show it by giving a GitHub star!