Cheetah Streaming Speech-to-Text

Build “real” real-time transcription software.

On-device streaming speech-to-text with cloud-level accuracy without cloud latency


Loved by developers, trusted by enterprises

It felt like we tried every available solution on the market, and only Picovoice provided the stability, processing speed, excellent accuracy out of the box, and flexible training capabilities that we required. They are truly on the cutting edge of voice technology.

Jocelyn Kang, CTO, Knowtex

What is Cheetah Streaming Speech-to-Text?

Cheetah Streaming Speech-to-Text is software that automatically transcribes voice data in real time, without network delay and without compromising accuracy.

Cheetah Streaming Speech-to-Text processes voice data locally, enabling live transcription on devices, in mobile apps and web browsers, on-premises, or in the cloud.

Build with cross-platform transcription SDKs

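import pvcheetah

# access_key: your AccessKey from Picovoice Console
cheetah = pvcheetah.create(access_key)

# get_next_audio_frame() stands for your own audio source
partial_transcript, is_endpoint = cheetah.process(get_next_audio_frame())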
Build with Python
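
The two calls above can be expanded into a complete streaming loop. The following is a minimal sketch, not Picovoice's reference implementation: it assumes an engine object whose process(frame) returns a (partial_transcript, is_endpoint) pair, as in the snippet above, and whose flush() returns any text still buffered. The names transcribe_stream and FakeEngine are hypothetical and introduced only for illustration; FakeEngine stands in for a real engine so the loop can run without an access key or a microphone.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def transcribe_stream(engine, frames: Iterable[Sequence[int]]) -> Iterator[str]:
    """Yield one finished utterance each time the engine reports an endpoint."""
    parts: List[str] = []
    for frame in frames:
        partial_transcript, is_endpoint = engine.process(frame)
        parts.append(partial_transcript)
        if is_endpoint:
            # An endpoint marks a pause in speech: collect any buffered text.
            parts.append(engine.flush())
            yield "".join(parts)
            parts = []
    # The audio ended without a final endpoint: flush whatever remains.
    parts.append(engine.flush())
    tail = "".join(parts)
    if tail:
        yield tail


class FakeEngine:
    """Hypothetical stand-in that returns one scripted result per frame."""

    def __init__(self, script: List[Tuple[str, bool]]):
        self._script = list(script)

    def process(self, frame: Sequence[int]) -> Tuple[str, bool]:
        return self._script.pop(0)

    def flush(self) -> str:
        return ""


if __name__ == "__main__":
    engine = FakeEngine([("hello ", False), ("world", True), ("bye", False)])
    silent_frames = [[0] * 512 for _ in range(3)]
    for utterance in transcribe_stream(engine, silent_frames):
        print(utterance)
```

With the real SDK, the engine would be the object returned by pvcheetah.create(access_key), and the frames would come from a microphone or an audio file instead of the silent placeholders used here.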

Why Cheetah Streaming Speech-to-Text?

Real-time transcription APIs send voice data to the vendor’s cloud, making them vulnerable to latency, congestion, outages, and throttling.

Cheetah Streaming Speech-to-Text processes voice data when and where received, resulting in a guaranteed real-time transcription experience without unpredictable delays.

Zero latency real-time transcription

Cloud Performance

✅ Accurate
🎚 Custom models
🤸 Platform-agnostic

…with on-device benefits

⏱️ Zero latency
⚡ No downtime
🔒 Private by design

Accuracy backed by open-source benchmark

Evaluate the accuracy of transcription software transparently

Compare the accuracy of transcription engines transparently. The open-source speech-to-text benchmark shows how Cheetah Streaming Speech-to-Text performs compared to the most popular transcription engines.
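
Speech-to-text accuracy is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the engine's output, divided by the number of reference words. The sketch below assumes that metric. It illustrates the calculation only and is not the benchmark's own code; word_error_rate is a hypothetical helper name.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the quick brown fox", "the quack brown"))
```

A lower value is better: 0.0 means the output matches the reference word for word. Real benchmarks usually normalize punctuation and number formatting before comparing, which this sketch leaves out.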

Improved accuracy with model adaptation

Boost the accuracy of Cheetah Streaming Speech-to-Text with customization

Improve the Cheetah Streaming Speech-to-Text accuracy further by adding application-specific vocabulary and boosting keywords on the no-code Picovoice Console platform.

Speech-to-text APIs transfer voice input to the cloud to transcribe it into text, creating privacy and reliability issues as well as additional costs.

Real-time transcription on any platform

Deploy Cheetah Streaming Speech-to-Text on any platform

Offer seamless real-time transcription experiences across platforms without worrying about future expansions. Cheetah Streaming Speech-to-Text processes voice data within web browsers, on devices, mobile apps, on-prem, and even in the public cloud.

Guaranteed response time

Generate real-time transcripts with no network delays

Let your product reach its full potential without delay. Because real-time transcription APIs send voice data to the vendor's cloud, they cannot match the response time of on-device processing.

Design with privacy in mind

Ensure voice data and transcript privacy and security

Better safe than sorry. Sharing users' data with real-time transcription API providers jeopardizes user privacy and trust. The easiest way to comply with GDPR, CCPA, HIPAA, or any other existing or upcoming regulation, and to earn users' trust, is not to share data in the first place.

Get started with Cheetah Streaming Speech-to-Text

Does Cheetah Streaming Speech-to-Text sound too good to be true? See for yourself!

Start Free

Pre-trained transcription models
Custom vocabulary
Keyword boosting
Intuitive SDKs
Truecasing and punctuation
English, French, German, Italian, Portuguese, and Spanish

More from Picovoice

Everything You Need to Know About Speech-to-Speech Translation

On-device voice AI for French to build AI Agents

How do Voice AI Agents work?

On-device AI Models to Convert Voice to Text in Spanish

Multilingual On-device Speech-to-Text for Real-time Applications

AI Voice Assistant for iOS Powered by Local LLM

FAQ

What is a real-time transcription engine?

Real-time transcription, also known as real-time speech-to-text, streaming transcription, streaming speech-to-text, live transcription, or live speech-to-text, refers to the technology and tools that convert audio streams to text synchronously with audio generation.
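
Converting audio to text while it is still being produced means feeding the engine small, fixed-size frames as they become available, rather than waiting for a complete recording. A minimal sketch of that buffering step follows. FrameBuffer is a hypothetical helper, and the default frame length of 512 samples is an assumption chosen for illustration; a real engine reports the frame length it expects.

```python
from typing import List, Sequence


class FrameBuffer:
    """Collects audio samples of any chunk size and emits fixed-size frames."""

    def __init__(self, frame_length: int = 512):
        if frame_length <= 0:
            raise ValueError("frame_length must be positive")
        self._frame_length = frame_length
        self._pending: List[int] = []

    def push(self, samples: Sequence[int]) -> List[List[int]]:
        """Add newly captured samples and return every complete frame."""
        self._pending.extend(samples)
        frames: List[List[int]] = []
        while len(self._pending) >= self._frame_length:
            frames.append(self._pending[: self._frame_length])
            del self._pending[: self._frame_length]
        return frames

    @property
    def pending(self) -> int:
        """Number of samples waiting for the next frame to fill up."""
        return len(self._pending)


if __name__ == "__main__":
    buffer = FrameBuffer(frame_length=4)
    print(buffer.push([1, 2, 3]))           # not enough samples for a frame yet
    print(buffer.push([4, 5, 6, 7, 8, 9]))  # two complete frames, one sample left
    print(buffer.pending)
```

Each frame returned by push can be handed to the transcription engine immediately, so text appears while the speaker is still talking.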

How does on-device real-time transcription differ from cloud-based real-time transcription APIs?

Cloud-based real-time transcription APIs record voice data and send it to vendor servers, where the transcription engine converts voice into text. On-device real-time transcription brings the transcription engine to where the voice data is, offering a guaranteed real-time experience by eliminating unpredictable delays.

What are the benefits of on-device real-time transcription over cloud-based real-time transcription?

Cloud-based real-time transcription converts voice data into text with a delay caused by network latency and connectivity issues. On-device real-time transcription eliminates these inherent latency and reliability limitations by processing voice data on the device without sending it to a third-party cloud. For time-sensitive applications, such as agent assistance, medical dictation, or meeting transcription, delays hurt both the experience and productivity. A recent study on delays in virtual communication depicts internet lag as a wrench in mental gears.

Can I use Cheetah Streaming Speech-to-Text in the cloud?

Yes. You can run Cheetah Streaming Speech-to-Text in the cloud, whether private, public, or hybrid. Picovoice on-device voice recognition technology allows enterprises to decide where to run the transcription engine instead of making the Picovoice cloud mandatory for voice processing.

What are the key metrics for evaluating real-time transcription engines?

Key metrics for evaluating real-time transcription engines are latency, reliability and resiliency, accuracy, feature availability, total cost of ownership, and data privacy and governance. Each metric may carry a different weight in different projects, even within the same company.

Which platforms does Cheetah Streaming Speech-to-Text support?

Desktop and Servers: Linux, macOS, and Windows
Web Browsers: Chrome, Safari, Edge, and Firefox
Mobile Devices: Android and iOS
Single Board Computers: Raspberry Pi

Which languages does Cheetah Streaming Speech-to-Text support?

Cheetah Streaming Speech-to-Text currently supports English, French, German, Italian, Portuguese, and Spanish.

What should I do to request Cheetah Streaming Speech-to-Text to support other languages?

Reach out to Picovoice Sales to tell us about your commercial endeavor.

How do I get technical support for Cheetah Streaming Speech-to-Text?

Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to start building transcription products. Enterprise customers get dedicated support specific to their applications from Picovoice Product & Engineering teams. Existing Picovoice customers can reach out to their contacts; prospects can also purchase Enterprise Support before committing to any paid plan.

How can I get informed about updates and upgrades?

Version changes appear on LinkedIn. Subscribing to GitHub is the best way to get notified of patch releases. If you enjoy building with Cheetah Streaming Speech-to-Text, show it by giving a GitHub star!
