On-Premises, or On-Prem for short, speech-to-text refers to deploying and running a transcription engine within an enterprise’s own infrastructure, i.e., a local or dedicated cloud environment. Unlike vendor-hosted speech-to-text APIs, which rely on third-party remote servers and public internet connectivity, On-Prem speech-to-text gives the enterprise control over data security, privacy, and network dependencies. As a result, On-Prem deployment allows enterprises to avoid third-party security and latency risks and to manage the remaining risks internally.

Some cloud speech-to-text APIs and all On-Device speech-to-text engines allow On-Prem deployment. On-Device speech-to-text engines may run on more platforms. Picovoice’s cross-platform On-Device speech-to-text engines, Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text, are one example.

Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text give enterprises control over their infrastructure and speech data to meet data residency and compliance requirements. They are available on Linux, Windows, and macOS through .NET, C, Go, Java, Node.js, Python, and Rust, which makes them well suited to On-Prem deployment.

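import pvleopard

# access_key: your Picovoice AccessKey; path: an audio file on local disk
leopard = pvleopard.create(access_key=access_key)
transcript, words = leopard.process_file(path)
leopard.delete()  # release native resources when done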
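
For live audio, the streaming engine is fed one frame at a time instead of a whole file. The following is a minimal sketch of that loop, not an official sample: `stream_transcribe` is a hypothetical helper, and it assumes an engine object that exposes `frame_length`, `process(pcm)` returning a partial transcript plus an endpoint flag, and `flush()` returning the remaining text, which is how the Cheetah Python binding is understood to behave.

```python
from typing import List, Sequence


def stream_transcribe(engine, samples: Sequence[int]) -> str:
    """Feed 16-bit PCM samples to a streaming engine, one frame at a time.

    `engine` is assumed to expose:
      - frame_length: number of samples expected per call
      - process(pcm): returns (partial_transcript, is_endpoint)
      - flush(): returns any text still buffered in the engine
    """
    parts: List[str] = []
    n = engine.frame_length

    # Only full frames are passed on; a trailing partial frame is dropped.
    for start in range(0, len(samples) - n + 1, n):
        partial, _is_endpoint = engine.process(samples[start:start + n])
        parts.append(partial)

    # Collect whatever the engine has not emitted yet.
    parts.append(engine.flush())
    return "".join(parts)


# Hypothetical usage with Cheetah (requires the pvcheetah package and an AccessKey):
#   import pvcheetah
#   cheetah = pvcheetah.create(access_key=access_key)
#   text = stream_transcribe(cheetah, samples)
#   cheetah.delete()
```

Passing the engine in as an argument keeps the loop independent of any one SDK, so the same code can be exercised with a stub before real audio and credentials are available.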

Picovoice is the only company with production-ready speech-to-text models that can be deployed On-Prem under a Free Plan. Try it now!

Developers have other options to deploy speech-to-text On-Prem:

Cloud speech-to-text APIs for On-Prem deployment

Big Tech companies, such as Google (Cloud Speech-to-Text On-Prem) and Microsoft (Azure Cognitive Services Speech to Text), offer production-ready speech-to-text that runs On-Prem or in a private cloud as containers managed with Docker or Kubernetes.

Cloud speech-to-text API providers typically offer On-Prem deployment as a private feature for “selected” enterprises. Reach out to the vendor of your choice for more information.

Open-source speech-to-text for On-Prem deployment

Free and open-source speech-to-text models can also run On-Prem. Well-known examples include Kaldi, wav2vec 2.0, and Whisper.

Ensure you have the resources to build, customize, maintain, and improve open-source models before embedding them into mission-critical applications.
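
As an illustration of the open-source route, here is a minimal sketch of local transcription with Whisper. The helper names `transcribe_file` and `load_whisper` are hypothetical. The sketch assumes the `openai-whisper` package and `ffmpeg` are installed on the machine, and it says nothing about the accuracy, throughput, or hardware you will need in production.

```python
def transcribe_file(model, path: str) -> str:
    """Return the transcript text for one audio file.

    `model` is any object whose transcribe(path) returns a dict with a
    "text" entry, which is the shape Whisper models return.
    """
    result = model.transcribe(path)
    return result["text"].strip()


def load_whisper(name: str = "base"):
    """Load a Whisper model by size name (weights download on first use)."""
    import whisper  # imported lazily so the helper above has no hard dependency

    return whisper.load_model(name)


# Hypothetical usage on a machine with openai-whisper and ffmpeg installed:
#   model = load_whisper("base")
#   print(transcribe_file(model, "meeting.wav"))
```

Everything runs on the local machine after the first model download, but updates, tuning, and scaling remain your team's responsibility.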