On-Prem (short for On-Premises) speech-to-text refers to deploying and running a transcription engine within an enterprise’s own infrastructure, such as a local data center or a dedicated cloud environment. On-Prem deployment differs from vendor-hosted speech-to-text APIs in that it gives enterprises control over data security, privacy, and network dependencies, whereas vendor-hosted APIs rely on remote servers owned by third parties and on public internet connectivity. As a result, On-Prem deployment lets enterprises avoid third-party security and latency risks and manage the remaining risks internally.

On-Prem deployment is supported by some cloud speech-to-text APIs and all On-Device speech-to-text engines. On-Device speech-to-text engines may run on more platforms than the cloud alternatives. For example, Picovoice’s cross-platform On-Device speech-to-text engines, Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text, support a broad range of platforms.

Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text give enterprises control over their infrastructure and speech data to meet data residency and compliance requirements. They are available on Linux, Windows, and macOS through .NET, C, Go, Java, Node.js, Python, and Rust, making them a perfect choice for On-Prem deployment.

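import pvleopard

# access_key: your Picovoice AccessKey; path: the audio file to transcribe
leopard = pvleopard.create(access_key=access_key)
transcript, words = leopard.process_file(path)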
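
Because the engine runs locally, a batch job can transcribe a whole directory of recordings without any audio leaving the host. The sketch below is one minimal way to do that, not an official Picovoice utility: `transcribe_folder`, `engine`, `folder`, and `pattern` are hypothetical names introduced for illustration, and the only engine call it relies on is `process_file`, which is assumed to return a `(transcript, words)` pair as in the snippet above.

```python
from pathlib import Path


def transcribe_folder(engine, folder, pattern="*.wav"):
    """Transcribe every matching audio file in `folder` with a local engine.

    `engine` is any object exposing `process_file(path)` that returns a
    `(transcript, words)` pair (an assumption based on the snippet above).
    Returns a dict mapping each file name to its transcript.
    """
    results = {}
    # Sort the paths so the output order is stable across runs.
    for audio_path in sorted(Path(folder).glob(pattern)):
        transcript, _words = engine.process_file(str(audio_path))
        results[audio_path.name] = transcript
    return results
```

With Leopard, `engine` would be the object returned by `pvleopard.create(...)`, and a call such as `transcribe_folder(engine, "recordings")` would return one transcript per `.wav` file in that (hypothetical) directory.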

Picovoice is the only company with production-ready speech-to-text models that can be deployed On-Prem under a Free Plan. Try it now!

Developers have other options to deploy speech-to-text On-Prem:

Cloud speech-to-text APIs for On-Prem deployment

Big Tech companies, such as Google (Cloud Speech-to-Text On-Prem) and Microsoft (Azure Cognitive Services Speech to Text), offer production-ready speech-to-text for On-Prem or private cloud deployment, packaged as containers that run with Docker or Kubernetes.

Cloud speech-to-text API providers typically offer On-Prem deployment as a private feature available only to “selected” enterprises. Reach out to the vendor of your choice for more information.

Open-source speech-to-text for On-Prem deployment

Free and open-source speech-to-text models can also run On-Prem. Kaldi, wav2vec 2.0, and Whisper are a few well-known examples.
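
As a rough illustration of the open-source route, the sketch below transcribes one file with Whisper on the local machine. It assumes the `openai-whisper` Python package and ffmpeg are installed; `transcribe_with_whisper` and its parameters are hypothetical names for this example, and the `model` argument exists only so a preloaded model can be reused across calls.

```python
def transcribe_with_whisper(audio_path, model=None, model_name="base"):
    """Transcribe one audio file locally with an open-source Whisper model.

    Pass a preloaded `model` to avoid reloading weights on every call;
    otherwise the model named by `model_name` is loaded on demand.
    """
    if model is None:
        # Imported lazily so the heavy dependency loads only when needed.
        # Assumes `pip install openai-whisper` and ffmpeg on the host.
        import whisper

        model = whisper.load_model(model_name)
    # `transcribe` returns a dict; the full transcript is under "text".
    result = model.transcribe(audio_path)
    return result["text"].strip()
```

Note that Whisper downloads model weights the first time a model is loaded, so an air-gapped On-Prem host would need the weights placed on disk in advance.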

Ensure you have the resources to build, customize, maintain, and improve open-source models before embedding them into mission-critical applications.