“Can we deploy On-Device voice AI models On-Prem or Cloud?” is one of the frequent questions the Picovoice team receives. The answer is YES! The follow-up question is “What do you recommend?” and the answer is “it depends.”

Note that Cloud-dependent models (private or public) cannot run On the Device. Most Voice AI vendors do not offer Cloud models On-Prem unless it’s an outrageously high-value contract.

Choosing where to deploy the AI models requires diligent research. We grouped the main considerations into five categories to help with this vital decision:

  1. Reliability: How crucial is the voice feature for your product and business?
  2. Cost at Scale: What’s the estimated volume of data to be processed?
  3. Privacy & Security: How vital are data privacy and security for your business?
  4. Latency: Do latency and connectivity-related delays harm your business significantly?
  5. IT Management: Who maintains the infrastructure and how?

Case for the Cloud

Cloud processing requires voice data to be transferred to a 3rd party Cloud - whether for transcribing speech to text or detecting users’ intents.

  • Reliability: Network outages, poor signal or a fluctuating internet connection cause Cloud-dependent products to stop working. That’s not a problem for PoCs or nice-to-have features, but it is for many use cases. In the winter, having to remove your gloves because the voice assistant doesn't work is inconvenient. However, if you’re a beekeeper, lab worker or healthcare provider, it creates health and safety risks.
  • Cost at Scale: The cost of processing voice data in the Cloud does not matter much for low-volume use cases. However, as with other Cloud-dependent applications, the cost becomes significant at scale. It results in lower margins and, in turn, a lower valuation.
  • Privacy & Security: Running AI models in the Cloud means sharing voice data with a 3rd party. Anything is safer when not shared. However, if you choose a Cloud API provider, do not simply assume they’ll be compliant. Make sure you question their policies and practices.
  • Unpredictable Latency: Latency is one of the inherent limitations of Cloud computing. An unreliable response time is not a dealbreaker for every application; batch audio transcription, for example, can tolerate it. However, for real-time applications such as AR/VR, fluctuating latency affects the user experience significantly.
  • IT Management: Cloud API providers maintain the platform that runs the AI models. So enterprises do not worry about maintenance, updates and upgrades.
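
To make the Cost at Scale point concrete, here is a minimal sketch of how usage-based pricing grows with volume. It assumes, purely for illustration, that a Cloud API bills a fixed rate per audio hour and that the alternative is a flat monthly fee. The function names and every number are hypothetical, not real vendor prices.

```python
def monthly_cloud_cost(audio_hours: float, price_per_hour: float) -> float:
    """Usage-based cost: grows linearly with the volume of audio processed."""
    return audio_hours * price_per_hour


def break_even_hours(flat_monthly_fee: float, price_per_hour: float) -> float:
    """Monthly volume at which usage-based pricing matches a flat fee."""
    return flat_monthly_fee / price_per_hour


if __name__ == "__main__":
    price_per_hour = 1.0       # hypothetical Cloud rate per audio hour
    flat_monthly_fee = 1000.0  # hypothetical flat fee for the alternative

    for hours in (100, 1_000, 10_000):
        cost = monthly_cloud_cost(hours, price_per_hour)
        print(f"{hours:>6} h/month -> usage-based cost {cost:>9.2f}")

    threshold = break_even_hours(flat_monthly_fee, price_per_hour)
    print(f"Break-even volume: {threshold:.0f} h/month")
```

Below the break-even volume the usage-based option is cheaper. Above it, every additional hour of audio widens the gap, which is why volume estimates belong in the decision early.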

Did you know that Cloud Repatriation is gaining popularity among large enterprises due to the inherent limitations of the Cloud and its real and hidden costs at scale?

Case for On-Prem

On-Prem lies between the Cloud and On-Device options. On-Prem refers to the private Cloud managed by enterprises directly.

  • Reliability: On-Prem processing offers the flexibility to run AI models on the same server or network where voice data is generated or resides. Data does not leave the “premises” and is not transmitted to a 3rd party. Limiting 3rd party involvement gives enterprises more control, so they can foresee and manage the performance.
  • Cost at Scale: Most voice vendors offer discounted prices for On-Prem deployment because enterprises, not vendors, bear the platform costs. However, On-Prem voice processing is not available to everyone. For example, unlike the standard self-service GCP model, one has to go through a traditional sales process to run Google Speech-to-Text On-Prem.
  • Privacy & Security: Running AI models On-Prem does not require sharing voice data with a 3rd party. Hence, enterprises have control over their data. For example, enterprises can transcribe speech data generated by a VoIP application On-Prem without sending voice recordings, i.e. personal data, to a 3rd party.
  • Unpredictable Latency: As with connectivity, latency depends on the design of the deployment, so enterprises can control it.
  • IT Management: Enterprises are responsible for maintaining the servers and models.

Case for On-Device

On-Device processing, also known as Edge Computing, allows enterprises to run AI models where data resides.

  • Reliability: On-Device processing does not require voice data to leave the device. Thus, the response time is predictable. Even if the internet connection is unstable for a moment, it does not affect the performance of the voice products.
  • Cost at Scale: Voice vendors offer discounted prices for On-Device deployment as they do not bear the platform maintenance cost. For example, Leopard Speech-to-Text is 20x more affordable than cloud providers.
  • Privacy & Security: With On-Device processing, voice data does not leave the device, i.e. the platform where it resides, resulting in 100% privacy. For example, all demos on Picovoice’s website process voice data within the web browser. Hence, users have control over their data.
  • Unpredictable Latency: Fluctuating latency or Cloud outages do not cause delays as the data does not leave the device. On-Device processing is the best solution for real-time applications, such as agent coaching or dictation.
  • IT Management: Whether public or private, the Cloud is known for the simplicity of management and scalability as one can allocate resources among various applications. However, when millions of users send data to one central location, the orchestration can be very expensive, and capping the latency can turn into a mission impossible. On-Device processing, like any distributed approach, leverages multiple computational resources at the edge.
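
The latency argument can be illustrated with a small sketch. It assumes that the response time of a Cloud call is inference time plus a variable network round trip, while On-Device response time is inference time only. All sample values are hypothetical, and the On-Device inference time is deliberately set higher than the Cloud one to show that predictability, not raw speed, is the point.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def total_latency(inference_ms: float, network_ms: list[float]) -> list[float]:
    """Response time per request: inference plus network round trip."""
    return [inference_ms + n for n in network_ms]


# Hypothetical network round-trip samples in milliseconds.
network_ms = [40, 45, 50, 60, 80, 120, 200, 400, 45, 50]

cloud = total_latency(30, network_ms)                    # fast server, variable network
on_device = total_latency(60, [0.0] * len(network_ms))   # slower device, no network

if __name__ == "__main__":
    for name, samples in (("cloud", cloud), ("on-device", on_device)):
        p50, p95 = percentile(samples, 50), percentile(samples, 95)
        print(f"{name:>9}: p50={p50:.0f} ms  p95={p95:.0f} ms  spread={p95 - p50:.0f} ms")
```

In this made-up data set the Cloud path has the lower inference time, yet its 95th percentile is dominated by the network. The On-Device path shows no spread because nothing in the request depends on connectivity.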

Picovoice Consulting helps enterprises analyze their needs and choose the best platform to run AI models - whether it’s embedded, mobile, web, desktop, workstation, serverless, on-prem, private, or public cloud.
