“Can we deploy On-Device voice AI models On-Prem or in the Cloud?” is one of the most frequent questions the Picovoice team receives. The answer is YES! The follow-up question is “What do you recommend?” and the answer is “It depends.”
Note that Cloud-dependent models (private or public) cannot run on the device. Most Voice AI vendors do not offer Cloud models On-Prem unless it’s an outrageously high-value contract.
Choosing where to deploy the AI models requires diligent research. We gathered five main categories to help with this vital decision:
- Reliability: How crucial is the voice feature for your product and business?
- Cost at Scale: What’s the estimated volume of data to be processed?
- Privacy & Security: How vital are data privacy and security for your business?
- Latency: Do latency and connectivity-related delays harm your business significantly?
- IT Management: Who maintains the infrastructure and how?
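One way to make “it depends” concrete is to weigh the five categories above against each other. The snippet below is only a minimal sketch of such a weighted decision matrix: the function name `rank_deployments`, the 1-to-5 rating scale, and every weight and score in the example are hypothetical placeholders, not Picovoice recommendations or measured data.

```python
# Hypothetical weighted decision matrix over the five categories above.
# Every weight and rating below is an illustrative assumption.

CATEGORIES = [
    "reliability",
    "cost_at_scale",
    "privacy_security",
    "latency",
    "it_management",
]


def rank_deployments(weights, scores):
    """Return (option, weighted average rating) pairs, best first.

    weights: category -> importance for your product (non-negative numbers)
    scores:  option -> {category -> rating from 1 to 5, higher is better}
    """
    total = sum(weights[c] for c in CATEGORIES)
    if total <= 0:
        raise ValueError("at least one category needs a positive weight")
    ranked = [
        (option, sum(weights[c] * ratings[c] for c in CATEGORIES) / total)
        for option, ratings in scores.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Assumed example: a product where privacy and latency matter most.
example_weights = {
    "reliability": 3,
    "cost_at_scale": 2,
    "privacy_security": 5,
    "latency": 4,
    "it_management": 1,
}
example_scores = {
    "cloud": {"reliability": 2, "cost_at_scale": 2, "privacy_security": 2,
              "latency": 2, "it_management": 5},
    "on_prem": {"reliability": 4, "cost_at_scale": 3, "privacy_security": 4,
                "latency": 4, "it_management": 2},
    "on_device": {"reliability": 5, "cost_at_scale": 4, "privacy_security": 5,
                  "latency": 5, "it_management": 3},
}

for option, score in rank_deployments(example_weights, example_scores):
    print(f"{option}: {score:.2f}")
```

Changing the weights changes the ranking, which is the point: a team that values hands-off IT management above everything else would plug in different numbers and may land on a different option.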
Case for the Cloud
Cloud processing requires voice data to be transferred to a 3rd party Cloud, whether for transcribing speech to text or detecting users’ intents.
- Reliability: Network outages, poor signal or a fluctuating internet connection cause Cloud-dependent products to stop working. That’s not a problem for PoCs or nice-to-have features, but it is for many use cases. In the winter, removing gloves when the voice assistant doesn't work is inconvenient. However, if you’re a beekeeper, lab worker or healthcare provider, it creates health and safety risks.
- Cost at Scale: The cost of processing voice data in the Cloud does not matter much for low-volume use cases. However, the cost becomes significant at scale, just like other Cloud-dependent applications. It results in lower margins and, hence, a lower valuation.
- Privacy & Security: Running AI models in the Cloud means sharing voice data with a 3rd party. Anything is safer when not shared. However, if you choose a Cloud API provider, do not simply assume they’ll be compliant. Make sure you question their policies and practices.
- Unpredictable Latency: Latency is one of the inherent limitations of Cloud computing. Unreliable response time is not a dealbreaker for every application; batch audio transcription, for example, tolerates it. However, for real-time applications such as AR/VR, fluctuating latency affects the user experience significantly.
- IT Management: Cloud API providers maintain the platform that runs the AI models, so enterprises do not need to worry about maintenance, updates and upgrades.
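To put the Cost at Scale point in numbers, here is a back-of-the-envelope sketch that assumes a simple usage-based pricing model, where the bill grows linearly with the minutes of audio processed. The price per minute, usage per user and revenue per user are made-up figures for illustration, not any vendor's actual pricing.

```python
# Back-of-the-envelope model of usage-based Cloud speech processing cost.
# All figures are hypothetical; plug in your own volume and vendor pricing.


def monthly_cloud_cost(users, minutes_per_user, price_per_minute):
    """Monthly bill under a purely per-minute (linear) pricing assumption."""
    return users * minutes_per_user * price_per_minute


def gross_margin(revenue, cost):
    """Fraction of revenue left after paying the processing cost."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost) / revenue


PRICE_PER_MINUTE = 0.02    # assumed USD per audio minute
MINUTES_PER_USER = 60      # assumed monthly usage per user
REVENUE_PER_USER = 5.00    # assumed monthly revenue per user

for users in (1_000, 100_000, 1_000_000):
    cost = monthly_cloud_cost(users, MINUTES_PER_USER, PRICE_PER_MINUTE)
    margin = gross_margin(users * REVENUE_PER_USER, cost)
    print(f"{users:>9,} users: ${cost:,.2f}/month, gross margin {margin:.0%}")
```

Under these assumptions the per-user cost never shrinks as the product grows, so the absolute bill scales with the user base and keeps taking the same share out of every dollar of revenue.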
Did you know? Cloud repatriation is gaining popularity among large enterprises due to the inherent limitations of the cloud or its real and hidden costs at scale.
Case for On-Prem
On-Prem lies between the Cloud and On-Device options. On-Prem refers to the private Cloud managed by enterprises directly.
- Reliability: On-Prem processing offers the flexibility to run AI models on the same server or network where voice data is generated or resides. Data does not leave the “premises” or get transmitted to a 3rd party. Limiting 3rd party involvement gives more control to enterprises. Thus, they can foresee and manage the performance.
- Cost at Scale: Most voice vendors offer discounted prices for On-Prem deployment as enterprises, not vendors, bear the platform costs. However, On-Prem voice processing is not available to everyone. For example, unlike the standard self-service GCP model, one has to go through a traditional sales process to run Google Speech-to-Text On-Prem.
- Privacy & Security: Running AI models On-Prem does not require sharing voice data with a 3rd party. Hence, enterprises have control over their data. For example, enterprises can transcribe speech data generated by a VoIP application On-Prem without sending voice recordings, i.e. personal data, to a 3rd party.
- Unpredictable Latency: Like the connectivity requirement, latency depends on the design, and enterprises can control it.
- IT Management: Enterprises are responsible for maintaining the servers and models.
Case for On-Device
On-Device processing, also known as Edge Computing, allows enterprises to run AI models where data resides.
- Reliability: On-Device processing does not require voice data to leave the device. Thus, the response time is predictable. Even if the internet connection is unstable for a moment, it does not affect the performance of the voice products.
- Cost at Scale: Voice vendors offer discounted prices for On-Device deployment as they do not bear the platform maintenance cost. For example, Leopard Speech-to-Text is 20x more affordable than cloud providers.
- Privacy & Security: With On-Device processing, voice data does not leave the device, i.e. the platform where it resides, resulting in 100% privacy. For example, all demos on Picovoice’s website process voice data within the web browser. Hence, users have control over their data.
- Unpredictable Latency: Fluctuating latency or Cloud outages do not cause delays as the data does not leave the premises. On-Device processing is the best solution for real-time applications, such as agent coaching or dictation.
- IT Management: Whether public or private, the Cloud is known for the simplicity of management and scalability as one can allocate resources among various applications. However, when millions of users send data to one central location, the orchestration can be very expensive, and capping the latency can turn into a mission impossible. On-Device processing, like any distributed approach, leverages multiple computational resources at the edge.
Picovoice Consulting helps enterprises analyze their needs and choose the best platform to run AI models - whether it’s embedded, mobile, web, desktop, workstation, serverless, on-prem, private, or public cloud.