“Can we deploy On-Device voice AI models in the Cloud?” is one of the most frequent questions the Picovoice team receives. The answer is YES! The follow-up question is “What do you recommend?” and the answer is “it depends.”
Cloud-dependent models (private or public) cannot run on the device. Most voice AI vendors do not offer On-Prem deployment unless it’s an outrageously high-value contract.
Choosing where to deploy the AI models requires diligent research. We gathered five main categories to help with this vital decision:
- Reliability: How crucial is the voice feature for your product and business?
- Cost at Scale: What’s the estimated volume of data to be processed?
- Privacy & Security: How vital are data privacy and security for your business?
- Latency: Do latency and connectivity-related delays harm your business significantly?
- IT Management: Who maintains the infrastructure and how?
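One way to make the five questions above concrete is a small weighted scoring sheet. The sketch below is only an illustration: the function names, the weights and the 1-to-5 scores are all hypothetical placeholders, not measurements or recommendations, and should be replaced with your own assessment.

```python
# Hypothetical decision-matrix sketch: every number below is a placeholder.
CRITERIA = ("reliability", "cost_at_scale", "privacy_security", "latency", "it_management")


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-criterion scores (higher is better)."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


def rank_options(options: dict, weights: dict) -> list:
    """Return option names sorted from best to worst weighted score."""
    return sorted(options, key=lambda name: weighted_score(options[name], weights), reverse=True)


# How much each criterion matters to *your* product (placeholder values).
weights = {"reliability": 5, "cost_at_scale": 3, "privacy_security": 4,
           "latency": 4, "it_management": 2}

# Your own 1-5 rating of each deployment option (placeholder values).
options = {
    "cloud":     {"reliability": 2, "cost_at_scale": 2, "privacy_security": 2,
                  "latency": 2, "it_management": 5},
    "on_prem":   {"reliability": 3, "cost_at_scale": 3, "privacy_security": 4,
                  "latency": 4, "it_management": 2},
    "on_device": {"reliability": 5, "cost_at_scale": 4, "privacy_security": 5,
                  "latency": 5, "it_management": 3},
}

for name in rank_options(options, weights):
    print(f"{name}: {weighted_score(options[name], weights):.2f}")
```

Changing the weights changes the ranking, which is why the answer to “What do you recommend?” is “it depends.”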
Case for the Cloud
Cloud processing requires voice data to be transferred to a 3rd-party Cloud, whether for transcribing speech to text or detecting users’ intents.
- Reliability: Network outages, poor signal or a fluctuating internet connection cause Cloud-dependent products to stop working. That’s not a problem for PoCs or nice-to-have features, but it is for many use cases. In the winter, removing gloves when the voice assistant doesn't work is inconvenient. However, if you’re a beekeeper, lab worker or healthcare provider, it creates health and safety risks.
- Cost at Scale: The cost of processing voice data in the Cloud does not matter much for low-volume use cases. However, the cost becomes significant at scale, just as it does for other Cloud-dependent applications. It results in lower margins and, hence, a lower valuation.
- Privacy & Security: Running AI models in the Cloud means sharing voice data with a 3rd party. Anything is safer when not shared. However, if you choose a Cloud API provider, do not simply assume they’ll be compliant. Make sure you question their policies and practices.
- Unpredictable Latency: Latency is one of the inherent limitations of Cloud computing. An unpredictable response time is not a dealbreaker for every application; batch audio transcription, for example, tolerates it. However, for real-time applications such as AR/VR, fluctuating latency affects the user experience significantly.
- IT Management: Cloud API providers maintain the platform that runs the AI models, so enterprises do not have to worry about maintenance, updates and upgrades.
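To see why the Cost at Scale point above only bites at volume, here is a minimal sketch of usage-based pricing. The function name and the per-minute price are hypothetical and chosen purely for illustration; they are not any vendor's actual pricing.

```python
def monthly_cost_usd(users: int, minutes_per_user: float, price_per_minute: float) -> float:
    """Usage-based cost: grows linearly with the volume of audio processed."""
    return users * minutes_per_user * price_per_minute


# Hypothetical per-minute price, for illustration only.
PRICE_PER_MINUTE = 0.02

for users in (100, 10_000, 1_000_000):
    cost = monthly_cost_usd(users, minutes_per_user=30, price_per_minute=PRICE_PER_MINUTE)
    print(f"{users:>9,} users -> ${cost:,.2f} per month")
```

Because the bill scales with every minute processed, a price that is negligible in a pilot becomes a line item that eats into margins once the user base grows.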
Case for On-Prem
On-Prem lies between the Cloud and On-Device. On-Prem refers to the private Cloud managed by enterprises directly. On-Prem processing offers flexibility to run AI models on the same server or network where voice data is generated or resides. Data does not leave the “premises” or get transmitted to a 3rd party. Limiting 3rd-party involvement gives more control to enterprises. Thus, they can foresee and manage the performance.
- Cost at Scale: Most voice vendors offer discounted prices for On-Prem deployment because enterprises, not vendors, bear the platform costs. However, On-Prem voice processing is not available to everyone. For example, unlike the standard self-service GCP model, one has to go through a traditional sales process to run Google Speech-to-Text On-Prem.
- Privacy & Security: Running AI models On-Prem does not require sharing voice data with a 3rd party. Hence, enterprises have control over their data. For example, enterprises can transcribe speech data generated by a VOIP application On-Prem without sending voice recordings, i.e. personal data, to a 3rd party.
- Unpredictable Latency: Like the connectivity requirement, latency depends on the design, and enterprises can control it.
- IT Management: Enterprises are responsible for maintaining their own servers and the models.
Case for On-Device
On-Device processing allows enterprises to run AI models where data resides. It does not require voice data to leave the device. Thus, the response time is predictable. Even if the internet connection is unstable for a moment, it does not affect the performance of the voice products.
- Cost at Scale: Voice vendors offer discounted prices for On-Device deployment as they do not bear the platform maintenance cost. For example, Leopard Speech-to-Text is 20x more affordable than cloud providers.
- Privacy & Security: With On-Device processing, voice data does not leave the device, i.e. the platform where it resides, resulting in 100% privacy. For example, all demos on Picovoice’s website process voice data within the web browser. Hence, users have control over their data.
- Unpredictable Latency: Fluctuating network latency or Cloud outages do not cause delays as the data does not leave the device. On-Device processing is the best solution for real-time applications, such as agent coaching or dictation.
- IT Management: Whether public or private, the Cloud is known for the simplicity of management and scalability as one can allocate resources among various applications. However, when millions of users send data to one central location, the orchestration can be very expensive, and capping the latency can turn into a mission impossible. On-Device processing, like any distributed approach, leverages multiple computational resources at the edge.
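The latency argument above can be sketched with a simple model: response time is the network round trip plus model inference, and On-Device removes the network term. All numbers below are hypothetical, not measurements, and the sketch assumes the same inference time in both cases, which is a simplification since real hardware differs.

```python
import statistics


def response_times_ms(inference_ms: float, network_round_trips_ms: list) -> list:
    """Total response time per request: network round trip plus model inference."""
    return [rtt + inference_ms for rtt in network_round_trips_ms]


# Hypothetical numbers, for illustration only.
INFERENCE_MS = 50.0
cloud_rtts = [40.0, 45.0, 300.0, 60.0, 900.0]  # fluctuating connection
on_device_rtts = [0.0] * 5                      # audio never leaves the device

for label, rtts in (("cloud", cloud_rtts), ("on-device", on_device_rtts)):
    times = response_times_ms(INFERENCE_MS, rtts)
    print(label, "worst case (ms):", max(times), "spread (ms):", statistics.pstdev(times))
```

The point of the model is the spread rather than the average: a response time with no network term cannot be affected by a bad connection.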