Picovoice has recently announced its Speech-to-Text engines. Developers now have access to voice recognition technology that covers all their needs and works across platforms without relying on the cloud. This is big news not just for Picovoice and its customers, but also for the field of speech recognition. We have been able to prove that cutting cloud dependency is possible, even for complex models and on all platforms, without sacrificing accuracy.

How does cutting the cloud-dependency of voice AI help enterprises?

Cutting the cloud dependency of voice AI gives control back to enterprises by minimizing latency, security and privacy risks, and the carbon footprint. Cutting the cloud dependency doesn’t mean that the cloud cannot be used. In fact, it creates more opportunities, such as serverless Speech-to-Text: using Picovoice Leopard with AWS Lambda costs 13.5x less than using AWS Transcribe. Cutting the dependency means making the use of cloud computing optional and fully controlled by enterprises, since the cloud comes at a cost.

What’s the real cost of cloud computing?

The cloud has revolutionized the computing industry and enabled many applications, business models and enterprises that otherwise wouldn’t have existed. Immediate availability, scalability, minimal capital expenditure and a streamlined developer experience are the main benefits of the cloud, but they come at a cost.

  1. Cost at Scale: By moving away from the public cloud, Dropbox saved $75M over two years and doubled its gross margin. Andreessen Horowitz recently conducted a study and found that cloud spending could go as high as 80% of the cost of revenue for software companies. Moreover, $100B in market value across 50 software companies is lost due to the impact of the cloud on their margins. In their concluding remarks, Andreessen Horowitz asked whether the oligopoly of cloud providers would give up either the margins (i.e., lower prices) or the workloads (i.e., cloud repatriation). A couple of months later, Google Cloud announced its plans to increase prices.

  2. Security & Privacy: An IDC study shows that 98% of enterprises experienced at least one cloud data breach in the past 18 months. Moreover, regulations such as GDPR, along with privacy-sensitive users, push enterprises to invest more in security and privacy.

  3. Environmental Cost: The carbon footprint of cloud computing has now surpassed that of the airline industry. It’s not only data centre operations: even transmitting data to the cloud increases the carbon footprint of applications and enterprises.

For voice AI, cutting the cloud dependency was not easy. Voice recognition’s unquenchable thirst for computing doesn’t help with cloud dependency. The traditional approach was to feed AI models with more data for higher accuracy. Running those models therefore required more compute power, which the cloud could provide. It’s not surprising that cloud providers, with access to a tremendous amount of data and compute power, also dominate the voice AI market. To cut voice AI’s cloud dependency, Amazon and Google have recently announced on-device voice processing that lowers costs and improves the experience, but only for their own products, not for other developers. Nuance and Mozilla DeepSpeech worked on cutting the connectivity dependency of voice processing, but due to computing requirements and poor accuracy, they haven’t been widely adopted as an alternative. Mozilla stopped maintaining DeepSpeech, and Nuance became a Microsoft company. Before Picovoice, enterprises that didn’t have big tech resources were left with only one choice: the cloud.

After exploring innovative approaches, we at Picovoice have been able to prove that cutting the cloud dependency of speech recognition is possible. Local speech recognition with cloud-level accuracy is possible and can be accessible to everyone. By running speech recognition locally, close to where the data resides (on-prem server, desktop, mobile…), instead of sending data to a 3rd party cloud, Picovoice gives control back to enterprises: it decreases costs by an order of magnitude and minimizes latency, security and privacy risks, and the carbon footprint.