Picovoice recently announced its Speech-to-Text engines. Developers now have access to speech recognition technology that covers a wide range of use cases and runs across platforms without relying on the cloud. This is big news not just for Picovoice and its customers, but also for the field of speech recognition: it shows that cutting cloud dependency is possible, even for complex models and across platforms, without sacrificing accuracy.
How does cutting the cloud-dependency of voice AI help enterprises?
Cutting the cloud dependency of voice AI brings control back to enterprises: it minimizes latency, reduces security and privacy risks, and decreases the carbon footprint. Cutting the cloud dependency doesn’t mean that the cloud cannot be used. In fact, it creates more opportunities, such as serverless Speech-to-Text. Using Picovoice Leopard with AWS Lambda costs 13.5x less than using AWS Transcribe. Cutting the dependency means making cloud computing optional and fully controlled by the enterprise, which matters because the cloud comes at a cost.
What’s the real cost of cloud computing?
The cloud has revolutionized the computing industry and enabled many applications, business models and enterprises that otherwise wouldn’t have existed. Immediate availability, scalability, minimal capital expenditure and a streamlined developer experience are the main benefits of the cloud, but they come at a cost.
Cost at Scale: By moving off the public cloud, Dropbox saved $75M over two years and doubled its gross margin. Andreessen Horowitz recently conducted a study and found that cloud spending could reach as much as 80% of the cost of revenue for software companies. Moreover, the study estimated that $100B of market value across 50 software companies is lost due to the impact of the cloud on their margins. In their concluding remarks, Andreessen Horowitz asked whether the oligopoly of cloud providers would give up either the margins (i.e., lower prices) or the workloads (i.e., cloud repatriation). A couple of months later, Google Cloud announced its plans to increase prices.
Security & Privacy: An IDC study shows that 98% of enterprises experienced at least one cloud data breach in the past 18 months. Moreover, regulations such as GDPR and privacy-sensitive users push enterprises to invest more in security and privacy.
Environmental Cost: The carbon footprint of cloud computing has now surpassed that of the airline industry. Data centre operations are not the only cause: even transmitting data to the cloud increases the carbon footprint of applications and enterprises.
Cutting cloud dependency for Voice AI was not easy. The traditional approach was to feed AI models with an abundance of data to improve accuracy. The resulting models required high levels of compute power, and therefore the cloud, to run. Cloud providers with access to large amounts of data and compute power took advantage of this approach to dominate the Voice AI market. However, to offset the costs of cloud computing, Amazon and Google have recently announced on-device voice processing. The launch will lower costs and improve user experiences, but only for Google and Amazon products, not for other developers. Nuance and Mozilla DeepSpeech worked to cut the connectivity dependency of voice processing, but heavy computing requirements and poor accuracy resulted in low adoption rates. Ultimately, Mozilla stopped maintaining DeepSpeech and Nuance was acquired by Microsoft. Before the launch of Picovoice speech-to-text engines, enterprises without access to Big Tech resources were left with only one option: cloud dependency.
After experimenting with innovative approaches, Picovoice has been able to develop Cheetah Streaming Speech-to-Text and Leopard Speech-to-Text, two speech-to-text engines that cut the cloud dependency of speech recognition. Local speech recognition with cloud-level accuracy is possible and now accessible to everyone through the Picovoice Console. Picovoice’s speech-to-text engines also provide their features offline, including timestamps, diarization, word confidence, and more. By bringing speech recognition closer to where the data resides (such as an on-prem server, desktop, or mobile device) instead of a third-party cloud, Picovoice brings control back to enterprises. Ultimately, developers get lower latency, fewer security and privacy risks, and a smaller carbon footprint.
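As a rough illustration of the word-level metadata described above, here is a minimal Python sketch built around the `pvleopard` package. It assumes `pvleopard` is installed and that you have an AccessKey from the Picovoice Console. The helper names `format_words` and `transcribe`, and the `min_confidence` threshold, are hypothetical and introduced only for this example. Treat it as a sketch under those assumptions and check the current Leopard documentation before relying on it.

```python
from types import SimpleNamespace


def format_words(words, min_confidence=0.0):
    """Render one 'start-end word (confidence)' line per recognized word.

    `words` is any sequence of objects exposing `word`, `start_sec`,
    `end_sec` and `confidence` attributes, which is the shape this sketch
    assumes Leopard returns for word-level metadata.
    """
    lines = []
    for w in words:
        # Skip words below the (hypothetical) confidence threshold.
        if w.confidence < min_confidence:
            continue
        lines.append(
            f"{w.start_sec:.2f}-{w.end_sec:.2f}s {w.word} ({w.confidence:.2f})"
        )
    return lines


def transcribe(audio_path, access_key):
    """Transcribe a local audio file on-device (assumes pvleopard is installed)."""
    import pvleopard  # imported lazily so the formatter works without the SDK

    leopard = pvleopard.create(access_key=access_key)
    try:
        # Assumed to return the transcript plus a sequence of word objects.
        transcript, words = leopard.process_file(audio_path)
    finally:
        leopard.delete()  # release native resources
    return transcript, words


if __name__ == "__main__":
    # Stand-in data so the formatter can be tried without an AccessKey.
    demo = [
        SimpleNamespace(word="hello", start_sec=0.0, end_sec=0.48, confidence=0.97),
        SimpleNamespace(word="wurld", start_sec=0.5, end_sec=0.9, confidence=0.31),
    ]
    for line in format_words(demo, min_confidence=0.5):
        print(line)
```

With a real AccessKey, something like `transcript, words = transcribe("meeting.wav", "YOUR_ACCESS_KEY")` followed by `format_words(words)` would run entirely on the local machine, and no audio would leave the device.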
Try Picovoice’s on-device speech-to-text engines today! Read how to implement Leopard Speech-to-Text (for asynchronous transcription) using JavaScript, Node.js, React.js, and Python, or Cheetah Streaming Speech-to-Text (for real-time transcription) with Node.js, Python, or JavaScript. Both are also compatible with iOS, Android, React Native, and Linux.