Unlocking the voice data

March 07, 2022

IDC predicts that the volume of new data generated globally will reach 175 ZB by 2025, up from 64.2 ZB in 2020. More than 80% of that 175 ZB will be unstructured, including audio and video. [1] However, enterprises cannot use unstructured or semi-structured data to achieve their goals without structuring it first. Gartner calls this “dark data”. [2] Picovoice’s private, reliable and affordable audio transcription and indexing technologies shed light on voice data.

Speech-to-Text 2.0: Better, Faster, Stronger

Accuracy gains from advances in artificial intelligence have been the main driver of the adoption of Automatic Speech Recognition (ASR) technology. Machine transcription offers benefits over human transcription in cost, privacy, and turnaround time. For example, while human transcription costs around $1.50 per minute on average, Speech-to-Text (STT) has brought the cost down to less than $1.50 per hour on average. [3] However, given the millions of hours of data generated every month, even machine transcription can be a significant cost item. For example, Google Speech-to-Text Enhanced costs $2.16 per hour. [4] Furthermore, because of their connectivity requirements and the controversial track record of cloud providers, cloud services cannot offer full reliability and privacy.

Picovoice’s distinctive Speech-to-Text (STT) technology offers cloud-level accuracy with the benefits of edge computing: private by design, no network latency, and 10 times more affordable. With Picovoice STT, enterprises no longer need to compromise privacy or incur hefty cloud bills to derive value from their voice content.

Voice Search with No Limits

Despite the advancements in artificial intelligence, there is no 100% accurate Speech-to-Text solution on the market. Even human transcription providers offer only ~99% accuracy. Speech-to-Text solutions struggle in particular with homophones such as “matcha” and “much a”, and with proper names such as “Hermes” that are pronounced differently across the globe. Speech indexing is a complementary solution for these cases. Picovoice’s Speech-to-Index Engine, Octopus, indexes speech directly without relying on a text representation and makes even massive numbers of audio files searchable. Its acoustic-only approach boosts accuracy by removing the out-of-vocabulary limitation.

Picovoice’s free tier offers 100 hours of transcription and/or voice search per month, while the starter tier costs $999/month for up to 10,000 hours of transcription and/or voice search.
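To put the prices quoted in this post side by side, here is a minimal Python sketch of the arithmetic. It uses only the figures mentioned above (about $1.50 per minute for human transcription, $2.16 per hour for Google Speech-to-Text Enhanced, and $999/month for up to 10,000 hours on the starter tier). The function names are made up for illustration, and the sketch assumes a flat monthly fee and ignores the free tier, so treat it as a rough comparison rather than a pricing calculator.

```python
# Prices in USD, taken from the figures quoted in this post.
HUMAN_PER_HOUR = 1.50 * 60           # ~$1.50 per minute of human transcription
GOOGLE_ENHANCED_PER_HOUR = 2.16      # Google Speech-to-Text Enhanced
STARTER_MONTHLY_FEE = 999.0          # Picovoice starter tier, flat monthly fee
STARTER_MONTHLY_HOURS = 10_000       # hours included in the starter tier


def starter_cost_per_hour(hours_used: float) -> float:
    """Effective per-hour cost on the starter tier for a given monthly usage.

    Assumption: the fee is flat, so the effective rate depends on how many
    of the included hours are actually used.
    """
    if not 0 < hours_used <= STARTER_MONTHLY_HOURS:
        raise ValueError("hours_used must be within the starter tier allowance")
    return STARTER_MONTHLY_FEE / hours_used


def monthly_costs(hours_used: float) -> dict:
    """Monthly cost of transcribing `hours_used` hours with each option."""
    return {
        "human": HUMAN_PER_HOUR * hours_used,
        "google_enhanced": GOOGLE_ENHANCED_PER_HOUR * hours_used,
        "picovoice_starter": STARTER_MONTHLY_FEE,
    }


if __name__ == "__main__":
    hours = 10_000
    print(f"Starter tier: ${starter_cost_per_hour(hours):.4f} per hour")
    for option, cost in monthly_costs(hours).items():
        print(f"{option}: ${cost:,.2f} per month")
```

At full utilization, the starter tier works out to roughly $0.10 per hour, compared with $2.16 per hour for the quoted cloud price and about $90 per hour for human transcription. At lower usage the effective per-hour rate rises, since the monthly fee stays the same.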