The podcast industry is growing. A recent study estimates that the number of podcast listeners in the US exceeded 162 million in 2021. According to Podtrac, the leading publisher, iHeartRadio, reached 32 million unique listeners in December 2021 with 615 shows. Chartable forecasts that more branded podcasts will be published, following examples such as Inside Trader Joe’s, which became the number one food podcast in the US on Apple Podcasts. This growth brings competition for listeners’ attention. However, audio is unstructured data, which makes it hard for publishers to analyze their content in order to increase engagement or monetization. Picovoice engines for audio transcription and indexing help publishers uncover the value of unstructured voice data.
Cost-effective Audio Transcription with Leopard
The standard and best-known approach to structuring voice data is to convert it into text. As the accuracy of AI models improves, most organizations choose machine transcription over human transcription. However, even machine transcription can be a significant cost item. For example, Google Speech-to-Text Enhanced costs $2.16 per hour of audio. Given the number of shows and episodes, this could easily become a huge cost burden on publishers. Among its other benefits, edge computing can be considerably more cost-effective than cloud computing. Picovoice’s Speech-to-Text engine Leopard, just like other Picovoice engines, offers cloud-level accuracy on the edge and is 10 to 20 times cheaper than cloud-based Speech-to-Text solutions.
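To make the cost gap concrete, here is a rough back-of-the-envelope sketch in Python. It takes the $2.16 per hour cloud rate and the "10 to 20 times cheaper" range from the paragraph above and applies them to a catalog of 615 shows. The number of episodes per show and the episode length are hypothetical assumptions chosen only for illustration, and the function name `estimate_costs` is made up for this example; it is not part of any Picovoice or Google tooling.

```python
# Cloud transcription rate quoted in the article (USD per hour of audio).
CLOUD_RATE_USD_PER_HOUR = 2.16


def estimate_costs(audio_hours, cloud_rate=CLOUD_RATE_USD_PER_HOUR,
                   savings_factors=(10, 20)):
    """Return (cloud_cost, edge_costs) for a given number of audio hours.

    edge_costs maps each savings factor (e.g. 10 for "10 times cheaper")
    to the corresponding estimated on-device cost.
    """
    cloud_cost = audio_hours * cloud_rate
    edge_costs = {factor: cloud_cost / factor for factor in savings_factors}
    return cloud_cost, edge_costs


if __name__ == "__main__":
    shows = 615                # catalog size mentioned in the article
    episodes_per_show = 50     # hypothetical assumption
    hours_per_episode = 0.75   # hypothetical assumption (45 minutes)

    audio_hours = shows * episodes_per_show * hours_per_episode
    cloud_cost, edge_costs = estimate_costs(audio_hours)

    print(f"Audio to transcribe: {audio_hours:,.1f} hours")
    print(f"Cloud estimate: ${cloud_cost:,.2f}")
    for factor, cost in sorted(edge_costs.items()):
        print(f"Edge estimate at {factor}x cheaper: ${cost:,.2f}")
```

Under these assumed figures the catalog holds about 23,000 hours of audio, so the cloud bill would be close to $50,000, against roughly $2,500 to $5,000 on the edge. Actual numbers depend on each publisher's catalog and on current vendor pricing.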
Limitless Audio Search with Octopus
Despite the advancements in voice recognition, no Speech-to-Text solution on the market is 100% accurate. Speech-to-Text solutions struggle especially with homophones and similar-sounding phrases, such as “to, too, or two” and “botox or boat ox”, and with brand and personal names such as “Hermes” and “Khaleesi”. Speech indexing is a complementary solution for these cases. Picovoice’s Speech-to-Index engine, Octopus, indexes speech directly without relying on a text representation and makes even massive numbers of audio files searchable. Its acoustic-only approach boosts accuracy by removing the out-of-vocabulary limitation.