Speech-to-text for Online Meetings

Online meetings enable remote work and international business without a physical office. Today, 58% of Americans, including those traditionally labelled “blue-collar workers,” work from home. The share rises to 75% for jobs paying above $150,000. Yet many executives are overwhelmed by meetings and are looking for ways to reduce their volume and to gather insights that improve productivity even further.

Share of jobs paying over $150K that allow remote work (McKinsey)

Hours spent in meetings per week, versus 10 hours in the 1960s (HBR)

Number of meetings attended by employees per week (Atlassian)

Meeting Transcriptions improve comprehension and accessibility. Running NLP solutions on the transcribed text improves productivity further by automating administrative work, such as writing meeting minutes or tracking action items. Despite these benefits, privacy, accuracy and affordability concerns limit wider adoption.

Online meeting transcriptions should be private.

Confidential information, such as trade secrets or customer data, is shared during business meetings. Thus, handling recordings and transcriptions with caution is vital for business continuity.

Zoom, the popular online meeting platform, lets users store recordings locally or in the cloud. However, Zoom Meeting Transcription has no local option. Enterprises may think they are safe when recordings are stored locally or meetings are not recorded at all. Yet once they enable transcription, whether for productivity or accessibility purposes, they lose control over the transcribed text. Otter, one of the most popular Zoom Meeting Transcription tools, has hit the headlines over privacy concerns a few times, e.g. in 2018 and 2022.

Microsoft Teams is another popular online meeting platform, especially among large enterprises. Microsoft Teams Meeting Transcription has no local option either. Teams does not even allow recording meetings locally: recordings are encoded in an Azure service and uploaded to Microsoft Stream. Big Tech in general, not just Microsoft, is known for questionable privacy practices, antitrust violations and the use of customer data or information shared by startups to compete with them directly. Cloud-based Meeting Transcriptions put the confidentiality of any sensitive information at risk. Using local speech-to-text eliminates this risk because audio and video recordings stay on the premises.

Online meeting transcriptions should capture jargon.

The first few weeks at a new job feel awkward, as if you are in another country that speaks an unfamiliar language. Generic speech-to-text models “feel” the same. Every industry has its terminology, and every company has its acronyms. Despite a low Word Error Rate in everyday conversations, a generic speech-to-text model will struggle with jargon.
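Word Error Rate counts the word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of reference words. A minimal sketch of that calculation is below. The sample sentences and the acronym in them are made up for illustration, and the function is a plain word-level edit distance, not any vendor's scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the transcript
            insertion = curr[j - 1] + 1  # extra word in the transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical meeting sentence: the acronym "ARR" is heard as "are".
    reference = "the q3 arr forecast is ready"
    transcript = "the q3 are forecast is ready"
    print(f"WER: {word_error_rate(reference, transcript):.2f}")
```

In this made-up sentence only one word in six is wrong, yet it is the acronym that carries the meaning. That is how a low overall error rate can still hide jargon mistakes.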

Generic speech-to-text models, just like new hires, should be trained on specific phrases and terms for accurate Meeting Transcriptions. Some vendors, mostly big tech, do not allow fine-tuning or use customer data directly without consent. Some vendors, primarily independent ones, require customers to send training data so that the vendor can adapt the model in the back end. Some vendors, like Picovoice, also offer a self-service platform in case users do not want to share their data.

The training process should be iterative, regardless of the approach vendors propose. New product launches, geographic expansions, and customer acquisitions will require further model adjustments. In fast-paced environments, transcription solutions that allow quick and easy adjustments work better.

Online meeting transcription should be affordable.

Employees spend almost half of their time attending meetings. A 1,000-person enterprise whose meetings average eight attendees holds ~2,900 hours of meetings every week. The cost of Meeting Transcriptions can go up to $300,000 per year with Google Speech-to-Text. Local speech-to-text, however, transcribes the same volume at a fraction of that cost.
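The arithmetic behind these figures can be sketched as follows. The inputs not stated in the text are assumptions: 23 meeting hours per employee per week (chosen because it reproduces the ~2,900-hour figure), 48 working weeks per year, and a hypothetical cloud rate of $0.036 per minute. Substitute the vendor's current list price before relying on the result.

```python
# Back-of-the-envelope estimate of meeting volume and cloud transcription cost.
# The example inputs are illustrative assumptions, not figures from a vendor quote.

def weekly_meeting_hours(employees: int,
                         hours_per_employee: float,
                         attendees_per_meeting: float) -> float:
    """Hours of distinct meetings per week; each meeting hour is shared by its attendees."""
    return employees * hours_per_employee / attendees_per_meeting


def annual_transcription_cost(meeting_hours_per_week: float,
                              price_per_minute: float,
                              weeks_per_year: int = 48) -> float:
    """Yearly cost of transcribing every meeting minute at a flat per-minute rate."""
    minutes_per_year = meeting_hours_per_week * 60 * weeks_per_year
    return minutes_per_year * price_per_minute


if __name__ == "__main__":
    hours = weekly_meeting_hours(employees=1000,
                                 hours_per_employee=23,     # assumed
                                 attendees_per_meeting=8)
    cost = annual_transcription_cost(hours, price_per_minute=0.036)  # assumed rate
    print(f"{hours:.0f} meeting hours per week")
    print(f"${cost:,.0f} per year")
```

With these assumed inputs the estimate comes to 2,875 meeting hours a week and roughly $298,000 a year, in line with the figures quoted above. Changing the rate or the number of working weeks moves the total proportionally.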

What’s next?

Picovoice’s Free Plan allows developers to adapt speech-to-text models to their use case, train custom voice AI models, and process voice data on the device. Use it to get familiar with Rhino Speech-to-Intent before purchasing.

