The pandemic has overwhelmed the healthcare system. Even before the pandemic, physician burnout was a problem, with administrative tasks, i.e., paperwork, as the primary cause. Medical Dictation addresses this problem and is one of the most widely adopted speech solutions. Yet recent advances make choosing and building Medical Dictation software challenging. Should you play it safe with Nuance, or go with another Big Tech company? How about innovative startups or open-source?
To help you answer this question, let’s look at what a Medical Dictation solution requires: HIPAA Compliance, High Accuracy and Fast Response Time.
HIPAA Compliance
First and foremost, Medical Dictation should be HIPAA-compliant. Enterprises using cloud speech APIs need to ensure:
- Transmission & Infrastructure Security: The voice data should be encrypted, transferred securely and stored on secure servers.
- Confidentiality: Anyone handling or accessing medical data should know how to work with PHI (Protected Health Information) and PII (Personally Identifiable Information).
- Geo-Location And Geo-Fencing: Regulations may require voice data to be stored and processed in a specific geographic location.
There are several privacy-related questions enterprises should ask automated transcription API providers. Some cloud providers offer HIPAA Compliance. For example, Google requires clients to talk to their account managers to execute a Business Associate Agreement (BAA) and to opt out of sharing their data with Google for training purposes. Not sharing the data with Google costs enterprises 50% more than using the Google Speech-to-Text API and letting Google use their data. Google leaves the responsibility of building and operating HIPAA-compliant solutions to the clients. AWS also offers a HIPAA-eligible transcription API, Amazon Transcribe Medical. AWS charges three times more for Amazon Transcribe Medical than for the standard Amazon Transcribe models.
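To see what these premiums mean for a budget, here is a minimal sketch of the arithmetic. The two multipliers come from the figures quoted above ("50% more" as 1.5x, and "three times" read as 3x the standard rate, which is an assumption about the wording). The base rate and monthly volume are placeholders, not real list prices, so plug in the vendor's current pricing before drawing conclusions.

```python
# Multipliers come from the figures quoted above; everything else is a placeholder.
NO_DATA_SHARING_MULTIPLIER = 1.5  # "50% more" than the data-sharing rate
MEDICAL_MODEL_MULTIPLIER = 3.0    # "three times" read here as 3x the standard rate


def monthly_cost(minutes: float, base_rate_per_minute: float, multiplier: float = 1.0) -> float:
    """Cost of transcribing `minutes` of audio at a base rate scaled by `multiplier`."""
    if minutes < 0 or base_rate_per_minute < 0:
        raise ValueError("minutes and base rate must be non-negative")
    return minutes * base_rate_per_minute * multiplier


if __name__ == "__main__":
    minutes = 10_000   # hypothetical monthly dictation volume
    base_rate = 0.01   # hypothetical price per minute, NOT a real list price
    for label, multiplier in [
        ("standard", 1.0),
        ("no data sharing", NO_DATA_SHARING_MULTIPLIER),
        ("medical model", MEDICAL_MODEL_MULTIPLIER),
    ]:
        print(f"{label}: {monthly_cost(minutes, base_rate, multiplier):.2f}")
```

Because the premium is a multiplier, the gap in absolute cost grows linearly with dictation volume.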
On the other hand, on-device automated transcription solutions address privacy and compliance concerns by design. Voice data doesn’t leave the device, so it is neither transmitted to a third-party server nor stored there. Nobody other than the owner accesses the data.
Custom Models for High Accuracy
Once compliance needs are addressed, on-device automated transcription solutions and selected cloud transcription APIs remain as options for Medical Dictation. Open-source speech recognition models are not highly accurate for industry-specific jargon out of the box. To be fair, open-source models are generic by design and do not claim to understand industry-specific terminology. To customize standard open-source models with medical jargon, enterprises need to evaluate their:
- Capabilities: Drug and disease names change and evolve. For example, Nuance, the best-known vendor with 20+ years of experience in healthcare, published a content package for COVID-19. Nobody can foresee the future. Hence, enterprises need in-house machine learning expertise to keep the models up-to-date.
- Resources: Training, running and maintaining large speech models requires significant computing power. Most enterprises don’t have access to large server farms. Hence, they should consider whether they can acquire the needed resources physically or virtually.
OpenAI’s Whisper excited many developers interested in medical transcription, as it promises HIPAA Compliance by processing voice data on-device. However, Whisper’s models range from 39 million to roughly 1.6 billion parameters. This means that when enterprises need to add a new drug or disease name, as with the COVID-19 content, they need to re-train these large models.
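A back-of-the-envelope sketch shows why those parameter counts matter. It uses only the two model sizes quoted above and standard numeric widths (4 bytes for fp32, 2 bytes for fp16). The 16 bytes per parameter for training is a common rule of thumb for fp32 training with the Adam optimizer, used here as an assumption. The estimate leaves out activations, audio buffers and framework overhead, so treat the results as lower bounds.

```python
BYTES_PER_PARAM_FP32 = 4
BYTES_PER_PARAM_FP16 = 2
# Rule-of-thumb assumption for fp32 training with Adam:
# 4 (weights) + 4 (gradients) + 8 (two optimizer moments) bytes per parameter.
BYTES_PER_PARAM_ADAM_TRAINING = 16

# Parameter counts as quoted in the text above (approximate).
SMALLEST_MODEL_PARAMS = 39_000_000
LARGEST_MODEL_PARAMS = 1_600_000_000


def memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed for `num_params` parameters, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in [("smallest", SMALLEST_MODEL_PARAMS), ("largest", LARGEST_MODEL_PARAMS)]:
        print(
            f"{name}: "
            f"fp16 weights {memory_gb(params, BYTES_PER_PARAM_FP16):.2f} GB, "
            f"fp32 weights {memory_gb(params, BYTES_PER_PARAM_FP32):.2f} GB, "
            f"training state {memory_gb(params, BYTES_PER_PARAM_ADAM_TRAINING):.2f} GB"
        )
```

Under these assumptions the smallest model fits comfortably on most devices. Re-training the largest one needs tens of gigabytes of accelerator memory before any audio is processed.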
Real-time
Humans are accustomed to real-time responses in human-to-human interactions. There is no noticeable latency in our conversations.
- Real-time Transcription: Medical Dictation should use automated transcription models that can handle real-time transcription. Models that process data in batches are not a good fit for dictation use cases.
- Predictable and minimal response time: Latency and unpredictable response time are inherent limitations of cloud computing. On-device speech models built on old technology also have performance problems.
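The difference between streaming and batch processing can be illustrated with a toy sketch. It does not call any real speech engine: each "frame" is a string standing in for the word recognized in that slice of audio (an empty string means silence), and `frame_ms` is a hypothetical frame length. The only point is when text becomes available to the person dictating.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_transcribe(frames: Iterable[str], frame_ms: int) -> Iterator[Tuple[int, str]]:
    """Yield (audio_time_ms, partial_text) as soon as each frame has been heard."""
    elapsed_ms = 0
    words: List[str] = []
    for frame in frames:
        elapsed_ms += frame_ms
        if frame:  # an empty string stands in for a silent frame
            words.append(frame)
        yield elapsed_ms, " ".join(words)


def batch_transcribe(frames: Iterable[str], frame_ms: int) -> Tuple[int, str]:
    """Return (audio_time_ms, full_text) only after the whole recording is available."""
    all_frames = list(frames)
    text = " ".join(frame for frame in all_frames if frame)
    return len(all_frames) * frame_ms, text


if __name__ == "__main__":
    dictation = ["patient", "", "denies", "chest", "pain"]
    for at_ms, partial in stream_transcribe(dictation, frame_ms=500):
        print(f"streaming @ {at_ms} ms: {partial!r}")
    at_ms, final = batch_transcribe(dictation, frame_ms=500)
    print(f"batch     @ {at_ms} ms: {final!r}")
```

Both approaches end with the same text. The streaming loop shows the first word after one frame, while the batch call shows nothing until the recording ends.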
All reputable automated transcription vendors offer streaming transcription models. Some open-source models, such as Whisper, do not. There are ways to pass small audio snippets to make them work. However, it’s still an asynchronous process with a delay. Yet using a streaming model is not enough. For example, Amazon Transcribe Medical runs on Amazon’s servers, making its response time unpredictable, since enterprises do not have visibility into or control over Amazon’s or any other third party’s servers.
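One way to make "unpredictable" measurable is to record the response time of each request and compare the typical case with the tail. The sketch below is an assumption about how a team might do that, not a vendor-provided tool. It uses the nearest-rank definition of a percentile, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples_ms: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize measured response times; a tail_ratio far above 1 means unpredictable latency."""
    p50 = percentile(samples_ms, 50)
    p95 = percentile(samples_ms, 95)
    tail_ratio = p95 / p50 if p50 > 0 else float("inf")
    return {"p50": p50, "p95": p95, "max": max(samples_ms), "tail_ratio": tail_ratio}


if __name__ == "__main__":
    # Placeholder values only; replace with response times measured in your own setup.
    measured_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(latency_summary(measured_ms))
```

Running the same measurement against each candidate, on the network and hardware clinicians will actually use, gives a like-for-like view of both median speed and worst-case delays.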
The technology used for training speech models affects performance, and hence the response time. For example, Picovoice’s automated transcription solutions run across platforms, including web browsers. However, Nuance’s mobile solution cannot even process voice data locally on the mobile device because it is too heavy to run there.
Are you interested in a HIPAA-compliant and accurate Medical Dictation solution powered by state-of-the-art on-device voice recognition technology? Customize Cheetah Streaming Speech-to-Text on Picovoice’s self-service Console for free! If you have a large volume of data, talk to enterprise sales!