The pandemic has overwhelmed the healthcare system.
Even before the pandemic, physician burnout was a problem,
and administrative tasks, i.e. paperwork, have been the primary
cause. Medical Dictation, which addresses this problem, is one of the most widely adopted speech solutions. Yet, recent
advances make choosing and building Medical Dictation software challenging. Should you go with Nuance and take no risk,
or with other Big Tech companies? How about innovative startups or open-source?
To help you answer this question, let’s look at what a Medical Dictation solution requires: HIPAA Compliance,
High Accuracy and Fast Response Time.
HIPAA Compliance
First and foremost, Medical Dictation should be HIPAA-compliant. Enterprises using cloud speech APIs need to ensure:
- Transmission & Infrastructure Security: The voice data should be encrypted, transferred securely and stored on secure servers.
- Confidentiality: Anyone handling or accessing medical data should know how to work with PHI (Protected Health Information) and PII (Personally Identifiable Information).
- Geo-Location And Geo-Fencing: Regulations may require voice data to be stored and processed in a specific geographic location.
There are several privacy-related questions enterprises should ask automated transcription API providers. Some cloud
providers offer HIPAA compliance. For example, Google requires clients to talk to their account managers to execute
a Business Associate Agreement (BAA) and to not share their data with Google for training purposes. Not sharing the data
with Google costs enterprises 50% more than using the Google Speech-to-Text API and letting Google use their data.
Google leaves the responsibility of building and executing HIPAA-compliant solutions to the clients. AWS also offers a
HIPAA-eligible transcription API, Amazon Transcribe Medical. AWS charges three times more for Amazon Transcribe Medical
than for the standard Amazon Transcribe models.
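To get a feel for what these multipliers mean for a monthly bill, here is a minimal Python sketch. Only the 1.5x and 3x multipliers come from the comparison above. The base per-minute rate and the monthly volume are hypothetical placeholders, not published prices, and a single base rate is used for both cases purely to isolate the effect of each multiplier.

```python
from decimal import Decimal

# Multipliers taken from the text: opting out of data sharing costs 50% more,
# and the medical model costs three times the standard model.
NO_DATA_SHARING_MULTIPLIER = Decimal("1.5")
MEDICAL_MODEL_MULTIPLIER = Decimal("3")


def monthly_cost(minutes: int, base_rate_per_minute: Decimal, multiplier: Decimal) -> Decimal:
    """Return the monthly bill for `minutes` of audio at a scaled per-minute rate."""
    total = Decimal(minutes) * base_rate_per_minute * multiplier
    return total.quantize(Decimal("0.01"))


# Hypothetical inputs for illustration only: 10,000 minutes at $0.02 per minute.
minutes = 10_000
base_rate = Decimal("0.02")

standard = monthly_cost(minutes, base_rate, Decimal("1"))
opt_out = monthly_cost(minutes, base_rate, NO_DATA_SHARING_MULTIPLIER)
medical = monthly_cost(minutes, base_rate, MEDICAL_MODEL_MULTIPLIER)

print(f"standard: ${standard}, no data sharing: ${opt_out}, medical model: ${medical}")
```

Replacing the placeholder rate and volume with figures from a provider's current pricing page turns this into a rough budgeting check.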
On the other hand, on-device automated transcription solutions address privacy and compliance concerns by design. Voice data doesn’t leave the device. Thus, it’s neither transmitted to a third-party server nor stored there. Nobody other than the owner accesses the data.
Custom Models for High Accuracy
After addressing compliance needs, on-device automated transcription solutions and selected cloud transcription APIs
remain as options for Medical Dictation. Open-source speech recognition models are not highly accurate for
industry-specific jargon out of the box. To be fair, open-source models are generic by design and do not claim to
understand industry-specific terminology. To customize standard open-source models with medical jargon, enterprises
need to evaluate their:
- Capabilities: Pharmaceutical and disease names change and evolve. For example, Nuance, the best-known vendor with 20+ years of experience in healthcare, published a content package for COVID-19. Nobody can foresee the future. Hence, enterprises need in-house machine learning expertise to keep the models up-to-date.
- Resources: Training, running and maintaining large speech models require significant computing power. Most enterprises don’t have access to large server farms. Hence, they should consider whether they can acquire the needed resources physically or virtually.
OpenAI’s Whisper excited many developers interested in medical transcription as it promises HIPAA compliance by
processing voice data on-device. However, Whisper’s models range in size from 39 million to about 1.6 billion
parameters. This means that when enterprises need to add a new drug or disease name, just like the COVID-19 content,
they need to re-train these large models.
Real-time
Humans are accustomed to having real-time responses in human-human interactions. There is no latency in our conversations.
- Real-time Transcription: Medical Dictation should use automated transcription models that can handle real-time transcription. Models that process data in batches are not a good fit for dictation use cases.
- Predictable and minimal response time: Latency and unpredictable response time are inherent limitations of cloud computing. On-device speech models built with old technology also have performance problems.
All reputable automated transcription model vendors offer streaming transcription models. Some open-source models, such
as Whisper, do not. There are ways to pass small audio snippets to make them work. However, it’s still an asynchronous
process with delay. Yet, using a streaming model is not enough. For example, Amazon Transcribe Medical runs on Amazon’s
servers, making its response time unpredictable since enterprises do not have visibility or control over Amazon’s or
any other third party’s servers.
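The delay introduced by passing small audio snippets to a batch model, as mentioned above, can be sketched as follows. This is a minimal Python illustration, not Whisper’s API: `fake_transcribe` is a hypothetical stand-in for any batch model, and the 16 kHz sample rate, 2-second snippet length and 0.5-second processing time are assumptions. The point is that a word spoken at the start of a snippet cannot be returned until the whole snippet has been recorded and processed.

```python
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
CHUNK_SECONDS = 2.0    # assumed snippet length in seconds


def split_into_chunks(samples: Sequence[int], chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-size snippets; the last one may be shorter."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def pseudo_stream(
    samples: Sequence[int],
    transcribe_chunk: Callable[[Sequence[int]], str],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: float = CHUNK_SECONDS,
) -> List[str]:
    """Feed a batch model one snippet at a time and collect its outputs."""
    chunk_size = int(sample_rate * chunk_seconds)
    return [transcribe_chunk(chunk) for chunk in split_into_chunks(samples, chunk_size)]


def worst_case_delay(chunk_seconds: float, processing_seconds: float) -> float:
    """Delay for a word spoken at the very start of a snippet."""
    return chunk_seconds + processing_seconds


def fake_transcribe(chunk: Sequence[int]) -> str:
    # Hypothetical stand-in for a batch model: reports the snippet size only.
    return f"<{len(chunk)} samples>"


audio = [0] * (5 * SAMPLE_RATE)  # five seconds of silence
results = pseudo_stream(audio, fake_transcribe)
print(results)
print(worst_case_delay(CHUNK_SECONDS, processing_seconds=0.5))
```

Shrinking the snippet length lowers the delay, but it also gives the model less context per call, which is why this workaround is a trade-off and not a substitute for a true streaming model.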
The technology used for training speech models affects performance, and hence the response time. For example,
Picovoice’s automated transcription solutions run across platforms, including web browsers. However, Nuance’s mobile
solution cannot even process voice data locally because it’s too heavy to run on a mobile device.
Are you interested in a HIPAA-compliant and accurate Medical Dictation solution powered by state-of-the-art
on-device voice recognition technology? Customize Cheetah Streaming Speech-to-Text on Picovoice’s self-service Console
for free! If you have a large volume of data, talk to enterprise sales!