Wake Word Detection in the Cloud is not a good idea. It’s expensive, unreliable, and not private. Wake Word Detection engines are always on and always listening: they monitor the audio in their environment to detect the wake phrase and activate the desired software. The only way to process voice data in the Cloud is to send audio recordings, so running a Wake Word Detection engine in the Cloud means recording, transmitting, and processing voice data 24/7. That comes at a cost.
Direct Costs
Transcription in the cloud costs $12,600 per year.
Continuously running Google Speech-to-Text in the Cloud costs about $12.6K per year. Paying $12.6K annually to process voice data on a $30 Amazon Echo device is not feasible. Even though Big Tech owns the infrastructure, which lowers its costs, it’s still not viable. That’s why the Wake Word Detection engines of Amazon, Google, and Apple detect “Alexa,” “OK Google,” and “Hey Siri” on the device.
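The $12.6K figure can be reproduced with simple arithmetic. The sketch below assumes a price of $0.024 per minute of audio; that rate is not quoted above and is used here only because it reproduces the article's number, so check the provider's current price list before relying on it.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes of always-on audio

# Assumption: illustrative per-minute price chosen to match the article's
# ~$12.6K figure; it is not stated in the text above.
ASSUMED_PRICE_PER_MINUTE_USD = 0.024


def yearly_transcription_cost(price_per_minute: float) -> float:
    """Cost of transcribing a continuous audio stream for one year."""
    return MINUTES_PER_YEAR * price_per_minute


cost = yearly_transcription_cost(ASSUMED_PRICE_PER_MINUTE_USD)
print(f"${cost:,.0f} per year")
```

Because the stream never stops, the cost scales with wall-clock time rather than with how often anyone actually says the wake word.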
Sending data to the cloud costs $16,400 per year.
The connectivity cost depends on the audio file type and how compressed it is. To be more technical, it depends on the sample rate and bit rate. However, for the sake of simplicity, let’s use Stanford University’s rule of thumb, 1 MB per minute. So, the amount of data transferred to the Cloud is 5250 GB per year. The cost of mobile data varies across the globe and averages $3.12 per GB. Thus, it costs $16.4K to send data to the Cloud on a mobile plan.
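The transfer estimate is sensitive to the assumed audio rate, so it helps to parameterize it. The sketch below is a minimal calculation, assuming decimal gigabytes (1 GB = 1000 MB) and treating $3.12 as a price per GB. The 5250 GB and $16.4K figures line up with a rate of roughly 10 MB per minute; at exactly 1 MB per minute the same arithmetic gives about 526 GB and about $1.6K per year, so the rate is worth checking against the audio format actually used.

```python
from typing import Tuple

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes of continuous streaming
AVG_PRICE_PER_GB_USD = 3.12       # average from the text; per-GB unit assumed


def yearly_transfer(mb_per_minute: float) -> Tuple[float, float]:
    """Return (GB per year, USD per year) for a continuous audio stream."""
    gigabytes = MINUTES_PER_YEAR * mb_per_minute / 1000  # decimal GB
    return gigabytes, gigabytes * AVG_PRICE_PER_GB_USD


# Compare the rule-of-thumb rate with the rate implied by the 5250 GB figure.
for rate in (1.0, 10.0):
    gb, usd = yearly_transfer(rate)
    print(f"{rate:>4} MB/min -> {gb:,.0f} GB/year, ${usd:,.0f}/year")
```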
Frequent battery replacements add $200 per year.
Some Redditors are unhappy with the battery consumption of Siri and Google Assistant. However, neither Apple nor Google officially shares how much battery their voice assistants consume. Again, battery consumption varies with signal strength and the technology. Let’s assume LTE consumes 1400 mW. The battery capacity of the iPhone 14 is 12.68 watt-hours. An iPhone battery retains up to 80% of its original capacity at 500 complete charge cycles, and then requires replacement, which costs $99.
At 1400 mW, LTE consumes about 12K watt-hours per year, which requires 967 complete charges, hence almost 2 iPhone 14 batteries. Running a Wake Word Detection engine in the Cloud adds an annual $192 battery replacement cost. Over four years, that is equivalent to the cost of a new device.
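The battery estimate follows from the figures above. This is a rough sketch that counts only the LTE radio's power draw and treats a battery as spent after 500 full cycles; both are simplifying assumptions taken from the text, not measurements.

```python
HOURS_PER_YEAR = 365 * 24      # 8,760 hours of always-on streaming
LTE_POWER_W = 1.4              # 1400 mW, as assumed in the text
BATTERY_CAPACITY_WH = 12.68    # iPhone 14 capacity, from the text
CHARGES_PER_BATTERY = 500      # full cycles before replacement
REPLACEMENT_COST_USD = 99

energy_wh = LTE_POWER_W * HOURS_PER_YEAR                 # ~12,264 Wh per year
full_charges = energy_wh / BATTERY_CAPACITY_WH           # ~967 full charges
batteries_per_year = full_charges / CHARGES_PER_BATTERY  # ~1.93 batteries
yearly_cost = batteries_per_year * REPLACEMENT_COST_USD  # ~$191.5

print(f"{energy_wh:,.0f} Wh, {full_charges:.0f} charges, ${yearly_cost:.0f}/year")
```

Changing `LTE_POWER_W` shows how strongly the result depends on signal strength and radio technology.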
The roughly $30K/year total of transcription, connectivity, and battery costs does not include the environmental costs. Energy resources are scarce. Electricity prices in Europe are skyrocketing. There are hundreds of millions of Alexa devices out there. Even if Amazon were to bear the transcription costs and run its Wake Word Detection engine in the Cloud, the impact on the environment and economy would be disastrous.
Indirect Costs
Processing data in the cloud is not reliable.
Remember when you asked Alexa to play the next song and it took a minute? That’s because Alexa records you saying “play the next song,” sends the recording to the Cloud for transcription, and then waits for the response. While the recording is being sent, your internet connection may be poor, or Amazon’s servers may have a problem, so Alexa has to wait. Imagine that happening even when you say “Alexa.” Reliance on internet connectivity results in a poor user experience, and that inconvenience may lead to high churn rates.
Sending data to the cloud means no privacy.
Enterprises should comply with regulations such as GDPR and CCPA when handling voice recordings. However, Cloud computing is known for data breaches. A recent IDC survey shows that 98% of enterprises experienced at least one Cloud data breach in the past 18 months. Sending voice data to the Cloud is risky. Sending voice data to the Cloud 24/7 is like staying at the One Dollar Hotel or participating in Big Brother, the reality show.
Some argue that Cloud models have higher accuracy, but the Cloud doesn’t necessarily mean higher accuracy. “The larger the model, the higher the accuracy” is no longer a valid argument in the 2020s. Thus, rely on data, not claims. Picovoice publishes an open-source wake word benchmark to help with the evaluation process. You can train a custom wake word on the Picovoice Console in seconds and get Porcupine up and running in minutes to compare it with alternatives. If you aren’t sure how to choose a wake word, start with some tips.