Choosing Voice Commands
This document is meant for product owners who are choosing voice commands to control a voice-enabled product.
Voice commands are a set of phrases used to control a device by voice. They are almost always used together with a wake word that activates the product. The figure below depicts this arrangement. For example, a smart lamp can be activated with a wake word such as "OK Lamp", after which the user can change its hue by saying a color name such as orange, purple, or yellow.
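To make the wake-word-then-command arrangement concrete, here is a minimal sketch of the smart lamp as a two-state loop. It assumes, hypothetically, that the detection engine reports each recognized phrase as a plain string; real engines typically report an index into their keyword list instead. The names `run_lamp`, `WAKE_WORD`, and `COLOR_COMMANDS` are illustrative only.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    WAITING_FOR_COMMAND = auto()


WAKE_WORD = "ok lamp"  # example wake word from the text
COLOR_COMMANDS = {"orange", "purple", "yellow"}  # example hue commands from the text


def run_lamp(detections):
    """Consume a sequence of detected phrases and return the hues applied.

    `detections` is a hypothetical stand-in for a keyword-spotting engine's output.
    """
    state = State.WAITING_FOR_WAKE_WORD
    applied = []
    for phrase in detections:
        if state is State.WAITING_FOR_WAKE_WORD:
            # Color names are ignored until the wake word has been heard.
            if phrase == WAKE_WORD:
                state = State.WAITING_FOR_COMMAND
        else:
            if phrase in COLOR_COMMANDS:
                applied.append(phrase)
            # After one follow-up phrase (valid or not), listen for the wake word again.
            state = State.WAITING_FOR_WAKE_WORD
    return applied
```

For example, `run_lamp(["purple", "ok lamp", "orange", "yellow", "ok lamp", "purple"])` applies only "orange" and "purple": the first "purple" and the "yellow" arrive without a preceding wake word, so they are ignored.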
Voice command interfaces can be implemented with an on-device wake word engine, a local speech-to-text engine, or a cloud-based speech-to-text engine. Below, we discuss when to use each option and then present guidelines for choosing a set of commands that results in an accurate voice interface.
When to Use a Wake Word Engine?
A wake word engine is more lightweight than a speech-to-text engine and consumes fewer runtime resources (i.e., CPU, RAM, and storage). That said, it has a limitation: the product can detect only a predefined set of commands. For complex tasks where free-form speech is preferred, a speech-to-text engine may be the more suitable solution. We suggest using a wake word engine whenever possible.
The following are guidelines for selecting a set of voice commands that result in an accurate interface.
1 - Avoid Similar-Sounding Commands
Similar-sounding commands, such as "small" and "smaller", are hard for both machines and humans to distinguish. If two or more commands sound alike, replace them with synonyms that have more distinct pronunciations; for example, use "small" and "tiny" instead of "small" and "smaller". Otherwise, the detection sensitivity of the similar-sounding commands needs to be lowered so that one command is less likely to be detected when the user says the other. Picovoice's wake word engine allows setting the detection sensitivity per command for exactly this reason.
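The check described above can be roughly automated before a command set is finalized. The sketch below is an assumption-laden heuristic, not Picovoice's method: it uses spelling similarity from Python's `difflib.SequenceMatcher` as a crude proxy for how alike two commands sound, flags pairs above a hypothetical cutoff of 0.7, and assigns one sensitivity per command, lowering it for flagged commands. The default of 0.5 and the lowered value of 0.3 are illustrative, assuming an engine whose sensitivities lie in [0, 1] with lower values meaning stricter detection.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical cutoff. Spelling similarity only approximates pronunciation,
# so flagged pairs are candidates for review, not definitive conflicts.
SIMILARITY_CUTOFF = 0.7


def similar_pairs(commands, cutoff=SIMILARITY_CUTOFF):
    """Return pairs of commands whose spellings are at least `cutoff` similar."""
    pairs = []
    for a, b in combinations(commands, 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff:
            pairs.append((a, b))
    return pairs


def per_command_sensitivities(commands, default=0.5, lowered=0.3):
    """Assign one sensitivity per command, lowering it for flagged commands.

    The values are hypothetical; lower sensitivity means stricter detection.
    """
    flagged = {c for pair in similar_pairs(commands) for c in pair}
    return [lowered if c in flagged else default for c in commands]
```

With `["small", "smaller", "large"]`, only ("small", "smaller") is flagged, and the resulting sensitivities are `[0.3, 0.3, 0.5]`. Replacing "smaller" with "tiny" clears the flag, which is the preferred fix. In Picovoice's Python SDK, a per-keyword list like this corresponds to the `sensitivities` argument of `pvporcupine.create`; check the SDK documentation for the exact value semantics.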
2 - Avoid Long Commands
Command phrases longer than four words are not recommended, because voice control engines are not optimized to detect such long phrases.
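The four-word guideline is easy to enforce when reviewing a candidate command set. The following minimal sketch counts whitespace-separated words, which is an assumption: it treats contractions and hyphenated terms as single words. The function name `too_long` is hypothetical.

```python
MAX_WORDS = 4  # guideline from the text: avoid phrases longer than four words


def too_long(commands, max_words=MAX_WORDS):
    """Return the commands with more than `max_words` whitespace-separated words."""
    return [c for c in commands if len(c.split()) > max_words]
```

For example, `too_long(["turn on", "set the lamp to bright orange"])` flags only the six-word phrase, which could be shortened to something like "bright orange".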