Picovoice Enables Voice Picking

  • Wake Word Detection
  • Speech-to-Intent
  • Voice User Interfaces
  • VUI
  • Voice Picking
September 29, 2021

What's voice picking?

Voice picking, also known as voice-directed picking, speech-based picking, pick by voice, or pick-to-voice, uses speech recognition and natural language understanding to guide workers as they locate, pick, and place items. Voice picking systems increase productivity and reduce errors by freeing workers' hands and eyes.

Since the 1990s, voice picking solutions have been used mainly by large distribution centers and warehouses. Voice-directed picking can increase productivity by up to 35%, decrease errors by up to 90%, and shorten training time from days to minutes [1] [2]. As a result, organizations can increase revenue by serving customers better, reduce labour costs per order, and minimize the cost of shipping wrong items, resulting in happier customers, employees, and shareholders.

Benefits of voice picking

  • Reduced training time: Workers can start after minimal training because they are guided by simple verbal instructions and do not need to learn complex workflows or memorize checklists. Warehouses with seasonal workers and high employee turnover therefore save significant time.

  • Minimized use of tools and distractions: Because voice prompts direct warehouse operators, they do not need to pick up, read, and put down instructions while working. They can focus on their main tasks: visually locating the correct items, picking the right quantity, and placing them. Tasks such as data entry or following written instructions can lead to picking the wrong item or quantity and to time wasted between bins. Simplifying the process and limiting distractions increase speed and accuracy.

  • Improved workplace safety: Keeping pickers' hands free, especially when handling heavy objects or sharp items, significantly improves workplace safety.

Unsurprisingly, large distribution centers and warehouse management system providers have joined the voice trend. The voice-directed warehousing solution market is expected to grow from $1.4 billion in 2020 to $4.8 billion in 2031, and this decade we'll see many more enterprises adopt voice-directed systems [3]. If you are considering joining them, there are three major things we want you to know about: Connectivity, Ease of Use, and Cost.

Things to know

Connectivity

It's often believed that a voice solution needs superior Wi-Fi coverage throughout the warehouse, especially one with several storage areas. In most cases, that's true. Just as you sometimes wait for Siri or Alexa to respond even when you ask for the time of day, you do not want poor latency or responsiveness to delay your operations.

With a voice solution that runs in the cloud, when a picker asks where to go next, the request is transmitted to the cloud, processed there, and the result is sent back. Any connectivity issue, whether in the warehouse network, with the internet connection, or at the cloud service provider, hinders the picker's productivity.

Picovoice offers a consistent and guaranteed real-time experience by running fully on-device without any network dependency. With Picovoice technology, voice commands can be processed offline, enabling your picker to work with no disruption.
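
The on-device pattern can be sketched in a few lines of Python. This is a minimal illustration, assuming an engine that consumes fixed-size frames of audio samples and signals when it has reached a decision. The `process()` / `get_inference()` shape is modeled on Picovoice's Python SDKs, but `IntentEngine`, `listen_for_command`, `FakeEngine`, and the `pickItem` intent are hypothetical names for this sketch, so check the official SDK documentation for exact signatures and initialization. The point is that the loop never touches the network.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Protocol, Sequence


@dataclass
class Inference:
    """Assumed result shape for this sketch: an intent plus named slot values."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


class IntentEngine(Protocol):
    """Hypothetical interface, modeled on a frame-by-frame on-device engine."""
    frame_length: int

    def process(self, pcm: Sequence[int]) -> bool: ...

    def get_inference(self) -> Inference: ...


def listen_for_command(engine: IntentEngine,
                       frames: Iterable[Sequence[int]]) -> Optional[Inference]:
    """Feed audio frames to the engine until it finalizes an inference.

    Every call runs locally; nothing here opens a network connection,
    so Wi-Fi dead spots between aisles cannot stall the picker.
    """
    for pcm in frames:
        if len(pcm) != engine.frame_length:
            raise ValueError("each frame must contain exactly frame_length samples")
        if engine.process(pcm):
            return engine.get_inference()
    return None  # audio ended before the engine reached a decision


class FakeEngine:
    """Stand-in for a real engine: 'finalizes' after a fixed number of frames."""
    frame_length = 512

    def __init__(self, frames_needed: int, result: Inference):
        self._remaining = frames_needed
        self._result = result

    def process(self, pcm: Sequence[int]) -> bool:
        self._remaining -= 1
        return self._remaining <= 0

    def get_inference(self) -> Inference:
        return self._result


if __name__ == "__main__":
    silence = [[0] * 512 for _ in range(10)]
    engine = FakeEngine(3, Inference(True, "pickItem", {"quantity": "4", "bin": "B12"}))
    print(listen_for_command(engine, silence))
```

In a real application, the frames would come from a microphone and `FakeEngine` would be replaced by an actual engine instance; the control flow stays the same.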

Ease of Use

Warehouse workforces are diverse: different age groups, educational backgrounds, dialects, and accents. The key is finding a solution that works accurately across accents and in noisy warehouse environments, and that requires little or no technical literacy. You need a simple voice user interface that understands your workers reliably and offers flexibility as your needs change.

Traditional solutions typically process voice in two steps: 1) capture the voice and transcribe the utterance (speech-to-text), and 2) process the text to map it to an intent (natural language understanding). They also typically require you to train the algorithm with your own voice recordings to improve accuracy. The main problem with this approach is that it is time-consuming, and even more so when new products and names are added frequently.
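
To make the two-step structure concrete, here is a toy Python sketch, assuming a stubbed speech-to-text function and a hypothetical regex-based NLU for a "pick <quantity> from bin <bin>" command. It does not reflect any particular vendor's pipeline; it only shows that the NLU step sees nothing but the transcript, so a transcription slip (hearing "for" instead of "four") arrives as plausible text that no longer parses, and that vocabulary changes must be handled in the text rules as well as in the speech model.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical vocabulary and grammar for a "pick <quantity> from bin <bin>" command.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
PICK_PATTERN = re.compile(r"^pick (?P<qty>\w+) from bin (?P<bin>[a-z]\d+)$")

Intent = Tuple[str, Dict[str, object]]


def text_to_intent(transcript: str) -> Optional[Intent]:
    """Step 2 (NLU): map a transcript to (intent, slots), or None if it does not parse."""
    match = PICK_PATTERN.match(transcript.lower().strip())
    if match is None:
        return None
    quantity = NUMBER_WORDS.get(match.group("qty"))
    if quantity is None:
        # A homophone produced by step 1 ("for" instead of "four") lands here.
        return None
    return "pickItem", {"quantity": quantity, "bin": match.group("bin").upper()}


def two_step_pipeline(audio: bytes,
                      speech_to_text: Callable[[bytes], str]) -> Optional[Intent]:
    """Step 1 transcribes, step 2 interprets; step 2 only ever sees step 1's text."""
    return text_to_intent(speech_to_text(audio))


if __name__ == "__main__":
    # Stub transcribers standing in for a real speech-to-text service.
    heard_correctly = lambda audio: "pick four from bin b12"
    heard_homophone = lambda audio: "pick for from bin b12"
    print(two_step_pipeline(b"", heard_correctly))
    print(two_step_pipeline(b"", heard_homophone))
```

With the stub that hears "four", the pipeline returns `("pickItem", {"quantity": 4, "bin": "B12"})`; with the stub that hears "for", it returns `None`, even though the audio was the same command.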

Picovoice's Rhino™ Speech-to-Intent engine infers intent directly from speech, bypassing intermediate text generation. Removing that intermediate step minimizes errors and significantly improves accuracy. By fusing speech recognition and natural language understanding, Picovoice enables organizations to create highly optimized models that outperform alternatives by wide margins.

Cost

Every IT decision-maker knows that the price of a product is different from its total cost of ownership. Two things can significantly inflate your total cost of ownership: 1) development cost and 2) the cloud bill.

  1. Development Cost: Building a first working prototype on a technology that requires data collection takes approximately 6 months. If your requirements change after the prototype, for example to better address user needs, the process starts again. With Picovoice, you can reduce the time-to-market for your first prototype from months to hours. Picovoice Console enables you to develop a web-based prototype and deploy it instantly to the platform of your choice, whether your solution needs to run on an embedded device, Android, iOS, or Windows, and whether your developers prefer Flutter, React, or native SDKs. Plus, you can integrate it with your existing systems, such as ERP, to automate your processes.
  2. Cloud Bill: Cloud providers charge based on usage, such as the number of API calls or minutes of processed voice data. While you are developing a prototype, that might not be a problem. However, once workers start picking millions of items, the bill grows with every command. Picovoice offers the flexibility to deploy and run voice user interfaces offline (on-device), on-premises, or in the cloud, depending on your requirements.
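
Integrating with an ERP or WMS typically comes down to routing each recognized command to the system that should act on it. A minimal Python sketch, assuming each command arrives as an intent name plus a dictionary of string slot values; `InMemoryErp`, `make_dispatcher`, and the `pickItem` intent are hypothetical, and a real integration would replace the in-memory stock table with calls to your own ERP or WMS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InMemoryErp:
    """Hypothetical stand-in for an ERP/WMS client; a real one would call its API."""
    stock: Dict[str, int]
    log: List[str] = field(default_factory=list)

    def decrement_stock(self, bin_id: str, quantity: int) -> None:
        if self.stock.get(bin_id, 0) < quantity:
            raise ValueError(f"bin {bin_id} has fewer than {quantity} items")
        self.stock[bin_id] -= quantity
        self.log.append(f"picked {quantity} from {bin_id}")


def make_dispatcher(erp: InMemoryErp) -> Callable[[str, Dict[str, str]], str]:
    """Route a recognized intent and its slots to the matching ERP update.

    Returns a function that yields the next spoken prompt for the picker.
    """

    def on_pick(slots: Dict[str, str]) -> str:
        quantity = int(slots["quantity"])
        erp.decrement_stock(slots["bin"], quantity)
        return f"Picked {quantity}. Next location, please."

    # One entry per voice command; adding a command means adding a handler.
    handlers: Dict[str, Callable[[Dict[str, str]], str]] = {"pickItem": on_pick}

    def dispatch(intent: str, slots: Dict[str, str]) -> str:
        handler = handlers.get(intent)
        if handler is None:
            return "Sorry, I did not understand."
        return handler(slots)

    return dispatch
```

Keeping the mapping in a handler table means a new voice command needs only a new entry, without touching the recognition code.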

Major cloud providers such as Google charge per minute of usage. Assuming a warehouse with 1,000 employees, each working a 7-hour shift per day, your monthly cost would be close to $750,000 [4]. With edge computing, that cost could be reduced by 10x.
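
The per-minute rate behind the $750,000 estimate [4] is not stated. As a rough check, the figure is consistent with a worst case in which every worker's audio is billed for the whole shift at about $0.06 per minute over a 30-day month: 1,000 × 7 h × 60 min = 420,000 minutes per day, × $0.06 = $25,200 per day, × 30 = about $756,000. The Python sketch below makes those assumptions explicit; `monthly_usage_cost` is a hypothetical helper, and the rate and billing model are assumptions to replace with your provider's current pricing and your actual billed minutes.

```python
def monthly_usage_cost(employees: int, shift_hours: float, price_per_minute: float,
                       days_per_month: int = 30) -> float:
    """Worst-case usage bill: every worker's audio is billed for the entire shift.

    price_per_minute is an assumed rate, not a quoted price from any provider.
    """
    minutes_per_day = employees * shift_hours * 60
    return minutes_per_day * price_per_minute * days_per_month


if __name__ == "__main__":
    cloud = monthly_usage_cost(1000, 7, price_per_minute=0.06)
    print(f"cloud, 30-day month: ${cloud:,.0f}")       # about $756,000
    print(f"with a 10x reduction: ${cloud / 10:,.0f}")  # about $75,600
```

If only the seconds in which workers actually speak are sent for recognition, the bill would be much lower, so treat this as an upper bound rather than a forecast.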

Ready to start building?

Picovoice offers an end-to-end platform for building voice interfaces, with the flexibility to customize them for your needs and to integrate them with your existing systems, such as ERP or WMS. Our technology does not require any specific hardware, and it can also work offline. If you're ready, we've made a simple demo to walk you through the steps.
