Not Another Metaverse Post

  • AR
  • VR
  • Speech Recognition
  • Wake Word
  • Voice Commands
  • Unity
  • iOS
  • Android
  • Web
November 17, 2021

You've read Snow Crash by Neal Stephenson, watched Free Guy, and heard about Facebook's, Microsoft's, or NVIDIA's plans for the metaverse, or the Gucci and Roblox collaboration. By now you probably don't want to see another post on the metaverse. We promise this one is different. If you're not there yet, let's catch you up quickly: the metaverse is essentially a set of connected communities where people interact with each other to work, socialize, shop, or play. It consists of always-on digital environments powered by AR, VR, and combinations of the two.

The metaverse may seem far away when you think of waiting almost a minute for Alexa to play the next song. When voice recognition relies on cloud providers, the shortcomings of broadband connectivity and cloud latency hinder the experience. However, we have a solution, at least for the voice part. Let's say you have an AR-enabled application where users can wear digital clothes, and a user wants to change the color.

Standard Approach

The user presses a push-to-talk button before speaking. The voice data is recorded and sent to the cloud for transcription; how quickly it arrives depends on the user's internet service provider (ISP). Next, the transcript is passed to a natural language understanding (NLU) service to infer the user's intent from the text. The latency of these steps depends on the cloud provider's performance and the user's proximity to the data center. The command is then sent back to the device, again over the ISP's network, and finally the user sees that the color has changed.
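To make the round trips concrete, here is a minimal sketch of the cloud pipeline as a chain of stages, each adding its own delay. The stage names and latency numbers are illustrative assumptions, not measurements of any real provider:

```python
import time

# Illustrative stage latencies in seconds; real values vary with the
# user's ISP, the cloud provider, and distance to the data center.
STAGE_LATENCY = {
    "upload_audio": 0.30,       # device -> cloud (depends on ISP)
    "speech_to_text": 0.50,     # cloud transcription
    "nlu_intent": 0.20,         # text -> intent
    "send_command_back": 0.30,  # cloud -> device (ISP again)
}

def cloud_voice_command(utterance: str):
    """Run the push-to-talk cloud pipeline; return (intent, total latency)."""
    total = 0.0
    for stage, latency in STAGE_LATENCY.items():
        time.sleep(latency)  # stand-in for network / compute time
        total += latency
    # A hypothetical NLU result for the AR clothing example.
    intent = "changeColor" if "color" in utterance else "unknown"
    return intent, total

intent, latency = cloud_voice_command("change the color to black")
print(intent, round(latency, 2))  # well over a second before anything happens
```

Every stage is serialized behind the previous one, so even modest per-stage delays add up to a user-visible pause.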


It sounds exhausting because it is. The next time Alexa takes a while to play the next song, think about this long pipeline.

Picovoice Approach

Picovoice's wake word engine, Porcupine, eliminates the need for a push-to-talk button: the user simply starts with a voice command. The voice data is processed on-device, and the intent is inferred directly, without an intermediate text representation. Since the voice data is never transmitted to the cloud, users do not face the reliability and latency issues of the standard approach.
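The control flow above can be sketched in a few lines. This is a toy, pure-Python model, not the real Picovoice SDK: keywords stand in for raw audio frames so that the shape of the flow (wake word first, then direct speech-to-intent, no transcript, no network) is easy to see. The frame labels, intent name, and slot names are all made up for the example:

```python
WAKE_WORD = "porcupine"  # or your branded wake word

def infer_intent(frames):
    """Scan a stream of 'frames'; after the wake word is detected, map
    the command straight to an intent and slots, with no text step and
    no round trip to the cloud."""
    awake = False
    for frame in frames:
        if not awake:
            awake = frame == WAKE_WORD  # wake word engine, on-device
            continue
        # Speech-to-intent: the command resolves directly on-device.
        if frame == "change_color_black":
            return {"intent": "changeColor", "slots": {"color": "black"}}
    return None

result = infer_intent(["porcupine", "change_color_black"])
print(result)  # {'intent': 'changeColor', 'slots': {'color': 'black'}}
```

Because nothing leaves the device, the only latency is the on-device inference itself.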


The user says "Porcupine (or your branded wake word), change the color to black," and voilà!

What else does Picovoice offer?

Picovoice products are not only fast but also frugal with power. For example, Porcupine, the wake word engine, uses less than 4% of a Raspberry Pi 3's CPU and detects multiple wake words concurrently without any additional footprint. You can build voice products with Picovoice SDKs for the web, Android, iOS, or Unity. To show how it works, we've built a Voice-Controlled VR Video Player with Unity.
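"Multiple wake words without additional footprint" can be pictured like this: one pass over the audio stream scores every registered keyword at once, rather than running one detector per keyword. The sketch below is a toy stand-in (strings for frames, a lookup for scoring) that mirrors only the API shape of returning a keyword index, or -1 when nothing is detected:

```python
# Hypothetical keyword set; the index order is just this example's.
KEYWORDS = {"porcupine": 0, "bumblebee": 1, "jarvis": 2}

def process(frame):
    """Score all keywords against one frame in a single pass and
    return the index of the detected keyword, or -1 for none."""
    return KEYWORDS.get(frame, -1)

for frame in ["hello", "bumblebee", "porcupine"]:
    idx = process(frame)
    if idx >= 0:
        print(f"detected keyword #{idx}")
```

Adding a keyword here grows the lookup table, not the per-frame work, which is the property the "no additional footprint" claim is about.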

Start Building!

If you’re interested in developing your own voice product, you’re just a few clicks away. Try Picovoice Console for free and build your prototype in minutes.

Learn more

If you're building AR, VR, or mixed-reality solutions, contact us to tell us about your project and get a 50% discount!