You’ve read Snow Crash by Neal Stephenson, watched Free Guy, and heard about Facebook’s, Microsoft’s or NVIDIA’s plans for the Metaverse, or about the Gucci and Roblox collaboration, so you probably do not want to see another post on the Metaverse. We promise this one is different. If you’re not there yet, let’s catch you up quickly. The Metaverse is essentially a set of connected communities where people can interact with each other to work, socialize, shop or play. It consists of extended reality: always-on digital environments powered by AR, VR or a mix of the two. In this article, we’ll discuss the benefits of on-device voice recognition for AR, VR and MR applications so you can start building with web, Android, iOS or Unity SDKs.

The Metaverse may seem far away when you think of waiting almost a minute for Alexa to play the next song. When you rely on cloud providers for voice recognition, the shortcomings of broadband connectivity and cloud latency hinder the experience. However, we have a solution, at least for the voice part. Let’s say you have an AR-enabled application where users can wear digital clothes, and a user wants to change the color.

Standard Approach

The user presses a push-to-talk button before talking. The voice data is then recorded and sent to the cloud for transcription; how fast it gets there depends on the user’s internet service provider (ISP). Next, the text is passed to a Natural Language Understanding (NLU) service to infer the user’s intent from the text. The latency of these steps depends on the cloud service provider’s performance and the user’s proximity to the data center. The command is then sent back to the device for execution, and once again this part relies on the ISP. Finally, the user sees that the color has changed.
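To make the chain of hops above easier to reason about, here is a minimal sketch of a latency budget. The `Stage` type, the stage names and every millisecond value are hypothetical placeholders chosen for illustration; they are not measurements of any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float
    needs_network: bool


def total_latency_ms(stages):
    """Sum the latency of every stage the voice command passes through."""
    return sum(stage.latency_ms for stage in stages)


def network_share(stages):
    """Fraction of the total time spent in stages that depend on the network."""
    total = total_latency_ms(stages)
    if total == 0:
        return 0.0
    networked = sum(s.latency_ms for s in stages if s.needs_network)
    return networked / total


# Placeholder numbers for illustration only, NOT measurements.
CLOUD_PIPELINE = [
    Stage("capture audio after push-to-talk", 100, needs_network=False),
    Stage("upload audio via ISP", 150, needs_network=True),
    Stage("cloud speech-to-text", 300, needs_network=True),
    Stage("cloud NLU", 100, needs_network=True),
    Stage("send command back via ISP", 150, needs_network=True),
    Stage("apply change on device", 20, needs_network=False),
]

if __name__ == "__main__":
    print(f"total: {total_latency_ms(CLOUD_PIPELINE)} ms")
    print(f"network-dependent share: {network_share(CLOUD_PIPELINE):.0%}")
```

Plugging in your own measurements for each stage shows how much of the wait comes from hops that only exist because the audio leaves the device.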

It sounds exhausting. The next time Alexa takes a while to play the next song, just think about this long process.


Picovoice Approach

Picovoice’s wake word engine, Porcupine, eliminates the need for a push-to-talk button, so the user can start with a voice command. Rhino, the Speech-to-Intent engine, infers the intent directly from speech without a text representation. During the whole process, voice data is processed locally on the device and is never sent to the cloud. Because nothing is transmitted, users do not face the reliability and latency issues of a network round trip.


The user says “Porcupine (or your branded wake word), change the color to black”, and voilà!
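The on-device flow can be sketched as a small two-state loop: wait for the wake word, then feed audio to the intent engine until it reaches a decision. This is only a sketch under assumptions. The `WakeWordEngine` and `IntentEngine` interfaces are modeled on the `process` call shown in this article and on the shape of the Rhino Python API as we understand it; the `Inference` type, the `changeColor` intent and the `color` slot are hypothetical names standing in for whatever your own context defines.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Protocol, Sequence


class WakeWordEngine(Protocol):
    # Returns the index of the detected keyword, or a negative number.
    def process(self, frame: Sequence[int]) -> int: ...


@dataclass
class Inference:
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


class IntentEngine(Protocol):
    # Returns True once the engine has finished listening to the command.
    def process(self, frame: Sequence[int]) -> bool: ...

    def get_inference(self) -> Inference: ...


class VoiceCommandLoop:
    """Routes each audio frame to the wake word or the intent engine."""

    def __init__(
        self,
        wake_word: WakeWordEngine,
        intent_engine: IntentEngine,
        on_inference: Callable[[Inference], None],
    ) -> None:
        self._wake_word = wake_word
        self._intent_engine = intent_engine
        self._on_inference = on_inference
        self._awake = False

    def feed(self, frame: Sequence[int]) -> None:
        if not self._awake:
            # Idle: only the wake word engine sees the audio.
            if self._wake_word.process(frame) >= 0:
                self._awake = True
            return
        # Awake: the intent engine listens until it finalizes.
        if self._intent_engine.process(frame):
            self._awake = False
            self._on_inference(self._intent_engine.get_inference())


def apply_color_change(outfit: Dict[str, str], inference: Inference) -> None:
    """Hypothetical app handler: recolor the outfit if the intent matches."""
    if inference.is_understood and inference.intent == "changeColor":
        outfit["color"] = inference.slots.get("color", outfit["color"])
```

In a real application you would pass Porcupine and Rhino instances (or thin adapters around them) as the two engines and call `feed` with every frame from the microphone; nothing in the loop touches the network.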

What else does Picovoice offer?

Picovoice technology is not only fast but also frugal with compute and power. For example, Porcupine, the wake word engine, uses less than 4% of the Raspberry Pi 3’s CPU and detects multiple wake words concurrently without any additional footprint. You can develop voice products for the Metaverse by using Picovoice SDKs for web, Android, iOS or Unity.

import pvporcupine

porcupine = pvporcupine.create(
    access_key=access_key,
    keyword_paths=keyword_paths)
while True:
    keyword_index = porcupine.process(audio_frame())
    if keyword_index >= 0:
        # Detection callback
        pass
Build with Python

let porcupine = new Porcupine(
  accessKey,
  keywordPaths,
  sensitivities);
while (true) {
  let keywordIndex = porcupine.process(audioFrame());
  if (keywordIndex >= 0) {
    // Detection callback
  }
}
Build with NodeJS

PorcupineManager porcupineManager = new PorcupineManager.Builder()
    .setAccessKey(accessKey)
    .setKeywordPath(keywordPath)
    .build(
        appContext,
        new PorcupineManagerCallback() {
            @Override
            public void invoke(int keywordIndex) {
                // Detection callback
            }
        }
    );
porcupineManager.start();
Build with Android

let porcupineManager = try PorcupineManager(
    accessKey,
    keywordPath: keywordPath,
    onDetection: { keywordIndex in
        // Detection callback
    })
try porcupineManager.start()
Build with iOS

const {
  keywordDetection,
  isLoaded,
  isListening,
  error,
  init,
  start,
  stop,
  release,
} = usePorcupine();
init(
  accessKey,
  keywords,
  model
);
useEffect(() => {
  if (keywordDetection !== null) {
    // Keyword detection
  }
}, [keywordDetection]);
Build with React

PorcupineManager porcupineManager = await PorcupineManager.fromKeywordPaths(
  accessKey,
  keywordPaths,
  (keywordIndex) => {
    // Detection callback
  });
await porcupineManager.start();
Build with Flutter

let porcupineManager = await PorcupineManager.fromKeywordPaths(
  accessKey,
  keywordPaths,
  (keywordIndex) => {
    // Detection callback
  });
await porcupineManager.start();
Build with React Native

PorcupineManager porcupineManager = PorcupineManager.FromKeywordPaths(
    accessKey,
    keywordPaths,
    (keywordIndex) => {
        // Detection callback
    });
porcupineManager.start();
Build with Unity

constructor(private porcupineService: PorcupineService) {
  this.keywordSubscription = porcupineService.keywordDetection$.subscribe(
    porcupineDetection => {
      // Detection callback
    }
  )
}
async ngOnInit() {
  await this.porcupineService.init(
    accessKey,
    keywords,
    model
  )
}
Build with Angular

{
  data() {
    const {
      state,
      init,
      start,
      stop,
      release
    } = usePorcupine();
    init(accessKey, keywords, model);
    return { state, start, stop, release }
  },
  watch: {
    "state.keywordDetection": function (keyword) {
      if (keyword !== null) {
        // Detection callback
      }
    }
  }
}
Build with Vue

Porcupine porcupine = Porcupine.FromKeywordPaths(
    accessKey,
    keywordPaths);
while (true)
{
    var keywordIndex = porcupine.Process(AudioFrame());
    if (keywordIndex >= 0)
    {
        // Detection callback
    }
}
Build with .NET

Porcupine porcupine = new Porcupine.Builder()
    .setAccessKey(accessKey)
    .setKeywordPath(keywordPath)
    .build();
while (true) {
    int keywordIndex = porcupine.process(audioFrame());
    if (keywordIndex >= 0) {
        // Detection callback
    }
}
Build with Java

porcupine := Porcupine{
    AccessKey:    accessKey,
    KeywordPaths: keywordPaths}
porcupine.Init()
for {
    keywordIndex, err := porcupine.Process(AudioFrame())
    if err != nil {
        // Handle error
    }
    if keywordIndex >= 0 {
        // Detection callback
    }
}
Build with Go

let porcupine: Porcupine = PorcupineBuilder
    ::new_with_keyword_paths(keyword_paths)
    .init()
    .expect("");
loop {
    if let Ok(keyword_index) = porcupine.process(&audio_frame()) {
        if keyword_index >= 0 {
            // Detection callback
        }
    }
}
Build with Rust

pv_porcupine_t *porcupine = NULL;
pv_porcupine_init(
    access_key,
    model_path,
    num_keywords,
    keyword_paths,
    &sensitivities,
    &porcupine);
while (true) {
    pv_porcupine_process(
        porcupine,
        audio_frame(),
        &keyword_index);
    if (keyword_index >= 0) {
        // Detection callback
    }
}
Build with C
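
The snippets above call an audio frame helper without defining it. As a rough sketch of what such a helper has to do, the function below cuts raw 16-bit little-endian PCM bytes into fixed-length frames of samples. The function name `pcm_frames` is hypothetical, and the default of 512 samples per frame is an assumption based on our understanding of Porcupine; in practice, read the required length from the engine itself (for example `porcupine.frame_length` in Python) rather than hard-coding it.

```python
import struct
from typing import Iterator, List


def pcm_frames(raw: bytes, frame_length: int = 512) -> Iterator[List[int]]:
    """Yield frames of `frame_length` 16-bit little-endian samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    bytes_per_frame = frame_length * 2  # 2 bytes per 16-bit sample
    usable = len(raw) - len(raw) % bytes_per_frame
    for offset in range(0, usable, bytes_per_frame):
        chunk = raw[offset:offset + bytes_per_frame]
        yield list(struct.unpack(f"<{frame_length}h", chunk))
```

Each yielded frame can then be passed to the engine’s `process` call in place of `audio_frame()`.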

Watch the demo!

To show how it works, we’ve built a Voice-Controlled VR Video Player with Unity.

Start Building!

Adding on-device voice recognition to AR, VR or MR applications is just a few clicks away. Explore the Metaverse with Picovoice’s Free Tier and build your prototype in minutes.

AR, VR and MR stand for Augmented Reality, Virtual Reality and Mixed Reality, respectively. Extended Reality can be shortened as XR and is an umbrella term for AR, VR and MR.