Speech-to-intent is a technology that converts spoken language into actionable commands or intents. It allows machines and applications to understand and respond to voice commands, making it a fundamental component of voice-controlled devices and applications.

Picovoice's Rhino Speech-to-Intent engine can be tailored to infer intent from spoken commands within a given context. Restricting recognition to a specific context improves accuracy. In this article, we will walk you through the process of creating a custom context for Rhino.

An equivalent video tutorial of this article is also available.

1. Picovoice Console

Sign up for a free Picovoice Console account. Once you've created an account, navigate to the Rhino page.


2. Create Your Custom Context

Once you're on the Rhino page, follow these steps to create a context:

  1. Click on the "New Context" button.
  2. Give your context a name.
  3. Select the language for your context.
  4. You can optionally select a template, which will fill your context with some boilerplate. We're going to leave this as "Empty" for this article.
  5. Click on the "Create Context" button.

3. Define an Intent

After creating a context, you will be automatically redirected to an editor where you can start customizing your context. Let's start by defining an intent.

  1. Click on the + icon beside Intents to add a new intent. Give it a name, like "orderBeverage", and hit "Enter".
  2. Inside your new intent, add an expression. For example, "Make me a small coffee".

Expressions are phrases that Rhino will listen for, such as "Make me a small coffee". An intent is just a container for a group of related expressions. For example, "Make me a small coffee" and "I'd like a large coffee" might belong to the same intent.

  3. Test your expression by clicking on the microphone in the panel on the right-hand side, and saying "Make me a small coffee". Watch for the result in the Inference panel below.

After testing, you'll notice that Rhino recognizes the intent, but the inference lacks key details like the size and type of beverage. This is where slots come into play.

4. Add a Slot

A slot can be thought of as a variable that can hold different possible values that Rhino can detect. Let's create one for the size of the beverage.

  1. Click on the + icon beside Slots to add a new slot. Give it a name, like "size".
  2. Inside your new slot, add all possible values for the slot, like "small", "medium", "large", and "extra large."
  3. Now, replace the word "small" in your expression with the "size" slot. Do this by typing in a "$" sign, and selecting the "size" option. Add a unique identifier after $size like :size (having identifiers is required, and allows you to re-use a slot multiple times within the same expression).
  4. Test your expression again, and you'll see that the "size" information is now available.
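
To make the idea of a slot as a variable concrete, here is a minimal sketch in Python. It is not how Rhino works internally; it only mimics the concept with a regular expression. The names SLOTS, compile_expression, and match_slots are hypothetical helpers introduced for illustration.

```python
import re

# Hypothetical slot definition mirroring the "size" slot from the steps above.
SLOTS = {"size": ["small", "medium", "large", "extra large"]}


def compile_expression(expression, slots):
    """Turn an expression such as 'make me a $size:size coffee' into a regex."""
    parts = []
    for token in expression.split():
        if token.startswith("$"):
            # "$size:size" -> slot name "size", identifier "size"
            slot_name, identifier = token[1:].split(":")
            # Try longer values first so "extra large" is preferred over "large".
            values = sorted(slots[slot_name], key=len, reverse=True)
            alternatives = "|".join(re.escape(v) for v in values)
            parts.append(f"(?P<{identifier}>{alternatives})")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def match_slots(expression, slots, utterance):
    """Return the captured slot values, or None if the utterance does not match."""
    match = compile_expression(expression, slots).fullmatch(utterance.strip())
    return match.groupdict() if match else None


print(match_slots("make me a $size:size coffee", SLOTS, "Make me a small coffee"))
```

The identifier after the colon becomes the key in the result, which is why the same slot can appear more than once in an expression under different identifiers.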

5. Add a Macro

What if someone starts their order with something other than "Make me"? We can handle such variations with macros. A macro represents a list of interchangeable phrases that we want to detect without caring which one was spoken. In our case, we don't need to know whether someone said "Make me" or "Get me"; we just want to know they ordered a coffee.

  1. Create a macro and give it a name, like "brew".
  2. Add phrases like "Get me", "Make me", "Could I have", and "I would like".
  3. Replace "Make me" in your expression with the "brew" macro. Do this by typing in an "@" sign, and selecting your "brew" macro.

The Inference won't show the macro in the output since Rhino only detects that one of the options was said. If you need to know which option was said, use Slots instead.
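
The difference between a macro and a slot can be sketched in a few lines of Python. This is only an illustration of the concept, not Rhino's implementation; MACROS, expand_macros, and infer are hypothetical names, and the result dictionary is a simplified stand-in for what the Inference panel shows.

```python
# Hypothetical macro definition mirroring the "brew" macro from the steps above.
MACROS = {"brew": ["get me", "make me", "could I have", "I would like"]}


def expand_macros(expression, macros):
    """Return every phrase produced by substituting each macro phrase for @name tokens."""
    phrases = [""]
    for token in expression.split():
        if token.startswith("@"):
            options = macros[token[1:]]
        else:
            options = [token]
        phrases = [f"{prefix} {option}".strip() for prefix in phrases for option in options]
    return phrases


def infer(utterance, intent, expression, macros):
    """Report the intent on a match; which macro phrase was spoken is not recorded."""
    candidates = {phrase.lower() for phrase in expand_macros(expression, macros)}
    if utterance.lower() in candidates:
        return {"is_understood": True, "intent": intent, "slots": {}}
    return {"is_understood": False, "intent": None, "slots": {}}


print(expand_macros("@brew a small coffee", MACROS))
print(infer("Get me a small coffee", "orderBeverage", "@brew a small coffee", MACROS))
```

Every macro phrase leads to the same result, so nothing in the output distinguishes "Get me" from "Make me".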

6. Logical "OR"s & Optionals

Use [square brackets] to indicate a logical "OR" between phrases: speaking any one of the phrases inside the square brackets matches that part of the expression.

Use (round brackets) to indicate optional phrases, which are phrases that may be omitted without changing the intent.

[Screenshot: Choices & Options]

The expression above would match any of the following phrases:

  • "I would like a large coffee"
  • "Could I have a medium coffee please"
  • "Make me an extra large cup of coffee"

But it would not match these phrases (since neither "a" nor "an" is present):

  • "I would like large coffee"
  • "Could I have medium cup of coffee, thanks"
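
To check these examples by hand, the following Python sketch expands an expression with choices and optionals into every phrase it accepts. The expression itself is an assumption: the original appears only as a screenshot, so EXPRESSION is a guess that is consistent with the examples listed, with the macro and slot written inline as choices and with choices separated by commas. The helper names expand and matches are hypothetical.

```python
import itertools
import re

# Assumed expression (the original is only shown as a screenshot).
# The "brew" macro and "size" slot are written inline as choices for simplicity.
EXPRESSION = (
    "[I would like, could I have, make me, get me] [a, an] "
    "[small, medium, large, extra large] (cup of) coffee (please)"
)

# Matches a [choice, group], an (optional phrase), or a run of literal text.
TOKEN = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|([^\[\]()]+)")


def expand(expression):
    """Return the set of lowercase phrases the expression accepts."""
    groups = []
    for choices, optional, literal in TOKEN.findall(expression):
        if choices:
            groups.append([c.strip() for c in choices.split(",")])
        elif optional:
            # An optional phrase is either spoken or omitted.
            groups.append([optional.strip(), ""])
        elif literal.strip():
            groups.append([literal.strip()])
    phrases = set()
    for combo in itertools.product(*groups):
        phrases.add(" ".join(part for part in combo if part).lower())
    return phrases


def matches(utterance, expression=EXPRESSION):
    """Check an utterance against the expression, ignoring case and commas."""
    normalized = " ".join(utterance.lower().replace(",", "").split())
    return normalized in expand(expression)


for phrase in ["I would like a large coffee", "I would like large coffee"]:
    print(phrase, "->", matches(phrase))
```

Enumerating every phrase is only practical for small expressions like this one, but it makes the effect of each bracket type easy to see.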

7. Download your Custom Context

Once you are satisfied with your custom context, click on the blue download icon in the toolbar. Select the platform that matches the intended runtime platform, and click "Download". In just a few seconds, you will find a folder in your downloads containing the .rhn file for your custom context.

Next Steps

Now that you have your .rhn file, you're ready to begin coding! Below is the list of SDKs supported by Rhino Speech-to-Intent, along with corresponding code snippets and quick-start guides.

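import pvrhino

# access_key: AccessKey from Picovoice Console; context_path: path to the .rhn file
rhino = pvrhino.create(
    access_key=access_key,
    context_path=context_path)

# audio() is a placeholder for a function that returns the next audio frame
while not rhino.process(audio()):
    pass
inference = rhino.get_inference()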
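
Once an inference is available, the application has to act on it. The sketch below shows one way to do that, assuming the inference object exposes is_understood, intent, and slots attributes. The describe_inference helper is hypothetical, and a SimpleNamespace stands in for a real inference so the example runs without a microphone or an AccessKey.

```python
from types import SimpleNamespace


def describe_inference(inference):
    """Summarize an inference object exposing is_understood, intent, and slots."""
    if not inference.is_understood:
        return "Command not understood"
    slots = ", ".join(f"{name}={value}" for name, value in sorted(inference.slots.items()))
    return f"{inference.intent}({slots})"


# Stand-in object for illustration; a real one would come from get_inference().
example = SimpleNamespace(is_understood=True, intent="orderBeverage", slots={"size": "large"})
print(describe_inference(example))  # prints "orderBeverage(size=large)"
```

Checking is_understood first matters because a spoken command that falls outside the context carries no intent or slots to act on.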