Speech-to-intent is a technology that converts spoken language into actionable commands or intents. It allows machines and applications to understand and respond to voice commands, making it a fundamental component of voice-controlled devices and applications.
Picovoice's Rhino Speech-to-Intent engine can be tailored to infer intent from spoken commands within a given context. Being context-specific allows Rhino to achieve better performance and accuracy. In this article, we will walk you through the process of creating a custom context for Rhino.
An equivalent video tutorial of this article is also available.
1. Picovoice Console
Sign up for a free Picovoice Console account. Once you've created an account, navigate to the Rhino page.
2. Create Your Custom Context
Once you're on the Rhino page, follow these steps to create a context:
- Click on the "New Context" button.
- Give your context a name.
- Select the language for your context.
- You can optionally select a template, which will fill your context with some boilerplate. We're going to leave this as "Empty" for this article.
- Click on the "Create Context" button.
3. Define an Intent
After creating a context, you will be automatically redirected to an editor where you can start customizing your context. Let's start by defining an intent.
- Click on the + icon beside "Intents" to add a new intent. Give it a name, like "orderBeverage", and press "Enter".
- Inside your new intent, add an expression. For example, "Make me a small coffee".
Expressions are phrases that Rhino will listen for, such as "Make me a small coffee". An intent is just a container for a group of related expressions. For example, "Make me a small coffee" and "I'd like a large coffee" might belong to the same intent.
- Test your expression by clicking on the microphone in the panel on the right-hand side and saying "Make me a small coffee". Watch for the result in the Inference panel below.
After testing, you'll notice that Rhino recognizes the intent but lacks key details like the size and type of beverage. This is where slots come into play.
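To make the relationship between a context, its intents, and their expressions concrete, here is a minimal Python sketch. It is a hypothetical model, not how Rhino works internally: the `CONTEXT` dictionary and the `infer_intent` function are our own names, and matching is a plain, case-insensitive string comparison on text rather than speech.

```python
from typing import Dict, List, Optional

# Hypothetical model of a context: each intent is a container for related expressions.
CONTEXT: Dict[str, List[str]] = {
    "orderBeverage": [
        "make me a small coffee",
        "i'd like a large coffee",
    ],
}


def infer_intent(phrase: str) -> Optional[str]:
    """Return the first intent that contains an expression equal to the phrase."""
    normalized = phrase.lower().strip()
    for intent, expressions in CONTEXT.items():
        if normalized in expressions:
            return intent
    return None  # no expression matched, so the phrase is not understood


print(infer_intent("Make me a small coffee"))   # orderBeverage
print(infer_intent("Make me a medium coffee"))  # None
```

The second call returns `None` even though "medium" is a perfectly good size, because no expression spells it out. Writing one expression per size quickly becomes unmanageable, and the matched intent still says nothing about which size was requested; slots address both problems.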
4. Add a Slot
A slot can be thought of as a variable that can hold different possible values that Rhino can detect. Let's create one for the size of the beverage.
- Click on the + icon beside "Slots" to add a new slot. Give it a name, like "size".
- Inside your new slot, add all possible values for the slot, like "small", "medium", "large", and "extra large".
- Now, replace the word "small" in your expression with the "size" slot. Do this by typing a "$" sign and selecting the "size" option. Then add a unique identifier after "$size", such as ":size", so the slot reads "$size:size". Identifiers are required, and they allow you to reuse a slot multiple times within the same expression.
- Test your expression again, and you'll see that the "size" information is now available.
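A slot behaves much like a named capture group in a regular expression. The sketch below illustrates that idea on text only and is not Rhino's implementation: it turns each `$slot:identifier` into a named group that accepts any of the slot's values, and uses the identifier as the key in the result. The `SLOTS` table and the `compile_expression` and `infer_slots` helpers are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical slot table: slot name -> values the slot can hold
SLOTS: Dict[str, List[str]] = {
    "size": ["small", "medium", "large", "extra large"],
}
SLOT_TOKEN = re.compile(r"\$(\w+):(\w+)")  # matches $slotName:identifier


def compile_expression(expression: str) -> "re.Pattern":
    """Turn each $slot:identifier into a named group listing the slot's values."""
    parts, pos = [], 0
    for m in SLOT_TOKEN.finditer(expression):
        slot, identifier = m.group(1), m.group(2)
        parts.append(re.escape(expression[pos:m.start()]))  # literal text before the slot
        values = "|".join(re.escape(v) for v in SLOTS[slot])
        # The identifier names the group; reusing an identifier would make re.compile fail
        parts.append(f"(?P<{identifier}>{values})")
        pos = m.end()
    parts.append(re.escape(expression[pos:]))  # literal text after the last slot
    return re.compile("".join(parts), re.IGNORECASE)


def infer_slots(expression: str, phrase: str) -> Optional[Dict[str, str]]:
    """Return {identifier: spoken value} if the phrase matches, else None."""
    m = compile_expression(expression).fullmatch(phrase.strip())
    return m.groupdict() if m else None


print(infer_slots("make me a $size:size coffee", "Make me a large coffee"))  # {'size': 'large'}
```

Because results are keyed by identifier, the same slot can appear twice in one expression: `infer_slots("a $size:first coffee and a $size:second coffee", "a small coffee and a large coffee")` returns `{'first': 'small', 'second': 'large'}`.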
5. Add a Macro
What if someone starts their order with something other than "Make me"? For such variations, we can use macros. A macro represents a list of interchangeable phrases that we want to detect without caring which one was actually said. In our case, we don't need to know whether someone said "Make me" or "Get me"; we just want to know they ordered a coffee.
- Create a macro and give it a name, like "brew".
- Add phrases like "Get me", "Make me", "Could I have", and "I would like".
- Replace "Make me" in your expression with the "brew" macro. Do this by typing in an "@" sign, and selecting your "brew" macro.
The Inference panel won't show the macro in the output, since Rhino only detects that one of the options was said. If you need to know which option was said, use a slot instead.
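The difference between a macro and a slot can be sketched the same way. In this hypothetical, text-only illustration (not Rhino's implementation), each `@macro` becomes a non-capturing regular-expression group: any of its phrases is accepted, but nothing is recorded about which one was said. The `MACROS` table and the `expand_macros` and `infer_with_macros` helpers are our own names, and the slot is left out to keep the example focused.

```python
import re
from typing import Dict, List, Optional

# Hypothetical macro table: macro name -> interchangeable phrases
MACROS: Dict[str, List[str]] = {
    "brew": ["get me", "make me", "could i have", "i would like"],
}
MACRO_TOKEN = re.compile(r"@(\w+)")


def expand_macros(expression: str) -> str:
    """Replace each @macro with a non-capturing group of its phrases."""
    parts, pos = [], 0
    for m in MACRO_TOKEN.finditer(expression):
        parts.append(re.escape(expression[pos:m.start()]))
        phrases = "|".join(re.escape(p) for p in MACROS[m.group(1)])
        parts.append(f"(?:{phrases})")  # matches any phrase but captures nothing
        pos = m.end()
    parts.append(re.escape(expression[pos:]))
    return "".join(parts)


def infer_with_macros(expression: str, phrase: str) -> Optional[Dict[str, str]]:
    m = re.fullmatch(expand_macros(expression), phrase.strip(), re.IGNORECASE)
    # A match yields no captured values: the macro only confirms that one phrase was said
    return m.groupdict() if m else None


print(infer_with_macros("@brew a small coffee", "Get me a small coffee"))    # {}
print(infer_with_macros("@brew a small coffee", "Bring me a small coffee"))  # None
```

The empty dictionary mirrors what the Inference panel shows: the phrase was understood, but the macro contributes no value to the output.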
6. Logical "OR"s & Optionals
Use [square brackets] to indicate a logical "OR" between phrases. One of the phrases inside the square brackets must be spoken for the expression to match.
Use (round brackets) to indicate optional phrases, which are phrases that may be omitted without changing the intent.
For example, the expression "@brew [a, an] $size:size (cup of) coffee (please)" would match any of the following phrases:
- "I would like a large coffee"
- "Could I have a medium coffee please"
- "Make me an extra large cup of coffee"
But it would not match these phrases, since neither "a" nor "an" is present:
- "I would like large coffee"
- "Could I have medium cup of coffee, thanks"
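The matching rules covered so far can be tied together in one plain-Python sketch that translates an expression into a regular expression. This only illustrates the rules on text; it is not how Rhino matches speech. The `MACROS` and `SLOTS` tables, every function name, and the token syntax it accepts (comma-separated choices inside square brackets, literal words inside round brackets) are our own assumptions. The expression `@brew [a, an] $size:size (cup of) coffee (please)` is assumed to be the one the example phrases above were written for.

```python
import re
from typing import Dict, List, Optional

# Hypothetical context data mirroring the article's example
MACROS: Dict[str, List[str]] = {"brew": ["get me", "make me", "could i have", "i would like"]}
SLOTS: Dict[str, List[str]] = {"size": ["small", "medium", "large", "extra large"]}

# A token is a [choice list], an (optional phrase), or any other run of non-spaces
TOKEN = re.compile(r"\[[^\]]*\]|\([^)]*\)|\S+")


def alternatives(options: List[str]) -> str:
    return "(?:" + "|".join(re.escape(o) for o in options) + ")"


def element(token: str) -> str:
    """Translate one required token into a regex fragment."""
    if token.startswith("["):  # logical OR: one of the choices must be spoken
        return alternatives([c.strip() for c in token[1:-1].split(",")])
    if token.startswith("@"):  # macro: matched, but no value is captured
        return alternatives(MACROS[token[1:]])
    if token.startswith("$"):  # slot: captured under its identifier
        slot, identifier = token[1:].split(":")
        return f"(?P<{identifier}>" + "|".join(re.escape(v) for v in SLOTS[slot]) + ")"
    return re.escape(token)  # plain word


def compile_expression(expression: str) -> "re.Pattern":
    pattern = ""
    for token in TOKEN.findall(expression):
        if token.startswith("("):  # optional phrase: may be omitted entirely
            pattern += f"(?:{re.escape(token[1:-1].strip())} )?"
        else:
            pattern += f"(?:{element(token)}) "  # every token consumes one trailing space
    return re.compile(pattern, re.IGNORECASE)


def infer(expression: str, phrase: str) -> Optional[Dict[str, str]]:
    # Collapse whitespace and add a trailing space so the last token can match too
    normalized = " ".join(phrase.split()) + " "
    m = compile_expression(expression).fullmatch(normalized)
    return m.groupdict() if m else None


EXPRESSION = "@brew [a, an] $size:size (cup of) coffee (please)"
print(infer(EXPRESSION, "Make me an extra large cup of coffee"))  # {'size': 'extra large'}
print(infer(EXPRESSION, "I would like large coffee"))  # None
```

Trailing spaces keep the optional groups simple: each optional phrase either consumes "phrase " or nothing, so omitting it never leaves a doubled or missing separator.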
7. Download your Custom Context
Once you are satisfied with your custom context, click on the blue download icon in the toolbar. Select the platform that matches your intended runtime, and click "Download". In just a few seconds, you will find a folder in your downloads containing the .rhn file for your custom context.
Next Steps
Now that you have your .rhn file, you're ready to begin coding! Rhino Speech-to-Intent supports a range of SDKs, each with code snippets and a quick-start guide. For example, in Python:
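```python
import pvrhino

# Your AccessKey from Picovoice Console and the path to your downloaded .rhn file
rhino = pvrhino.create(access_key="${ACCESS_KEY}", context_path="${CONTEXT_PATH}")

def get_next_audio_frame():
    # Placeholder: must return rhino.frame_length 16-bit PCM samples at rhino.sample_rate
    raise NotImplementedError

# process() returns True once Rhino has finalized an inference
while not rhino.process(get_next_audio_frame()):
    pass
inference = rhino.get_inference()
rhino.delete()  # release native resources
```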
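Once you have an inference, your application decides what to do with it. The inference returned by `get_inference()` exposes `is_understood`, `intent`, and `slots`; the sketch below dispatches on those fields. The `handle` function, the `FakeInference` stand-in (which lets the logic run without a microphone or AccessKey), and the "regular" fallback size are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeInference:
    """Hypothetical stand-in with the same fields as Rhino's inference object."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def handle(inference) -> str:
    # Rhino could not map the utterance to any expression in the context
    if not inference.is_understood:
        return "Sorry, I didn't catch that."
    if inference.intent == "orderBeverage":
        # A slot is absent when the matched expression does not contain it
        size = inference.slots.get("size", "regular")
        return f"Making a {size} coffee."
    return f"No handler for intent {inference.intent!r}."


print(handle(FakeInference(True, "orderBeverage", {"size": "large"})))  # Making a large coffee.
```

In a real application you would pass the object returned by `rhino.get_inference()` straight to `handle`, since it exposes the same three fields.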