Speech-to-text (STT), or Automatic Speech Recognition (ASR), converts spoken words into written text, automating transcription to save time and enhance accessibility across different applications. Its integration into applications can improve user experience and engagement.
Picovoice's Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text are powerful STT engines that can be tailored to recognize custom vocabulary and boost the probability of specific words being detected. In this article, we will walk you through the process of creating a custom model for
An equivalent video tutorial of this article is also available.
1. Picovoice Console
Sign up for a free Picovoice Console account. Once you've created an account, navigate to the Leopard & Cheetah page.
2. Create Your Custom Model
Once you're on the Leopard & Cheetah page, follow these steps to create a model:
- Click on the "New Model" button.
- Give your model a name.
- Select the language for your model.
- Click on the "Create Model" button.
3. Add Custom Vocabulary
After creating a model, you will be automatically redirected to an editor where you can start customizing your model. Let's start by adding custom vocabulary.
- To add custom vocabulary, type the words you want to add in the Custom Vocabulary tab and hit "Enter."
- Test your custom vocabulary by clicking on the microphone in the panel on the right-hand side and speaking the added words. Watch for the word in the
You can also test your model using audio files by clicking on the File tab. This can help maintain consistency between tests.
Each word you add will come with default pronunciations. While it's unlikely that these will require editing, you can click the arrow next to your custom word to toggle a dropdown that displays its pronunciations in IPA format. This allows you to add, remove, or make any necessary adjustments.
4. Increase Detection with Boost Words
Boost words increase the likelihood of certain words being detected. To add a boost word, type it into the Boost Words tab and hit "Enter." Boost words help Cheetah make more informed distinctions between homophones based on your application context.
Note that any existing word can be boosted. For example, if "Thank you" is expected to be a frequently occurring phrase in your use case, go ahead and add it to your list of boost words.
5. Download Your Custom Model
Once you are satisfied with your custom model, click on the blue download icon in the toolbar. By default, the model downloaded will be for Leopard. If you need real-time speech-to-text, select Cheetah instead. Click "Download", and in just a few seconds, you will find a folder in your downloads containing the .pv file for your custom model.
Now that you have your .pv file, you're ready to begin coding! Below is the list of SDKs supported by Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text, along with corresponding code snippets and quick-start guides.
Leopard Quick Start
import pvleopard
o = pvleopard.create(access_key)
transcript, words = o.process_file(path)
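To transcribe with the custom model rather than the default one, the .pv file downloaded in step 5 can be passed to the engine when it is created. The following is a minimal sketch, assuming the `pvleopard` package is installed and that `create` accepts a `model_path` keyword argument, as it does in current releases of the Python SDK. The helper names `transcribe_with_custom_model` and `missing_vocabulary` are hypothetical and not part of the SDK; the second one is a simple way to check that your custom words actually show up in a transcript.

```python
import re


def transcribe_with_custom_model(access_key, model_path, audio_path):
    """Transcribe one audio file with Leopard using a custom .pv model."""
    # Imported lazily so missing_vocabulary() works without the SDK installed.
    import pvleopard  # assumption: installed via `pip install pvleopard`

    # model_path points at the .pv file downloaded from Picovoice Console.
    leopard = pvleopard.create(access_key=access_key, model_path=model_path)
    try:
        transcript, words = leopard.process_file(audio_path)
        return transcript, words
    finally:
        leopard.delete()  # release the engine's native resources


def missing_vocabulary(transcript, expected_words):
    """Return the expected single words that do not appear in the transcript.

    Matching is case-insensitive and works on whole words only.
    """
    tokens = set(re.findall(r"[\w']+", transcript.lower()))
    return [word for word in expected_words if word.lower() not in tokens]
```

Running `missing_vocabulary` over the transcripts of a fixed set of recordings gives a repeatable check, in the same spirit as testing with audio files in the Console editor.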
Cheetah Quick Start
import pvcheetah
o = pvcheetah.create(access_key)
partial_transcript, is_endpoint = o.process(get_next_audio_frame())
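In practice, `process` is called once per audio frame, and the partial transcripts are accumulated until an endpoint is reported. The sketch below shows one way to structure that loop. It assumes an engine object that exposes `process(frame)` returning `(partial_transcript, is_endpoint)` and `flush()` returning any remaining text, which matches the Cheetah Python SDK; the function name `stream_transcribe` and the `frames` iterable are hypothetical.

```python
def stream_transcribe(cheetah, frames):
    """Feed audio frames to a Cheetah-like engine and yield finished segments.

    `cheetah` is any object with process(frame) -> (str, bool) and flush() -> str.
    `frames` is any iterable of audio frames (hypothetical audio source).
    """
    segment = ""
    for frame in frames:
        partial_transcript, is_endpoint = cheetah.process(frame)
        segment += partial_transcript
        if is_endpoint:
            # An endpoint marks the end of an utterance: collect what is left.
            segment += cheetah.flush()
            if segment:
                yield segment
            segment = ""
    # The audio ended without an endpoint: flush whatever is still buffered.
    segment += cheetah.flush()
    if segment:
        yield segment
```

With the real SDK, the engine would be created with `pvcheetah.create(access_key=..., model_path=...)`, where `model_path` points at the custom .pv file downloaded for Cheetah, and each frame would hold `cheetah.frame_length` 16-bit samples recorded at `cheetah.sample_rate`.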