🤸 Custom Speech-to-Text
Fine-tune speech-to-text models in seconds.

Speech-to-text (STT), also called Automatic Speech Recognition (ASR), converts spoken words into written text. It automates transcription, which saves time and makes applications more accessible. Integrating it into an application can also improve user experience and engagement.

Picovoice's Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text are powerful STT engines that can be tailored to recognize custom vocabulary and boost the probability of specific words being detected. In this article, we will walk you through the process of creating a custom model for Leopard and Cheetah.

A video tutorial covering the same steps is also available.

1. Picovoice Console

Sign up for a free Picovoice Console account. Once you've created an account, navigate to the Leopard & Cheetah page.

Click Leopard & Cheetah in the navigation bar

2. Create Your Custom Model

Once you're on the Leopard & Cheetah page, follow these steps to create a model:

  1. Click on the "New Model" button.
  2. Give your model a name.
  3. Select the language for your model.
  4. Click on the "Create Model" button.

3. Add Custom Vocabulary

After creating a model, you will be automatically redirected to an editor where you can start customizing your model. Let's start by adding custom vocabulary.

  1. To add custom vocabulary, simply type in the words you want to add in the Custom Vocabulary tab and hit "Enter."
  2. Test your custom vocabulary by clicking the microphone in the panel on the right-hand side and speaking the added words. Watch for each word in the Transcription panel below.

You can also test your model using audio files by clicking on the File tab. This can help maintain consistency between tests.
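
If you export the transcripts from repeated file-based tests, a small script can check that your custom words keep showing up. The following is a minimal sketch, not part of the Picovoice SDK: the function name `missing_vocabulary` and the sample transcript are hypothetical, and the matching is a simple case-insensitive whole-word comparison.

```python
import re


def missing_vocabulary(transcript, expected_words):
    """Return the expected words or phrases that do not appear in the transcript."""
    # Normalize to lowercase word tokens so punctuation and case do not matter.
    normalized = " ".join(re.findall(r"[\w']+", transcript.lower()))
    missing = []
    for word in expected_words:
        target = " ".join(re.findall(r"[\w']+", word.lower()))
        # Pad with spaces so "cat" does not match inside "concatenate".
        if f" {target} " not in f" {normalized} ":
            missing.append(word)
    return missing


# Hypothetical transcript copied from a test run.
sample = "Please open Picovoice Console, thank you."
print(missing_vocabulary(sample, ["Picovoice", "thank you", "Leopard"]))
```

Running the same audio file through the model after each vocabulary change and comparing the list of missing words is one way to keep tests consistent.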

Each word you add will come with default pronunciations. While it's unlikely that these will require editing, you can click the arrow next to your custom word to toggle a dropdown that displays its pronunciations in IPA format. This allows you to add, remove, or make any necessary adjustments.

4. Increase Detection with Boost Words

Boost words increase the likelihood of certain words being detected. They help Leopard or Cheetah distinguish between homophones based on your application's context. To add a boost word, type it into the Boost Words tab and hit "Enter."

Note that any existing word can be boosted. For example, if "Thank you" is expected to be a frequently occurring phrase in your use case, go ahead and add it to your list of boost words.

5. Download Your Custom Model

Once you are satisfied with your custom model, click on the blue download icon in the toolbar. By default, the model downloaded will be for Leopard. If you need real-time speech-to-text, select Cheetah instead. Click "Download", and in just a few seconds, you will find a folder in your downloads containing the .pv file for your custom model.
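
Once the download finishes, the .pv file can be passed to the engine when it is created. The sketch below assumes the Python SDK and its `model_path` argument to `pvleopard.create`; the helper names `find_latest_model` and `create_leopard` are hypothetical, and the downloads location is whatever your browser uses.

```python
from pathlib import Path


def find_latest_model(downloads_dir):
    """Return the most recently modified .pv file under downloads_dir, or None."""
    # The model arrives inside a folder, so search subdirectories as well.
    candidates = list(Path(downloads_dir).rglob("*.pv"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def create_leopard(access_key, model_path):
    """Create a Leopard engine that uses the custom model instead of the default."""
    # Imported here so find_latest_model works without the SDK installed.
    import pvleopard

    return pvleopard.create(access_key=access_key, model_path=str(model_path))
```

A call such as `create_leopard(access_key, find_latest_model(Path.home() / "Downloads"))` would then load the newest model. If you downloaded the model for Cheetah, the same idea should apply with `pvcheetah.create`, since a model file is built for one engine.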

Next Steps

Now that you have your .pv file, you're ready to begin coding! Below is the list of SDKs supported by Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text, along with corresponding code snippets and quick-start guides.

Leopard Quick Start

import pvleopard

leopard = pvleopard.create(access_key)
# path points to the audio file to transcribe
transcript, words = leopard.process_file(path)
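
The `words` value returned alongside the transcript can help you decide which terms to add as custom vocabulary or boost words. The sketch below assumes each item exposes `word`, `start_sec`, `end_sec`, and `confidence` attributes, which is our understanding of the Python SDK; the `Word` stand-in, the helper name, and the 0.5 threshold are illustrative only.

```python
from collections import namedtuple

# Stand-in for the word objects Leopard returns, so the sketch runs without the SDK.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def low_confidence_words(words, threshold=0.5):
    """Return (word, confidence) pairs whose confidence falls below the threshold."""
    return [(w.word, w.confidence) for w in words if w.confidence < threshold]


# Hypothetical output of a transcription run.
sample_words = [
    Word("picovoice", 0.0, 0.6, 0.31),
    Word("console", 0.6, 1.1, 0.92),
]
for word, confidence in low_confidence_words(sample_words):
    print(f"{word}: {confidence:.2f}")
```

Words that repeatedly score low on your test audio are reasonable candidates to revisit in the Console.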

Cheetah Quick Start

import pvcheetah

cheetah = pvcheetah.create(access_key)
# get_next_audio_frame() stands in for your own audio source
partial_transcript, is_endpoint = cheetah.process(get_next_audio_frame())
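
In a real application, the `process` call runs in a loop over incoming audio frames. The sketch below shows one way to assemble a full transcript, assuming the engine also offers a `flush()` method that returns any buffered text, as the Cheetah Python SDK does to our knowledge. The function name `transcribe_stream` is hypothetical, and `engine` can be any object with `process` and `flush`.

```python
def transcribe_stream(engine, frames):
    """Feed audio frames to a streaming engine and return the full transcript."""
    pieces = []
    for frame in frames:
        partial_transcript, is_endpoint = engine.process(frame)
        pieces.append(partial_transcript)
        if is_endpoint:
            # An endpoint marks a pause in speech; flush to collect buffered text.
            pieces.append(engine.flush())
    # Flush once more in case the audio stops without a detected endpoint.
    pieces.append(engine.flush())
    return "".join(pieces)
```

Here `frames` would be an iterable of audio frames from your microphone or file reader, each sized to what the engine expects.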