In this article, we will learn how to perform Speech Recognition in Python.

Speech Recognition and Speech-to-Text are often used interchangeably, but Speech-to-Text is only one subfield of Speech Recognition. Other forms of Speech Recognition include Wake Word Detection, Voice Command Recognition, and Voice Activity Detection (VAD).

Below is the cheat sheet I use when deciding which Speech Recognition algorithm to use:

  • Do you need to detect if a person is talking and when? Then use Cobra Voice Activity Detection.
  • Do you need to detect the occurrence of a single phrase? Or one of a few phrases? Then Porcupine Wake Word is the right engine.
  • Do you need to understand voice commands? Rhino Speech-to-Intent is the correct tool here. Rhino can infer the user's intent and accurately extract the details of the request (i.e., slot values) with minimal runtime resources.
  • Do you need to transcribe speech to text in real time? Use Cheetah Streaming Speech-to-Text.
  • Do you need to transcribe large volumes of speech to text in batch mode? Leopard Speech-to-Text is the right tool.

The SDKs in this tutorial can run on Linux, macOS, Windows, Raspberry Pi, NVIDIA Jetson, and BeagleBone.

Cobra Voice Activity Detection

1- Install the Cobra Voice Activity Detection SDK using pip: pip3 install pvcobra

2- Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

3- Create an instance of the Voice Activity Detection engine:

4- Pass in frames of audio to the .process method:
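Steps 3 and 4 can be sketched together as follows. This is a minimal illustration, not the official sample: the helper names `create_cobra` and `voiced_frames` and the 0.5 threshold are assumptions made for this example, while `pvcobra.create`, `.process`, and `.frame_length` come from the SDK. The audio is assumed to be 16-bit, single-channel PCM sampled at the engine's `sample_rate`.

```python
def create_cobra(access_key):
    """Create a Cobra engine (step 3).

    The import is done lazily so the helper below can be exercised
    without the SDK installed. Install with: pip3 install pvcobra
    """
    import pvcobra
    return pvcobra.create(access_key=access_key)


def voiced_frames(cobra, pcm, threshold=0.5):
    """Yield (frame_index, probability) for frames that look like speech (step 4).

    `threshold` is an illustrative default, not an SDK constant.
    """
    frame_length = cobra.frame_length
    # Only whole frames are processed; a trailing partial frame is dropped.
    for index in range(len(pcm) // frame_length):
        frame = pcm[index * frame_length:(index + 1) * frame_length]
        probability = cobra.process(frame)  # float between 0 and 1
        if probability >= threshold:
            yield index, probability


# Usage sketch (needs the SDK, a real AccessKey, and audio samples in `pcm`):
#   cobra = create_cobra("YOUR_ACCESS_KEY")
#   for index, probability in voiced_frames(cobra, pcm):
#       print(index, probability)
#   cobra.delete()  # release native resources when done
```

Keeping the frame loop separate from engine creation makes it easy to swap the audio source (file, microphone, network stream) without touching the detection logic.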

For more information, check Cobra Voice Activity Detection's product page or refer to Cobra's Python SDK quick start guide.

Porcupine Wake Word

1- Install the Porcupine Wake Word SDK using pip: pip3 install pvporcupine

2- Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

3- Create your custom wake word model using Picovoice Console.

4- Create an instance of the Wake Word engine:

5- Pass in frames of audio to the .process method:
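A minimal sketch of steps 4 and 5 might look like the following. The helper names `create_porcupine` and `detect_wake_words` are hypothetical, and `keyword_path` stands for the model file downloaded from Picovoice Console in step 3. The calls `pvporcupine.create`, `.process`, and `.frame_length` are from the SDK; `.process` returns the index of the detected keyword, or -1 when nothing is detected.

```python
def create_porcupine(access_key, keyword_path):
    """Create a Porcupine engine for one custom wake word (step 4).

    Lazy import keeps the helper below usable without the SDK.
    Install with: pip3 install pvporcupine
    """
    import pvporcupine
    return pvporcupine.create(access_key=access_key, keyword_paths=[keyword_path])


def detect_wake_words(porcupine, pcm):
    """Return a list of (frame_index, keyword_index) detections (step 5)."""
    frame_length = porcupine.frame_length
    detections = []
    # Only whole frames are processed; a trailing partial frame is dropped.
    for index in range(len(pcm) // frame_length):
        frame = pcm[index * frame_length:(index + 1) * frame_length]
        keyword_index = porcupine.process(frame)
        if keyword_index >= 0:  # -1 means no wake word in this frame
            detections.append((index, keyword_index))
    return detections


# Usage sketch (needs the SDK, a real AccessKey, and a keyword model file):
#   porcupine = create_porcupine("YOUR_ACCESS_KEY", "path/to/keyword.ppn")
#   print(detect_wake_words(porcupine, pcm))
#   porcupine.delete()  # release native resources when done
```

Because `keyword_paths` is a list, the same pattern extends to several wake words; the returned `keyword_index` then identifies which one was heard.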

For more information, check Porcupine Wake Word's product page or refer to Porcupine's Python SDK quick start guide.

Rhino Speech-to-Intent

1- Install the Rhino Speech-to-Intent SDK using pip: pip3 install pvrhino

2- Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

3- Create your Context using Picovoice Console.

4- Create an instance of Rhino Speech-to-Intent to start recognizing voice commands within the domain of the provided context:

5- Pass in frames of audio to the .process function and use the .get_inference function to determine the user's intent:
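Steps 4 and 5 could be sketched roughly as below. The helper names `create_rhino` and `infer_intent` are assumptions for this example, and `context_path` stands for the Context file exported from Picovoice Console in step 3. The SDK calls used are `pvrhino.create`, `.process` (which returns True once Rhino has finalized an inference), and `.get_inference`.

```python
def create_rhino(access_key, context_path):
    """Create a Rhino engine bound to one Context (step 4).

    Lazy import keeps the helper below usable without the SDK.
    Install with: pip3 install pvrhino
    """
    import pvrhino
    return pvrhino.create(access_key=access_key, context_path=context_path)


def infer_intent(rhino, pcm):
    """Feed frames until Rhino finalizes, then return (intent, slots) (step 5).

    Returns None if the command was not understood or if the audio ended
    before Rhino reached a conclusion.
    """
    frame_length = rhino.frame_length
    # Only whole frames are processed; a trailing partial frame is dropped.
    for index in range(len(pcm) // frame_length):
        frame = pcm[index * frame_length:(index + 1) * frame_length]
        is_finalized = rhino.process(frame)
        if is_finalized:
            inference = rhino.get_inference()
            if inference.is_understood:
                return inference.intent, dict(inference.slots)
            return None
    return None


# Usage sketch (needs the SDK, a real AccessKey, and a Context file):
#   rhino = create_rhino("YOUR_ACCESS_KEY", "path/to/context.rhn")
#   print(infer_intent(rhino, pcm))
#   rhino.delete()  # release native resources when done
```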

For more information, check Rhino Speech-to-Intent's product page or refer to Rhino's Python SDK quick start guide.

Cheetah Streaming Speech-to-Text

1- Install the Cheetah Streaming Speech-to-Text SDK using pip: pip3 install pvcheetah

2- Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

3- Create an instance of Cheetah to transcribe speech to text in real time:

4- Pass in audio frames as they become available to the .process function:
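One way to sketch steps 3 and 4 is shown below. The helper names `create_cheetah` and `transcribe_stream` are made up for this example. It assumes, as in the SDK versions I am aware of, that `.process` returns a `(partial_transcript, is_endpoint)` pair and that `.flush` returns any remaining buffered text as a string; check the quick start guide if your version differs.

```python
def create_cheetah(access_key):
    """Create a Cheetah engine (step 3).

    Lazy import keeps the helper below usable without the SDK.
    Install with: pip3 install pvcheetah
    """
    import pvcheetah
    return pvcheetah.create(access_key=access_key)


def transcribe_stream(cheetah, frames):
    """Yield pieces of transcript as audio frames become available (step 4).

    `frames` is any iterable of frames, each holding cheetah.frame_length
    samples of 16-bit, single-channel PCM.
    """
    for frame in frames:
        partial_transcript, is_endpoint = cheetah.process(frame)
        if partial_transcript:
            yield partial_transcript
        if is_endpoint:
            # An endpoint marks a pause in speech; flush the buffered text.
            remaining = cheetah.flush()
            if remaining:
                yield remaining
    # Flush once more at the end of the stream to collect any leftover text.
    remaining = cheetah.flush()
    if remaining:
        yield remaining


# Usage sketch (needs the SDK, a real AccessKey, and a source of frames):
#   cheetah = create_cheetah("YOUR_ACCESS_KEY")
#   for piece in transcribe_stream(cheetah, frames):
#       print(piece, end="", flush=True)
#   cheetah.delete()  # release native resources when done
```

Writing the loop as a generator lets the caller display text as soon as it is produced, which is the main reason to pick a streaming engine over a batch one.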

For more information, check Cheetah Streaming Speech-to-Text's product page or refer to Cheetah's Python SDK quick start guide.

Leopard Speech-to-Text

1- Install the Leopard Speech-to-Text SDK using pip: pip3 install pvleopard

2- Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

3- Create an instance of Leopard to transcribe speech to text:

4- Pass in an audio file to Leopard and inspect the result:
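Steps 3 and 4 can be sketched as follows. The helper names `create_leopard` and `transcribe_file` are hypothetical. The sketch assumes that `.process_file` returns a `(transcript, words)` pair whose word objects expose `word`, `start_sec`, and `end_sec`, which matches recent SDK releases as far as I know; older releases may return only the transcript.

```python
def create_leopard(access_key):
    """Create a Leopard engine (step 3).

    Lazy import keeps the helper below usable without the SDK.
    Install with: pip3 install pvleopard
    """
    import pvleopard
    return pvleopard.create(access_key=access_key)


def transcribe_file(leopard, audio_path):
    """Transcribe one audio file and return (transcript, word_timings) (step 4).

    word_timings is a list of (word, start_sec, end_sec) tuples.
    """
    transcript, words = leopard.process_file(audio_path)
    word_timings = [(w.word, w.start_sec, w.end_sec) for w in words]
    return transcript, word_timings


# Usage sketch (needs the SDK, a real AccessKey, and an audio file):
#   leopard = create_leopard("YOUR_ACCESS_KEY")
#   transcript, word_timings = transcribe_file(leopard, "path/to/audio.wav")
#   print(transcript)
#   leopard.delete()  # release native resources when done
```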

For more information, check Leopard Speech-to-Text's product page or refer to Leopard's Python SDK quick start guide.