In this article, we will learn how to perform Speech Recognition in Python.

Speech Recognition and Speech-to-Text are often used interchangeably, but Speech-to-Text is only one subfield of Speech Recognition. Other forms of Speech Recognition include Wake Word Detection, Voice Command Recognition, and Voice Activity Detection (VAD).

Below is the cheat sheet I use when deciding which Speech Recognition engine fits a task:

  • Do you need to detect whether a person is talking, and when? Then use Cobra Voice Activity Detection.
  • Do you need to detect the occurrence of a single phrase? Or one of a few phrases? Then Porcupine Wake Word is the right engine.
  • Do you need to understand voice commands? Rhino Speech-to-Intent is the correct tool here. Rhino infers the user's intent and accurately extracts the request's details (i.e., slot values) using minimal runtime resources.
  • Do you need to transcribe speech to text in real time? Use Cheetah Streaming Speech-to-Text.
  • Do you need to transcribe large volumes of speech to text in batch mode? Leopard Speech-to-Text is the right tool.

The SDKs in this tutorial can run on Linux, macOS, Windows, Raspberry Pi, NVIDIA Jetson, and BeagleBone.

Cobra Voice Activity Detection

1- Install the Cobra Voice Activity Detection SDK using pip: pip3 install pvcobra

2- Sign up for a free Picovoice Console account and copy your AccessKey, which the SDK uses for authentication and authorization.

3- Create an instance of the Voice Activity Detection engine:
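Here is a minimal sketch of this step. It assumes pvcobra is installed and that you pass in the AccessKey from Picovoice Console. The helper name `create_cobra` is hypothetical. `pvcobra.create`, `frame_length`, `sample_rate`, and `delete` belong to the Cobra Python SDK.

```python
def create_cobra(access_key: str):
    """Create a Cobra VAD instance (hypothetical helper; release it with .delete())."""
    import pvcobra  # imported here so the SDK is only required when the helper is called

    cobra = pvcobra.create(access_key=access_key)
    # Cobra consumes 16-bit, single-channel PCM at cobra.sample_rate,
    # delivered in frames of exactly cobra.frame_length samples.
    print(f"Cobra expects {cobra.frame_length}-sample frames at {cobra.sample_rate} Hz")
    return cobra
```

Call `cobra = create_cobra(ACCESS_KEY)` once. Call `cobra.delete()` when you no longer need the engine.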

4- Pass frames of audio to the .process method, which returns the probability that the frame contains voice:
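The sketch below assumes the audio is already available as a list of 16-bit PCM samples. Capturing it from a microphone or a WAV file is outside this sketch. It splits the buffer into full frames and asks Cobra for a voice probability per frame. The helper names `split_into_frames` and `voice_activity` are illustrative assumptions, and so is the 0.5 threshold. Only `cobra.process` and `cobra.frame_length` come from the Cobra SDK.

```python
from typing import List, Sequence


def split_into_frames(pcm: Sequence[int], frame_length: int) -> List[Sequence[int]]:
    """Cut PCM samples into consecutive frames of exactly frame_length samples.

    A trailing partial frame is dropped, because the engine expects full frames.
    """
    return [pcm[i:i + frame_length] for i in range(0, len(pcm) - frame_length + 1, frame_length)]


def voice_activity(cobra, pcm: Sequence[int], threshold: float = 0.5) -> List[bool]:
    """Return one flag per frame: True when Cobra's voice probability reaches threshold."""
    flags = []
    for frame in split_into_frames(pcm, cobra.frame_length):
        voice_probability = cobra.process(frame)  # float between 0 and 1
        flags.append(voice_probability >= threshold)
    return flags
```

The threshold is the main design knob. Raising it flags fewer frames as speech, which means fewer false alarms but more missed speech. It is worth tuning on your own recordings.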

For more information, check Cobra Voice Activity Detection's product page or refer to Cobra's Python SDK quick start guide.

Porcupine Wake Word

1- Install the Porcupine Wake Word SDK using pip: pip3 install pvporcupine

2- Sign up for a free Picovoice Console account and copy your AccessKey, which the SDK uses for authentication and authorization.

3- Create your custom wake word model using Picovoice Console.

4- Create an instance of the Wake Word engine:
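Here is a minimal sketch of this step. It assumes you have downloaded the keyword file trained in the previous step from Picovoice Console; Porcupine keyword models use the .ppn extension. The helper name `create_porcupine` is hypothetical, and the 0.5 default sensitivity is an assumption to tune. `keyword_paths` and `sensitivities` are parameters of `pvporcupine.create`.

```python
from typing import List


def create_porcupine(access_key: str, keyword_paths: List[str], sensitivity: float = 0.5):
    """Create a Porcupine instance for one or more custom .ppn keyword files."""
    import pvporcupine  # imported here so the SDK is only required when the helper is called

    return pvporcupine.create(
        access_key=access_key,
        keyword_paths=keyword_paths,
        # One sensitivity per keyword: higher values miss less but false-alarm more.
        sensitivities=[sensitivity] * len(keyword_paths),
    )
```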

5- Pass frames of audio to the .process method, which returns the index of the detected keyword, or -1 if no keyword was detected:
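Here is a minimal sketch of the detection loop. It assumes each item in `frames` holds exactly `porcupine.frame_length` 16-bit samples. `detect_wake_words` is a hypothetical helper, and so is the `keyword_names` list: a set of human-readable labels in the same order as the `keyword_paths` used at creation.

```python
from typing import Iterable, List, Sequence, Tuple


def detect_wake_words(porcupine, frames: Iterable[Sequence[int]],
                      keyword_names: List[str]) -> List[Tuple[int, str]]:
    """Return (frame_index, keyword_name) for every frame in which a keyword was detected."""
    detections = []
    for index, frame in enumerate(frames):
        keyword_index = porcupine.process(frame)  # -1 means no keyword in this frame
        if keyword_index >= 0:
            # keyword_index follows the order of keyword_paths passed at creation time.
            detections.append((index, keyword_names[keyword_index]))
    return detections
```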

For more information, check Porcupine Wake Word's product page or refer to Porcupine's Python SDK quick start guide.

Rhino Speech-to-Intent

1- Install the Rhino Speech-to-Intent SDK using pip: pip3 install pvrhino

2- Sign up for a free Picovoice Console account and copy your AccessKey, which the SDK uses for authentication and authorization.

3- Create your Context using Picovoice Console.

4- Create an instance of Rhino Speech-to-Intent to start recognizing voice commands within the domain of the provided context:
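Here is a minimal sketch of this step. It assumes the context built in Picovoice Console has been downloaded as a .rhn file. `create_rhino` is a hypothetical helper. `pvrhino.create(access_key=..., context_path=...)` and the `context_info` property belong to the Rhino Python SDK.

```python
def create_rhino(access_key: str, context_path: str):
    """Create a Rhino instance for a .rhn context file downloaded from Picovoice Console."""
    import pvrhino  # imported here so the SDK is only required when the helper is called

    rhino = pvrhino.create(access_key=access_key, context_path=context_path)
    # context_info describes the intents, expressions, and slots the context supports.
    print(rhino.context_info)
    return rhino
```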

5- Pass frames of audio to the .process method. Once it returns True, the inference is finalized, and the .get_inference method returns the user's intent:
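Here is a minimal sketch of the inference loop. It assumes each frame holds exactly `rhino.frame_length` 16-bit samples. The hypothetical `infer_intent` helper uses `process`, `get_inference`, and the `is_understood`, `intent`, and `slots` fields of the returned inference, all of which come from the Rhino SDK.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def infer_intent(rhino, frames: Iterable[Sequence[int]]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Feed frames until Rhino finalizes; return (intent, slots), or None if not understood."""
    for frame in frames:
        is_finalized = rhino.process(frame)  # True once Rhino has reached a decision
        if is_finalized:
            inference = rhino.get_inference()
            if inference.is_understood:
                return inference.intent, dict(inference.slots)
            return None  # finalized, but the utterance did not match the context
    return None  # the audio ended before Rhino reached a decision
```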

For more information, check Rhino Speech-to-Intent's product page or refer to Rhino's Python SDK quick start guide.

Cheetah Streaming Speech-to-Text

1- Install the Cheetah Streaming Speech-to-Text SDK using pip: pip3 install pvcheetah

2- Sign up for a free Picovoice Console account and copy your AccessKey, which the SDK uses for authentication and authorization.

3- Create an instance of Cheetah to transcribe speech to text in real time:
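Here is a minimal sketch of this step. It assumes your pvcheetah version accepts the `endpoint_duration_sec` keyword in `pvcheetah.create`. The helper name `create_cheetah` is hypothetical, and the 1.0-second default is an illustrative choice.

```python
def create_cheetah(access_key: str, endpoint_duration_sec: float = 1.0):
    """Create a Cheetah streaming transcriber (hypothetical helper; release it with .delete())."""
    import pvcheetah  # imported here so the SDK is only required when the helper is called

    # endpoint_duration_sec: how long a pause must last before Cheetah reports
    # an endpoint, i.e. the end of an utterance.
    return pvcheetah.create(access_key=access_key, endpoint_duration_sec=endpoint_duration_sec)
```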

4- Pass audio frames to the .process method as they become available:
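Here is a minimal sketch of the streaming loop. It assumes `frames` yields chunks of exactly `cheetah.frame_length` 16-bit samples. In the Cheetah SDK, `process` returns a `(partial_transcript, is_endpoint)` pair, and `flush` returns any text still buffered. The helper name `transcribe_stream` is hypothetical.

```python
from typing import Iterable, Sequence


def transcribe_stream(cheetah, frames: Iterable[Sequence[int]]) -> str:
    """Accumulate Cheetah's partial transcripts into one string as frames arrive."""
    transcript = ""
    for frame in frames:
        partial_transcript, is_endpoint = cheetah.process(frame)
        transcript += partial_transcript
        if is_endpoint:
            # The speaker paused: flush to finalize the current utterance.
            transcript += cheetah.flush()
    # Collect whatever is still buffered when the stream ends.
    transcript += cheetah.flush()
    return transcript
```

In a live application you would typically print `partial_transcript` as soon as it arrives rather than only returning the final string.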

For more information, check Cheetah Streaming Speech-to-Text's product page or refer to Cheetah's Python SDK quick start guide.

Leopard Speech-to-Text

1- Install the Leopard Speech-to-Text SDK using pip: pip3 install pvleopard

2- Sign up for a free Picovoice Console account and copy your AccessKey, which the SDK uses for authentication and authorization.

3- Create an instance of Leopard to transcribe speech to text:
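Leopard works in batch mode, so it is natural to create one instance and reuse it across many files. Here is a minimal sketch, assuming pvleopard is installed. The hypothetical `leopard_engine` context manager wraps `pvleopard.create`. It also guarantees that `delete()` runs to free the engine's native resources, even if a transcription raises.

```python
from contextlib import contextmanager


@contextmanager
def leopard_engine(access_key: str):
    """Yield a Leopard instance and always release it afterwards (hypothetical helper)."""
    import pvleopard  # imported here so the SDK is only required when the helper is called

    leopard = pvleopard.create(access_key=access_key)
    try:
        yield leopard
    finally:
        leopard.delete()  # free native resources held by the engine
```

Use it as `with leopard_engine(ACCESS_KEY) as leopard:` and transcribe every file inside the block.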

4- Pass an audio file to Leopard's .process_file method and inspect the result:
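Here is a minimal sketch of this step. It assumes a pvleopard release in which `process_file` returns a `(transcript, words)` pair, where each word carries `word`, `start_sec`, `end_sec`, and `confidence`. Earlier releases may return only the transcript. The helpers `transcribe_file` and `transcribe_batch` are hypothetical. The second one shows the batch use case: a single engine reused over many files.

```python
from typing import Dict, List, Tuple

WordInfo = Tuple[str, float, float, float]  # (word, start_sec, end_sec, confidence)


def transcribe_file(leopard, audio_path: str) -> Tuple[str, List[WordInfo]]:
    """Transcribe one audio file; return the transcript plus per-word timing and confidence."""
    transcript, words = leopard.process_file(audio_path)
    timeline = [(w.word, w.start_sec, w.end_sec, w.confidence) for w in words]
    return transcript, timeline


def transcribe_batch(leopard, audio_paths: List[str]) -> Dict[str, str]:
    """Transcribe many files with one engine instance, keyed by file path."""
    return {path: transcribe_file(leopard, path)[0] for path in audio_paths}
```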

For more information, check Leopard Speech-to-Text's product page or refer to Leopard's Python SDK quick start guide.