In this article, we will learn how to perform Speech Recognition in Python.
Speech Recognition and Speech-to-Text are often used interchangeably. But Speech-to-Text is only a subfield of Speech Recognition. Other forms of Speech Recognition include Wake Word Detection, Voice Command Recognition, and Voice Activity Detection (VAD).
Below is the cheat sheet I use when deciding which Speech Recognition algorithm to use:
- Do you need to detect if a person is talking and when? Then use Cobra Voice Activity Detection.
- Do you need to detect the occurrence of a single phrase? Or one of a few phrases? Then Porcupine Wake Word is the right engine.
- Do you need to understand voice commands? Rhino Speech-to-Intent is the correct tool here. Rhino can infer users' intent and accurately extract the request's details (i.e., slot values) using minimal runtime resources.
- Do you need to transcribe speech to text in real time? Use Cheetah Streaming Speech-to-Text.
- Do you need to transcribe large volumes of speech to text in batch mode? Leopard Speech-to-Text is the right tool.
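The cheat sheet above can be written down as a small lookup table. This is only a sketch: the task keys and the `pick_engine` helper are hypothetical names chosen for illustration, and the pip package names are the PyPI names as I understand them, so check them before installing.

```python
# Hypothetical lookup table mirroring the cheat sheet.
# Each entry maps a task to (engine name, assumed PyPI package name).
ENGINES = {
    "voice_activity": ("Cobra Voice Activity Detection", "pvcobra"),
    "wake_word": ("Porcupine Wake Word", "pvporcupine"),
    "voice_command": ("Rhino Speech-to-Intent", "pvrhino"),
    "streaming_transcription": ("Cheetah Streaming Speech-to-Text", "pvcheetah"),
    "batch_transcription": ("Leopard Speech-to-Text", "pvleopard"),
}


def pick_engine(task):
    """Return (engine name, pip package) for a task key from ENGINES."""
    try:
        return ENGINES[task]
    except KeyError:
        valid = ", ".join(sorted(ENGINES))
        raise ValueError(f"Unknown task {task!r}; expected one of: {valid}") from None
```

For example, `pick_engine("wake_word")` returns the Porcupine entry, and an unknown key raises a `ValueError` listing the valid choices.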
The SDKs in this tutorial can run on Linux, macOS, Windows, Raspberry Pi, NVIDIA Jetson, and BeagleBone.
Cobra Voice Activity Detection
1- Install the Cobra Voice Activity Detection SDK, published on PyPI as pvcobra, using pip.
2- Sign up for a free Picovoice Console account and copy your AccessKey. It handles authentication and authorization.
3- Create an instance of the Voice Activity Detection engine.
4- Pass in frames of audio to the .process method, which returns the probability of voice for each frame.
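The steps above can be sketched roughly as follows. This assumes the SDK was installed with `pip3 install pvcobra` and that `pvcobra.create` and `.process` behave as I understand the Python SDK. The helper names `create_cobra` and `voice_activity`, and the 0.5 threshold, are my own choices for illustration, not part of the SDK.

```python
def create_cobra(access_key):
    """Create a Cobra engine. The import is deferred so that the helper
    below can be used (and tested) without the SDK installed."""
    import pvcobra  # assumed to be installed via: pip3 install pvcobra
    return pvcobra.create(access_key=access_key)


def voice_activity(engine, frames, threshold=0.5):
    """Yield (frame_index, probability, is_voiced) for each audio frame.

    `engine` is any object whose .process(frame) returns a voice
    probability between 0 and 1; `threshold` is an illustrative default.
    """
    for index, frame in enumerate(frames):
        probability = engine.process(frame)
        yield index, probability, probability >= threshold
```

In a real application, each frame would be a sequence of 16-bit, single-channel PCM samples whose length matches the engine's `frame_length`, recorded at the engine's `sample_rate`. When the engine is no longer needed, calling its `delete()` method releases the resources it holds.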
For more information, check Cobra Voice Activity Detection's product page or refer to Cobra's Python SDK quick start guide.
Porcupine Wake Word
1- Install the Porcupine Wake Word SDK, published on PyPI as pvporcupine, using pip.
2- Sign up for a free Picovoice Console account and copy your AccessKey. It handles authentication and authorization.
3- Create your custom wake word model using Picovoice Console.
4- Create an instance of the Wake Word engine.
5- Pass in frames of audio to the .process method, which returns the index of the detected wake word, or -1 if none was detected.
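A minimal sketch of these steps, assuming the SDK was installed with `pip3 install pvporcupine` and that the wake word model downloaded from Picovoice Console is passed via `keyword_paths`. The helper names `create_porcupine` and `find_wake_words` are hypothetical and exist only for this example.

```python
def create_porcupine(access_key, keyword_path):
    """Create a Porcupine engine for one custom wake word model.
    The import is deferred so the helper below works without the SDK."""
    import pvporcupine  # assumed to be installed via: pip3 install pvporcupine
    return pvporcupine.create(access_key=access_key, keyword_paths=[keyword_path])


def find_wake_words(engine, frames):
    """Return a list of (frame_index, keyword_index) for every detection.

    `engine.process(frame)` is expected to return the index of the
    detected keyword, or -1 when nothing was detected in that frame.
    """
    detections = []
    for index, frame in enumerate(frames):
        keyword_index = engine.process(frame)
        if keyword_index >= 0:
            detections.append((index, keyword_index))
    return detections
```

With a single wake word, every detection has keyword index 0. If several model files are passed in `keyword_paths`, the index identifies which one was heard.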
For more information, check Porcupine Wake Word's product page or refer to Porcupine's Python SDK quick start guide.
Rhino Speech-to-Intent
1- Install the Rhino Speech-to-Intent SDK, published on PyPI as pvrhino, using pip.
2- Sign up for a free Picovoice Console account and copy your AccessKey. It handles authentication and authorization.
3- Create your Context using Picovoice Console.
4- Create an instance of Rhino Speech-to-Intent to start recognizing voice commands within the domain of the provided context.
5- Pass in frames of audio to the .process function and, once it reports that inference is finalized, use the .get_inference function to determine the user's intent.
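These steps might look like the sketch below. It assumes the SDK was installed with `pip3 install pvrhino`, that `.process` returns a boolean indicating whether inference is finalized, and that `.get_inference` returns an object with `is_understood`, `intent`, and `slots` attributes, which is how I understand the Python SDK. The helper names `create_rhino` and `infer_intent` are hypothetical.

```python
def create_rhino(access_key, context_path):
    """Create a Rhino engine for a context file from Picovoice Console.
    The import is deferred so the helper below works without the SDK."""
    import pvrhino  # assumed to be installed via: pip3 install pvrhino
    return pvrhino.create(access_key=access_key, context_path=context_path)


def infer_intent(engine, frames):
    """Feed frames until inference is finalized, then return the result.

    Returns a dict with keys "is_understood", "intent", and "slots",
    or None if the audio ended before the engine finalized an inference.
    """
    for frame in frames:
        is_finalized = engine.process(frame)
        if is_finalized:
            inference = engine.get_inference()
            if not inference.is_understood:
                return {"is_understood": False, "intent": None, "slots": {}}
            return {
                "is_understood": True,
                "intent": inference.intent,
                "slots": dict(inference.slots),
            }
    return None
```

A result with `is_understood` set to `False` means the spoken command fell outside the context, which is different from `None`, where the audio simply ended too early.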
For more information, check Rhino Speech-to-Intent's product page or refer to Rhino's Python SDK quick start guide.
Cheetah Streaming Speech-to-Text
1- Install the Cheetah Streaming Speech-to-Text SDK, published on PyPI as pvcheetah, using pip.
2- Sign up for a free Picovoice Console account and copy your AccessKey. It handles authentication and authorization.
3- Create an instance of Cheetah to transcribe speech to text in real time.
4- Pass in audio frames as they become available to the .process function.
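As a rough sketch of these steps: the code below assumes the SDK was installed with `pip3 install pvcheetah`, that `.process` returns a `(partial_transcript, is_endpoint)` pair, and that `.flush` returns any text still buffered, which is my understanding of the Python SDK. The helper names `create_cheetah` and `transcribe_stream` are hypothetical.

```python
def create_cheetah(access_key):
    """Create a Cheetah engine. The import is deferred so the helper
    below works without the SDK installed."""
    import pvcheetah  # assumed to be installed via: pip3 install pvcheetah
    return pvcheetah.create(access_key=access_key)


def transcribe_stream(engine, frames):
    """Transcribe a stream of audio frames and return the full text.

    Partial transcripts are collected as they arrive. When the engine
    reports an endpoint (a pause in speech), and again when the stream
    ends, the remaining buffered text is flushed.
    """
    parts = []
    for frame in frames:
        partial_transcript, is_endpoint = engine.process(frame)
        parts.append(partial_transcript)
        if is_endpoint:
            parts.append(engine.flush())
    parts.append(engine.flush())
    return "".join(parts)
```

A live application would typically print each partial transcript as soon as it arrives instead of joining everything at the end. The accumulation here just keeps the example easy to follow.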
For more information, check Cheetah Streaming Speech-to-Text's product page or refer to Cheetah's Python SDK quick start guide.
Leopard Speech-to-Text
1- Install the Leopard Speech-to-Text SDK, published on PyPI as pvleopard, using pip.
2- Sign up for a free Picovoice Console account and copy your AccessKey. It handles authentication and authorization.
3- Create an instance of Leopard to transcribe speech to text.
4- Pass in an audio file to Leopard and inspect the result.
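The steps above can be sketched as follows, assuming the SDK was installed with `pip3 install pvleopard` and a recent SDK version in which `.process_file` returns a `(transcript, words)` pair whose word entries carry `word`, `start_sec`, and `end_sec` attributes. The helper names `create_leopard` and `transcribe_file` are hypothetical.

```python
def create_leopard(access_key):
    """Create a Leopard engine. The import is deferred so the helper
    below works without the SDK installed."""
    import pvleopard  # assumed to be installed via: pip3 install pvleopard
    return pvleopard.create(access_key=access_key)


def transcribe_file(engine, audio_path):
    """Transcribe one audio file.

    Returns (transcript, timings), where timings is a list of
    (word, start_sec, end_sec) tuples built from the word metadata.
    """
    transcript, words = engine.process_file(audio_path)
    timings = [(w.word, w.start_sec, w.end_sec) for w in words]
    return transcript, timings
```

The word timings are handy for tasks such as subtitle alignment. If only the text is needed, the second element of the result can be ignored.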
For more information, check Leopard Speech-to-Text's product page or refer to Leopard's Python SDK quick start guide.