Cheetah Speech-to-Text
Node.js API
API Reference for the Node.js Cheetah SDK (npm)
Cheetah
Class for the Cheetah Speech-to-Text engine.
Cheetah can be initialized using the class constructor().
When you are done, free the acquired resources by calling the release() method.
Cheetah.constructor()
Cheetah constructor.
Parameters
accessKey string : AccessKey obtained from Picovoice Console.
options CheetahOptions : Optional configuration arguments:
  modelPath string : Path to the file containing model parameters (.pv).
  device string? : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, Cheetah picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
  libraryPath string : Path to the Cheetah dynamic library (.node).
  endpointDurationSec number : Duration of endpoint in seconds. A speech endpoint is detected when an utterance is followed by a chunk of audio of this duration that contains no speech. Set to 0 to disable endpoint detection. Default is 1 second.
  enableAutomaticPunctuation boolean : Whether to enable automatic punctuation. Default is false.
Returns
Cheetah: An instance of the Cheetah engine.
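The device string formats and option fields described for the constructor can be assembled with a small helper. The following is a minimal sketch, not part of the SDK: buildDeviceString and buildCheetahOptions are hypothetical names, the field names follow the CheetahOptions type, and the commented-out constructor call assumes the package is imported as @picovoice/cheetah-node.

```javascript
// Hypothetical helper (not part of the SDK): builds a `device` string
// in the formats described for the constructor.
function buildDeviceString({ kind = "best", gpuIndex, numThreads } = {}) {
  if (kind === "gpu") {
    return gpuIndex === undefined ? "gpu" : `gpu:${gpuIndex}`;
  }
  if (kind === "cpu") {
    return numThreads === undefined ? "cpu" : `cpu:${numThreads}`;
  }
  return "best";
}

// Hypothetical helper: an options object that spells out the documented
// defaults and lets the caller override individual fields.
function buildCheetahOptions(overrides = {}) {
  return {
    endpointDurationSec: 1, // documented default: 1 second
    enableAutomaticPunctuation: false, // documented default: false
    ...overrides,
  };
}

const options = buildCheetahOptions({
  device: buildDeviceString({ kind: "cpu", numThreads: 4 }),
  enableAutomaticPunctuation: true,
});
console.log(options);

// Assumed usage with the real SDK (needs a valid AccessKey):
// const { Cheetah } = require("@picovoice/cheetah-node");
// const cheetah = new Cheetah(accessKey, options);
```

Keeping the device string construction in one place avoids scattering string formatting such as cpu:${NUM_THREADS} across the application.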
Cheetah.release()
Releases resources acquired by Cheetah.
Cheetah.frameLength
Getter for number of audio samples per frame.
Returns
number: Number of audio samples per frame.
Cheetah.sampleRate
Getter for audio sample rate accepted by Cheetah.
Returns
number: Audio sample rate accepted by Cheetah.
Cheetah.version
Getter for version.
Returns
string: Current Cheetah version.
Cheetah.listAvailableDevices()
Lists all available devices that Cheetah can use for inference. Each entry in the list can be the device argument of the constructor.
Parameters
options CheetahInputOptions : Optional input configuration arguments.
Returns
string[]: List of all available devices that Cheetah can use for inference.
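Since each returned entry can be passed as the device argument, a caller might pick one entry from the list before constructing the engine. The sketch below is illustrative only: pickDevice is a hypothetical helper, the sample list is made up, and the assumption that GPU entries start with "gpu" is inferred from the device string format rather than confirmed by this reference.

```javascript
// Hypothetical helper: choose a device from a list such as the one
// returned by Cheetah.listAvailableDevices().
// Assumption: GPU entries start with "gpu", as in the `device` string format.
function pickDevice(devices, fallback = "best") {
  const gpu = devices.find((d) => d.startsWith("gpu"));
  return gpu ?? fallback;
}

// Sample list for illustration only; real entries depend on the machine.
const sampleDevices = ["cpu", "gpu:0", "gpu:1"];
console.log(pickDevice(sampleDevices)); // prints "gpu:0"
console.log(pickDevice(["cpu"])); // prints "best"
```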
CheetahOptions
Cheetah init options type.
modelPath string : The path to the Cheetah model (.pv).
device string? : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, Cheetah picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
endpointDurationSec number : Duration of endpoint in seconds. A speech endpoint is detected when an utterance is followed by a chunk of audio of this duration that contains no speech. Set to 0 to disable endpoint detection. Default is 1 second.
enableAutomaticPunctuation boolean : Flag to enable automatic punctuation insertion.
CheetahInputOptions
Cheetah input options type.
libraryPath string : The path to the Cheetah dynamic library.
Cheetah.process()
Processes a frame of the incoming audio stream with the speech-to-text engine. The number of samples per frame is available from the .frameLength getter. The incoming audio must have a sample rate equal to .sampleRate and be 16-bit linearly-encoded. Cheetah operates on single-channel audio.
Parameters
pcm Array<number> : A frame of audio samples.
Returns
[string, boolean]: Transcription of any newly-transcribed speech (if none is available then an empty string is returned) and a flag indicating if an endpoint has been detected.
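Because process() expects exactly one frame per call, longer audio has to be cut into frames of .frameLength samples first. The following is a minimal sketch of that loop. splitIntoFrames and processFrames are hypothetical helpers, and fakeEngine is a stand-in that performs no speech recognition; it only mimics the [string, boolean] return shape so the sketch runs without the SDK or an AccessKey.

```javascript
// Hypothetical helper: split PCM samples into full frames of `frameLength`
// samples. Trailing samples that do not fill a frame are dropped here.
function splitIntoFrames(pcm, frameLength) {
  const frames = [];
  for (let i = 0; i + frameLength <= pcm.length; i += frameLength) {
    frames.push(pcm.slice(i, i + frameLength));
  }
  return frames;
}

// Hypothetical helper: feed frames to any object exposing
// process(pcm) -> [string, boolean], as documented for Cheetah.process().
// Collects the partial transcripts and the indices of frames at which
// an endpoint was reported.
function processFrames(engine, frames) {
  let transcript = "";
  const endpointFrames = [];
  frames.forEach((frame, index) => {
    const [partial, isEndpoint] = engine.process(frame);
    transcript += partial;
    if (isEndpoint) {
      endpointFrames.push(index);
    }
  });
  return { transcript, endpointFrames };
}

// Stand-in engine for illustration only.
const fakeEngine = {
  frameLength: 4,
  calls: 0,
  process(frame) {
    this.calls += 1;
    return [`w${this.calls} `, this.calls === 2];
  },
};

const pcm = new Array(10).fill(0); // 10 silent samples
const frames = splitIntoFrames(pcm, fakeEngine.frameLength);
const result = processFrames(fakeEngine, frames);
console.log(frames.length, result);
```

With a real instance, the same helpers would be given cheetah.frameLength and the Cheetah object in place of fakeEngine.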
Cheetah.flush()
Marks the end of the audio stream, flushes internal state of the object, and returns any remaining transcribed text.
Returns
string: Any remaining transcribed text. If none is available then an empty string is returned.
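Putting process(), flush(), and release() together, one possible end-to-end flow is sketched below. Calling flush() whenever an endpoint is reported is one common pattern and an assumption here, not something this reference mandates. transcribe is a hypothetical helper, and stubEngine is a stand-in that only imitates the documented method shapes.

```javascript
// Hypothetical helper: works with any object exposing process(), flush()
// and release() with the shapes documented for Cheetah.
function transcribe(engine, frames) {
  let transcript = "";
  try {
    for (const frame of frames) {
      const [partial, isEndpoint] = engine.process(frame);
      transcript += partial;
      if (isEndpoint) {
        // Assumed pattern: flush at an endpoint to close the utterance.
        transcript += engine.flush() + "\n";
      }
    }
    // End of stream: collect any remaining text.
    transcript += engine.flush();
  } finally {
    // Free resources even if process() or flush() throws.
    engine.release();
  }
  return transcript;
}

// Stand-in engine for illustration only; it does no speech recognition.
const stubEngine = {
  pending: "",
  released: false,
  n: 0,
  process(frame) {
    this.n += 1;
    this.pending = `tail${this.n}`;
    return [`word${this.n} `, this.n === 2];
  },
  flush() {
    const text = this.pending;
    this.pending = "";
    return text;
  },
  release() {
    this.released = true;
  },
};

const text = transcribe(stubEngine, [[0], [0], [0]]);
console.log(JSON.stringify(text), stubEngine.released);
```

The try/finally block ensures release() runs on every path, which matters because the engine holds native resources.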
Errors
Exceptions are thrown if an error occurs within the Cheetah Speech-to-Text engine.
Exceptions: