Leopard Speech-to-Text
Go Quick Start

Platforms

Linux (x86_64)
macOS (x86_64, arm64)
Windows (x86_64)
Raspberry Pi (3, 4, 5)

Requirements

Picovoice Account & AccessKey
Go 1.16+
Windows only: a GCC compiler such as MinGW available on your PATH

Picovoice Account & AccessKey

Sign up or log in to Picovoice Console to get your AccessKey. Keep your AccessKey secret.

Quick Start

Setup

Download and install Go.
Install the Leopard Speech-to-Text Go Package using the Go CLI:

go get github.com/Picovoice/leopard/binding/go

Usage

Create an instance of the Leopard Speech-to-Text engine:

import . "github.com/Picovoice/leopard/binding/go"

leopard := NewLeopard("${ACCESS_KEY}") // AccessKey provided by Picovoice Console (https://console.picovoice.ai/)
err := leopard.Init()
if err != nil {
    // handle init error
}
defer leopard.Delete()

Transcribe an audio file:

transcript, words, err := leopard.ProcessFile("${AUDIO_FILE_PATH}")
if err != nil {
    // handle process error
}

When done, be sure to explicitly release the resources using leopard.Delete().

Model File

The Leopard Speech-to-Text Go SDK comes preloaded with a default English language model (.pv file). Default models for other supported languages can be found in the Leopard Speech-to-Text GitHub repository.

Create custom language models using the Picovoice Console. Here you can train language models with custom vocabulary and boost words in the existing vocabulary.

Pass in the .pv file by setting .ModelPath on an instance of Leopard Speech-to-Text before initializing:

leopard := NewLeopard("${ACCESS_KEY}")
leopard.ModelPath = "${MODEL_PATH}"
err := leopard.Init()

Word Metadata

Along with the transcript, Leopard Speech-to-Text returns metadata for each transcribed word. Available metadata items are:

Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
Confidence: Leopard Speech-to-Text's confidence that the transcribed word is accurate. It is a number within [0, 1].
Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
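
The metadata items above lend themselves to simple post-processing, such as flagging uncertain words for review or grouping words into speaker turns. The following is a minimal sketch of both. The Word struct is a local stand-in defined so the example compiles on its own; its field names are assumptions and may not match the SDK's type exactly, so check the Go API docs before adapting it.

```go
package main

import (
	"fmt"
	"strings"
)

// Word is a local stand-in for the per-word metadata described above.
// Field names are assumptions made for this sketch.
type Word struct {
	Text       string
	StartSec   float32
	EndSec     float32
	Confidence float32
	SpeakerTag int32
}

// lowConfidence returns the words whose confidence is below threshold.
func lowConfidence(words []Word, threshold float32) []Word {
	var out []Word
	for _, w := range words {
		if w.Confidence < threshold {
			out = append(out, w)
		}
	}
	return out
}

// speakerTurns joins consecutive words that share a speaker tag into one
// line per turn, in the order the words were spoken.
func speakerTurns(words []Word) []string {
	var turns, current []string
	for i, w := range words {
		if i > 0 && w.SpeakerTag != words[i-1].SpeakerTag {
			turns = append(turns, fmt.Sprintf("speaker %d: %s",
				words[i-1].SpeakerTag, strings.Join(current, " ")))
			current = nil
		}
		current = append(current, w.Text)
	}
	if len(current) > 0 {
		turns = append(turns, fmt.Sprintf("speaker %d: %s",
			words[len(words)-1].SpeakerTag, strings.Join(current, " ")))
	}
	return turns
}

func main() {
	// Illustrative values only, not real engine output.
	words := []Word{
		{"hello", 0.10, 0.42, 0.95, 1},
		{"there", 0.45, 0.80, 0.40, 1},
		{"hi", 1.10, 1.30, 0.90, 2},
	}
	for _, w := range lowConfidence(words, 0.5) {
		fmt.Printf("review %q (%.2fs-%.2fs)\n", w.Text, w.StartSec, w.EndSec)
	}
	for _, t := range speakerTurns(words) {
		fmt.Println(t)
	}
}
```

The 0.5 threshold is arbitrary. When diarization is disabled, every word carries the tag -1, so speakerTurns yields a single turn.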

Demo

For the Leopard Speech-to-Text Go SDK, we offer demo applications that demonstrate how to use the Speech-to-Text engine on audio files.

Setup

Clone the Leopard Speech-to-Text repository from GitHub using HTTPS:

git clone --recurse-submodules https://github.com/Picovoice/leopard.git

Usage

To see the usage options for the demos, use the -h flag:

cd leopard/demo/go
go run filedemo/leopard_file_demo.go -h

Run the following command to transcribe an audio file:

go run filedemo/leopard_file_demo.go -access_key "${ACCESS_KEY}" -input_audio_path "${AUDIO_PATH}"

For more information on our Leopard Speech-to-Text demos for Go, head over to our GitHub repository.

Resources

Package

leopard on pkg.go.dev

API

leopard Go API Docs

GitHub

Benchmark

Speech-to-Text Benchmark
