Leopard Speech-to-Text
Java Quick Start

Platforms

Linux (x86_64)
macOS (x86_64, arm64)
Windows (x86_64, arm64)
Raspberry Pi (3, 4, 5)

Requirements

Picovoice Account and AccessKey
Java 11+

Picovoice Account & AccessKey

Sign up or log in to Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.

Quick Start

Setup

Install JDK 11+.
Install the Java binding from the Maven Central Repository at:

ai.picovoice:leopard-java:${version}

Usage

Create an instance of the engine with the Leopard Speech-to-Text Builder class and transcribe an audio file:

import ai.picovoice.leopard.*;

final String accessKey = "${ACCESS_KEY}"; // AccessKey provided by Picovoice Console (https://console.picovoice.ai/)

try {
    Leopard leopard = new Leopard.Builder()
        .setAccessKey(accessKey).build();
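    LeopardTranscript result = leopard.processFile("${AUDIO_PATH}");
    // Print inside the try block, where result is in scope
    System.out.println(result.getTranscriptString());
    leopard.delete();
} catch (LeopardException ex) {
    System.err.println(ex.getMessage());
}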

Replace ${ACCESS_KEY} with your AccessKey from Picovoice Console and ${AUDIO_PATH} with the path to an audio file. When done, be sure to explicitly release the resources using leopard.delete().

leopard.delete()
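
A delete() call placed inside a try block is skipped if an earlier statement in that block throws. The following is a minimal sketch of guarding the release with try/finally. To keep it runnable without the SDK, it uses FakeEngine, a hypothetical stand-in that is not part of the Leopard API; only the try/finally shape is meant to carry over.

```java
public class ReleaseSketch {

    /** Hypothetical stand-in for an engine that holds native resources. */
    static class FakeEngine {
        private boolean released = false;

        String processFile(String path) {
            if (path.isEmpty()) {
                throw new IllegalArgumentException("empty audio path");
            }
            return "transcript of " + path;
        }

        void delete() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    /** Transcribes one file and always releases the engine, even on failure. */
    static String transcribeAndRelease(FakeEngine engine, String path) {
        try {
            return engine.processFile(path);
        } catch (IllegalArgumentException ex) {
            return null; // the caller treats null as "no transcript"
        } finally {
            engine.delete(); // runs on both the success and the failure path
        }
    }

    public static void main(String[] args) {
        FakeEngine ok = new FakeEngine();
        System.out.println(transcribeAndRelease(ok, "audio.wav"));
        System.out.println("released: " + ok.isReleased());

        FakeEngine failing = new FakeEngine();
        System.out.println(transcribeAndRelease(failing, ""));
        System.out.println("released: " + failing.isReleased());
    }
}
```

With a real Leopard instance, the same shape would put leopard.delete() in the finally block and catch LeopardException instead.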

Model File

The Leopard Speech-to-Text Java SDK comes preloaded with a default English language model (.pv file). Default models for other supported languages can be found in the Leopard Speech-to-Text GitHub repository.

Create custom language models using the Picovoice Console. Here you can train language models with custom vocabulary and boost words in the existing vocabulary.

Pass in the path to the .pv file via the .setModelPath() Builder method:

Leopard leopard = new Leopard.Builder()
        .setAccessKey("${ACCESS_KEY}")
        .setModelPath("${MODEL_PATH}")
        .build();

Word Metadata

Along with the transcript, Leopard Speech-to-Text returns metadata for each transcribed word. Available metadata items are:

Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
Confidence: Leopard Speech-to-Text's confidence that the transcribed word is accurate. It is a number within [0, 1].
Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
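
As an illustration of how these metadata items might be used, the sketch below flags low-confidence words and sums speaking time per speaker tag. It runs without the SDK: WordMeta is a hypothetical stand-in class that mirrors the four items listed above, and its field names are assumptions rather than the SDK's own accessors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordMetadataSketch {

    /** Hypothetical stand-in: the word text plus the four metadata items. */
    static final class WordMeta {
        final String word;
        final double startSec;   // start time in seconds
        final double endSec;     // end time in seconds
        final double confidence; // within [0, 1]
        final int speakerTag;    // -1 when diarization is disabled, 0 for unknown

        WordMeta(String word, double startSec, double endSec, double confidence, int speakerTag) {
            this.word = word;
            this.startSec = startSec;
            this.endSec = endSec;
            this.confidence = confidence;
            this.speakerTag = speakerTag;
        }
    }

    /** Returns the text of every word whose confidence is below the threshold. */
    static List<String> lowConfidenceWords(List<WordMeta> words, double threshold) {
        List<String> flagged = new ArrayList<>();
        for (WordMeta w : words) {
            if (w.confidence < threshold) {
                flagged.add(w.word);
            }
        }
        return flagged;
    }

    /** Sums word durations per speaker tag, keyed in order of first appearance. */
    static Map<Integer, Double> speakingTimeBySpeaker(List<WordMeta> words) {
        Map<Integer, Double> totals = new LinkedHashMap<>();
        for (WordMeta w : words) {
            totals.merge(w.speakerTag, w.endSec - w.startSec, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only
        List<WordMeta> words = Arrays.asList(
            new WordMeta("hello", 0.0, 0.5, 0.95, 1),
            new WordMeta("there", 0.5, 1.0, 0.40, 1),
            new WordMeta("hi", 1.5, 2.0, 0.90, 2));

        System.out.println(lowConfidenceWords(words, 0.5));
        System.out.println(speakingTimeBySpeaker(words));
    }
}
```

The per-speaker totals are only meaningful when speaker diarization is enabled; otherwise every word carries the tag -1 and falls into a single bucket.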

Demo

For the Leopard Speech-to-Text Java SDK, we offer demo applications that demonstrate how to use the Speech-to-Text engine on audio files.

Setup

Clone the Leopard Speech-to-Text repository from GitHub using HTTPS:

git clone --recurse-submodules https://github.com/Picovoice/leopard.git

Build the Leopard Speech-to-Text Java demo using Gradle:

cd leopard/demo/java
./gradlew build

Usage

To see the usage options for the demos, use the -h flag:

java -jar build/libs/leopard-file-demo.jar -h

Run the following command to transcribe an audio file:

java -jar build/libs/leopard-file-demo.jar -a ${ACCESS_KEY} -i ${AUDIO_PATH}

For more information on our Leopard Speech-to-Text demos for Java, head over to our GitHub repository.

Resources

Package

leopard-java on Maven Central

API

leopard-java API Docs

GitHub

Benchmark

Speech-to-Text Benchmark
