Leopard Speech-to-Text

Transcribe voice data locally with cloud-level accuracy

Adaptive, efficient, cost-effective, and no compromises

Start Building for Free

Adapt the default model to your domain

Train an optimized speech-to-text model by adding custom vocabulary and boosting words relevant to your use case. Simply type or bulk import words to achieve the highest possible accuracy. Test in the browser and download models for local transcription.

Console User Interface showing “custom vocabulary” and “word boost” features to improve Leopard Speech-to-Text accuracy.
Start Building

Build with Your Favorite SDK

Add Leopard Speech-to-Text with your favorite SDK, including Python, NodeJS, Android, iOS, and React Native. Transcribing speech data locally on a device is just a few lines of code away!

import pvleopard
o = pvleopard.create(access_key)
transcript, words = o.process_file(path)
Build with Python
const o = new Leopard(accessKey)
const { transcript, words } = o.processFile(path)
Build with NodeJS
Leopard o = new Leopard.Builder()
    .setAccessKey(accessKey)
    .setModelPath(modelPath)
    .build(appContext);
LeopardTranscript r = o.processFile(path);
Build with Android
let o = Leopard(
    accessKey: accessKey,
    modelPath: modelPath)
let r = o.processFile(path)
Build with iOS
o := NewLeopard(accessKey)
err := o.Init()
transcript, words, err := o.ProcessFile(path)
Build with Go
Leopard o = new Leopard.Builder()
    .setAccessKey(accessKey)
    .build();
LeopardTranscript r = o.processFile(path);
Build with Java
Leopard o = Leopard.Create(accessKey);
LeopardTranscript result = o.ProcessFile(path);
Build with .NET
let o: Leopard = LeopardBuilder::new()
    .access_key(access_key)
    .init()
    .expect("");
if let Ok(result) = o.process_file(path) { }
Build with Rust
Leopard o = await Leopard.create(
    accessKey,
    modelPath);
LeopardTranscript result = await o.processFile(path);
Build with Flutter
const o = await Leopard.create(
    accessKey,
    modelPath)
const { transcript, words } = await o.processFile(path)
Build with React Native
pv_leopard_t *leopard = NULL;
pv_leopard_init(
    access_key,
    model_path,
    enable_automatic_punctuation,
    &leopard);
char *transcript = NULL;
int32_t num_words = 0;
pv_word_t *words = NULL;
pv_leopard_process_file(
    leopard,
    path,
    &transcript,
    &num_words,
    &words);
Build with C
const leopard = await LeopardWorker.fromPublicDirectory(
    accessKey,
    modelPath
);
const { transcript, words } = await leopard.process(pcm);
Build with Web

Deploy Efficient Speech-to-Text Models Anywhere

Deploy Leopard Speech-to-Text anywhere. Offer seamless experiences across platforms such as web browsers, mobile devices, on-prem servers, or all of them. Leopard brings state-of-the-art voice recognition to where data resides.

Grow with Confidence

Transcribe more for less! Leopard is not cheaper by a few percentage points, but by a factor of 10 to 20. Transcribe all your data instead of limiting yourself to a portion of it because of STT costs. Check out the cost comparison of the best-known speech-to-text engines: Amazon Transcribe, Azure, Google Speech-to-Text, IBM Watson, and Leopard.

Start with the Free Tier

Why Leopard Speech-to-Text?

Accurate — backed by open-source benchmark, not fancy graphs

Let the data decide what’s accurate. No voice AI vendor claims “mediocre accuracy.” At least we haven’t seen it. However, accuracy depends on various factors. More importantly, if an Automatic Speech Recognition engine with 90% accuracy, i.e. 10% Word Error Rate, misses the most important words, that accuracy doesn’t mean much. That is why we’ve published an open-source benchmark. It compares Leopard’s accuracy against Amazon Transcribe, Microsoft Azure, Google Speech-to-Text, and IBM Watson.
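
To make the accuracy figures above concrete, here is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance between a reference transcript and the engine's output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not part of the Leopard SDK or the published benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of ten.
reference = "turn the living room lights off at ten pm tonight"
hypothesis = "turn the living room lights on at ten pm tonight"
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints 10%
```

In this made-up example the transcript is 90% accurate by WER, yet the one word it gets wrong ("off" vs. "on") reverses the meaning of the command, which is the situation the paragraph above warns about.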

ASR accuracy chart shows Leopard outperforms Google Speech-to-Text & IBM Watson, slightly behind Amazon Transcribe and Azure.

Fast — No Downtime, Zero Latency

Don’t let network outages or latency hinder the user experience. Leopard’s edge-first architecture keeps processing time predictable by removing the dependency on connectivity. Unlike cloud APIs, Leopard Speech-to-Text offers on-device voice recognition.

Private — intrinsically compliant with GDPR, HIPAA and more!

Do not lose control over your data! Leopard processes voice data on the device without sending it to a third-party cloud. Sending data to the cloud is risky, especially in highly regulated industries such as healthcare and financial services. Google Speech-to-Text charges 50% extra if you don’t share your data with Google. Even if the audio recordings are not stored, transmission over the Internet is still a risk factor: data can be intercepted en route to the third-party cloud unless it’s encrypted and sent over a secure connection. The controversial privacy policies of big tech are well known, but they are not alone. As alternative transcription providers such as Otter.ai grow, the privacy flaws of cloud-dependent transcription become more visible.

Learn more about Leopard Speech-to-Text Engine

  • Does Leopard Speech-to-Text support real-time transcription?

    Leopard doesn’t, but Cheetah does. Cheetah is Picovoice’s on-device streaming speech-to-text engine that provides text output in real time for visual feedback. Like Leopard, Cheetah is private, fast and cost-effective because voice data is processed locally on the device. Cheetah also runs across platforms, thanks to its compact and computationally efficient model. Although Cheetah is less accurate than Leopard, it outperforms IBM Watson and Google Speech-to-Text. Learn more about Cheetah Speech-to-Text.

  • Why do you offer local speech-to-text with on-device voice processing instead of speech-to-text APIs?

    Achieving state-of-the-art accuracy within the limitations of the edge is a challenging problem. Large models require more resources, which often only the cloud can offer. However, relying on the cloud comes with costs: hefty bills at scale, privacy and environmental costs, and productivity losses due to network outages or delays. Picovoice aims to be the developer’s first choice for adding voice to anything. We developed local speech-to-text to give control back to you, so you can add voice to anything, on your terms, with no compromises.

  • Can I use Picovoice Speech-to-Text in the cloud?

    Yes! By processing voice data locally, Picovoice brings the voice recognition technology close to where data resides instead of sending the data to where the processing happens. You can process voice data with Picovoice Speech-to-Text in a private, public, or hybrid cloud. Don’t forget to check out the tutorials for serverless speech-to-text with Leopard & AWS Lambda and microservice with Leopard and gRPC for inspiration.
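
As a rough illustration of the serverless setup mentioned above, the sketch below wraps a file-transcription callable in a function with the standard AWS Lambda Python handler signature, `handler(event, context)`. Everything else is an assumption made for the example: the helper name `make_handler`, the event shape (base64-encoded audio in `event["body"]`), and the response format are hypothetical and are not taken from the Picovoice tutorials.

```python
import base64
import json
import os
import tempfile


def make_handler(transcribe_file):
    """Build a Lambda-style handler around a callable `path -> (transcript, words)`.

    `transcribe_file` is a hypothetical parameter; in a real deployment it could
    be the `process_file` method of an engine created with `pvleopard.create`.
    """

    def handler(event, context):
        # Assumed event shape: base64-encoded audio bytes in event["body"].
        audio_bytes = base64.b64decode(event["body"])

        # Write the audio to a temporary file, since the callable takes a path.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as audio_file:
            audio_file.write(audio_bytes)
            audio_path = audio_file.name

        try:
            transcript, _words = transcribe_file(audio_path)
        finally:
            os.remove(audio_path)  # do not leave audio behind on the instance

        return {"statusCode": 200, "body": json.dumps({"transcript": transcript})}

    return handler
```

Creating the engine once, outside the handler, and passing its `process_file` method to `make_handler` would let warm invocations reuse the loaded model. The audio never leaves the function's own environment, which is the point the answer above makes about processing data where it resides.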

  • How do I choose the best speech-to-text for my project?

    “Best” is a subjective term. Every use case has different business requirements. Also, the available resources vary from one enterprise to another. Accuracy, availability of features, time-to-market, the total cost of ownership, data privacy and governance, and reliability of the providers are some factors to consider, and they do not have the same weight for everyone. Read our guidelines to evaluate top FOSS (free and open-source) and production-grade speech-to-text solutions.

  • Which platforms does Picovoice Speech-to-Text support?

    1. Desktop and Servers: Linux, macOS, and Windows
    2. Web Browsers: Chrome, Safari, Firefox, and Edge
    3. Mobile Devices: Android and iOS
    4. Single Board Computers: Raspberry Pi and NVIDIA Jetson
  • What can I build with Picovoice Speech-to-Text?

    Speech-to-text is the best-known and most widely available speech recognition technology. Picovoice customers use it for:
    • Transcription
    • Dictation (voice typing)
    • Adding closed captions and subtitles
    • Speech analytics and intelligence
    Even well-known voice assistants such as Alexa and Siri use speech-to-text following a wake word. Don’t forget to check out the Voice Search, Voice Command and Control, Search by Voice and Speech Analytics use cases to learn more!
  • Which languages does Picovoice Speech-to-Text support?

    Picovoice Speech-to-Text only supports the English language for now.

  • What should I do if I need support for other languages?

    Reach out to Picovoice Sales to tell us about your commercial endeavour. Don’t forget to add the use case, business requirements and project details. The Picovoice team will respond to you.

  • How do I get technical support for Leopard Speech-to-Text?

    Picovoice docs, blogs, Medium posts, and GitHub are great resources to learn about voice recognition, Picovoice engines, and how to start building transcription products. Picovoice also offers GitHub community support to all Free Tier users.

  • How can I get informed about the updates and upgrades?

    Version changes appear in the Picovoice Newsletter, LinkedIn, and Twitter. Subscribing to GitHub is the best way to get notified of the patch releases. If you enjoy building with Leopard, don’t forget to give it a star when you’re on GitHub!