Design, develop, and ship useful voice features.

The end-to-end platform for embedding private voice AI into any software in a few lines of code

Trusted by thousands of enterprises - from startups to Fortune 500s

Loved by 200,000+ developers

What is the end-to-end on-device voice AI platform?

The Picovoice end-to-end on-device voice AI platform consists of voice AI engines and models that empower enterprises to design, develop, and ship voice products without sacrificing user privacy or experience.

The Picovoice end-to-end on-device platform features a complete set of modular voice AI engines delivered as cross-platform SDKs and a no-code platform to instantly train bespoke voice AI models that boost accuracy and efficiency.

End-to-end voice AI platform

Design

Design with no limits on top of a modular platform. Create use-case-specific voice AI models in seconds.

Develop

Develop voice features with a few lines of code using intuitive and cross-platform SDKs.

Ship

Deliver voice AI everywhere: on-device, mobile, web browsers, on-premise, or cloud.

Iterate

Measure adoption, learn, and iterate. Continuously re-design and re-train to optimize engagement.

Offerings

picoLLM

LLM Quantization & Inference

The end-to-end platform compresses any LLM without sacrificing accuracy and runs on Linux, macOS, Windows, Android, iOS, and Raspberry Pi, as well as in Chrome, Safari, Edge, and Firefox, supporting both CPU and GPU.


Why choose Picovoice?

Building accurate, responsive, and private voice technology is difficult.
We learned the hard way, so you don’t have to.

Innovative

Picovoice heavily invests in R&D to offer superior voice AI that surpasses even Big Tech in accuracy and efficiency. Picovoice researchers do not follow recent frameworks and techniques but build them.

Developer-first

Picovoice empowers developers to prototype, build, and evangelize with no strings attached. Builders can start free without a limited trial, credit card, or endless sales meetings.

Private

Picovoice returns control to enterprises as voice data never leaves the premises. Enterprises enjoy high accuracy without compromising privacy and reliability.

Don't just take our word!

Put us to the test with a Forever-Free Account

Speech-to-Text

A transcription engine that automatically converts audio and video recordings into text with high accuracy without sacrificing privacy.

Leopard Speech-to-Text

Streaming Speech-to-Text

A real-time transcription engine that automatically converts conversations into text with zero latency.

Cheetah Streaming Speech-to-Text

Noise Suppression and Cancellation

Noise cancellation software that removes background noise from audio in real time while preserving human speech.

Koala Noise Suppression

Speaker Recognition and Identification

Speaker recognition and identification software that distinguishes individuals using their unique voice characteristics.

Eagle Speaker Recognition

Falcon Speaker Diarization

A speaker diarization engine that identifies “who spoke when” in an audio stream by finding speaker changes and grouping them.

Falcon Speaker Diarization

Speech-to-Index

A search engine that indexes speech directly without converting it into text, enabling keyword and phrase search within audio and video files.

Octopus Speech-to-Index

Wake Word Detection

A wake word detection engine that recognizes unique signals to transition software from passive to active listening.

Porcupine Wake Word

Speech-to-Intent

Natural Language Understanding engine fused with speech-to-text, allowing users to interact with applications via voice commands.

Rhino Speech-to-Intent

Voice Activity Detection

Voice activity detection (VAD) software scans audio streams to identify the presence of human speech in real time.

Cobra Voice Activity Detection

Text-to-Speech

A voice generator that converts written text into spoken audio output without network latency or jeopardizing user privacy.

Orca Text-to-Speech

The Enterprise Voice AI

Secure and Flexible Deployment

Secure and flexible deployment with embedded, mobile, web browsers, on-premise, and cloud options. Expert help to choose the best deployment and platform for unique needs.

Picovoice Consulting

Customizable Voice AI Models

Performant out-of-the-box voice AI models, proven by open-source benchmarks. Purpose-built models for specialized applications, use cases, domains, and industries.

Picovoice Consulting

Dedicated Support

Easy-to-follow docs covering 99% of questions and an active GitHub community addressing technical issues. Dedicated support through the Support Add-on and Enterprise Plans.

Support Add-on

The Edge Voice AI Platform

Private, reliable and powerful
voice products

How to Create Subtitles for any Video with Python

Speech Recognition on Raspberry Pi

How to Record Audio using Python

Speech-to-Text using Node.js

Python Wake Word Detection Tutorial

How to Record Audio from a Web Browser

Feature

The Picovoice on-device voice AI platform offers everything that developers need to design, develop, and ship voice products: a complete set of modular voice AI engines delivered as cross-platform SDKs and a no-code platform to instantly train bespoke voice AI models that boost accuracy and efficiency.

We recommend Cheetah Streaming Speech-to-Text to transcribe real-time conversations such as live events, conferences, and meetings, or to enable note-taking and voice typing.

Please note that every use case is unique and the nuances may affect the performance of your product. If you’re a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you’re a Free Plan user, you can purchase an Enterprise Support Add-on to discuss your specific use case.

We recommend Leopard Speech-to-Text to convert audio and video files, such as recordings of interviews, meetings, calls, podcasts, and voicemails, into text.

We recommend Koala Noise Suppression to achieve crisp and clear conversations by removing background noise and enhancing speech.

We recommend Falcon Speaker Diarization to diarize speakers in conversations, making transcripts readable and analyzable.
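
To illustrate how diarization makes a transcript readable, here is a minimal sketch that attaches speaker labels to timestamped words. The `Word` and `SpeakerSegment` types and the `label_words` function are hypothetical names made up for this example, not the Falcon or Leopard SDK types; the sketch only assumes you already have word timestamps from a transcription step and "who spoke when" segments from a diarization step.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical transcript word with timestamps in seconds.
    text: str
    start_sec: float
    end_sec: float


@dataclass
class SpeakerSegment:
    # Hypothetical diarization result: one speaker talking over a time span.
    speaker: int
    start_sec: float
    end_sec: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time spans (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, segments):
    """Assign each word to the segment it overlaps most, then group
    consecutive words from the same speaker into one line.
    Assumes `segments` is not empty."""
    lines = []
    for word in words:
        best = max(
            segments,
            key=lambda s: overlap(word.start_sec, word.end_sec, s.start_sec, s.end_sec),
        )
        if lines and lines[-1][0] == best.speaker:
            lines[-1][1].append(word.text)
        else:
            lines.append((best.speaker, [word.text]))
    return [f"Speaker {speaker}: {' '.join(texts)}" for speaker, texts in lines]


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    segments = [SpeakerSegment(1, 0.0, 1.0), SpeakerSegment(2, 1.0, 2.0)]
    for line in label_words(words, segments):
        print(line)
```

Matching by largest overlap is one simple design choice; a production pipeline may need to handle overlapping speech or words that fall between segments.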

We recommend Eagle Speaker Recognition to identify and verify speakers and personalize experiences simply by recognizing the user’s voice.

We recommend Orca Streaming Text-to-Speech to convert written text into spoken audio output.

We recommend Orca Streaming Text-to-Speech to convert streaming LLM text output into voice.

We recommend Porcupine Wake Word to detect wake words (e.g., “Alexa”), recognize always-listening commands (e.g., “turn the lights on”), and monitor conversations for specific keywords (e.g., a product name).

We recommend Rhino Speech-to-Intent to add custom voice commands to software (e.g., “set the brightness to 60%”), create voicebots and IVRs, and navigate menus (e.g., “2022 Hyundai IONIQ 5 AWD”).

We recommend Cobra Voice Activity Detection to detect when someone starts or stops speaking and trigger action accordingly.

We recommend Cobra Voice Activity Detection to detect and clean silence in audio and video data.
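
As a rough sketch of both uses above (reacting when speech starts or stops, and cutting silence), the function below turns a sequence of per-frame voice probabilities into speech segments. It assumes the voice activity detector yields one probability between 0 and 1 per audio frame; the function name, the `threshold` value, and the `min_silence_frames` parameter are illustrative assumptions, not part of any Picovoice SDK.

```python
def speech_segments(probabilities, threshold=0.5, min_silence_frames=3):
    """Return (start_frame, end_frame) pairs, end exclusive, one per run of speech.

    A run starts at the first frame at or above `threshold` and ends once
    `min_silence_frames` consecutive frames fall below it, so short pauses
    inside a sentence do not split the segment.
    """
    segments = []
    start = None
    silence_run = 0
    for i, p in enumerate(probabilities):
        if p >= threshold:
            if start is None:
                start = i  # speech started: trigger an action here
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # speech stopped: the segment ends where the silence began
                segments.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # audio ended while still inside a segment; drop trailing silence
        segments.append((start, len(probabilities) - silence_run))
    return segments


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7]
    print(speech_segments(probs))
```

Frames outside the returned segments are the silence that can be removed from a recording. The threshold and pause length usually need tuning against real audio.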

We recommend Octopus Speech-to-Index to make audio and video libraries searchable by keyword, including proper nouns and slang, even without knowing the exact spelling.

We recommend Picovoice Voice Recorders to record and process audio, creating audio streams for use with Picovoice voice AI engines.

You can train voice AI models on the Picovoice Console. Picovoice Console is a no-code platform with a web-based type-and-train interface. You can create a Picovoice Console account immediately and start building without engaging with the Picovoice team.

Before signing up for the Console, you can watch our tutorials to learn how to train custom voice AI models.

Usage

Desktop & Server: Linux, Windows & macOS
Mobile: Android & iOS
Web Browsers: Chrome, Safari, Edge and Firefox
Single Board Computers: Raspberry Pi
Cloud Providers: AWS, Azure, Google, IBM, Oracle, and others.

Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) in the cloud.

No, Picovoice voice AI engines do not require a GPU.

The Picovoice on-device voice AI platform supports all modern SDKs: Android, Angular, C, .NET, Flutter, Go, iOS, Java, Node.js, Python, React, React Native, Rust, Unity, Vue, and Web. If you need another SDK, you can check our open-source SDKs and build it yourself or contact Picovoice Consulting. Picovoice Consulting experts can create a public or private library for the SDK of your choice and maintain it.

Picovoice uses an AccessKey, and therefore requires internet connectivity, to offer its services according to your plan limits. Picovoice engines call home servers to validate the AccessKey and check your plan limits.

Picovoice tracks usage by the amount of data processed (in hours or characters) or by the number of users, depending on the engine and project setup.

Your usage automatically resets every 30 days.

Picovoice tracks the usage accumulated in the last 30 days. You can see the real-time consumption on your Picovoice Console Profile.

Downloaded models are counted toward your monthly model allowance. Once you hit download, your training usage will increase.

Picovoice tracks the training usage accumulated in the last 30 days. You can see the total number of models you trained in the last 30 days on your Picovoice Console Profile.

Yes, you can use Picovoice Voice AI engines for research, non-commercial, and commercial purposes as long as you are within your plan limits and compliant with the Picovoice Terms of Use.

Technical Questions

Picovoice voice AI SDKs, voice recorders, and benchmarks are open-source and free to use.

To enable data-driven decision-making and communicate its engines’ accuracy, Picovoice publishes open-source benchmarks for each engine. You can reproduce them or run them with your data.

Picovoice researchers continuously improve techniques and frameworks used to train algorithms. Picovoice applies transfer learning, hardware-aware training, and neural compression principles, resulting in efficient models competing with cloud-dependent AI models.

It depends on your tech stack and design. Given the number of engines Picovoice offers and the platforms it supports, it’s hard to communicate one number. We encourage developers to do their own tests and evaluations in their real environments.

Picovoice currently supports seventeen languages: English, Arabic, Dutch, Farsi, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, Swedish, and Vietnamese. Please check the product page if you’re looking for engine-specific information. If you have an opportunity requiring another language, engage with Picovoice Consulting to get a custom model trained for you!

Yes, Picovoice technology works well across accents and dialects. The best way to learn about it is to test Picovoice engines with your dataset. Picovoice offers a Free Plan that allows enterprises to evaluate and become familiar with the technology, as well as a Developer Plan to run thorough tests before committing to an Enterprise Plan.

Picovoice aims to provide realistic benchmarks by leveraging various accents and noise. Yet, we encourage developers to test the engines in their real-world environments.

Picovoice engines expect audio with a 16kHz sampling rate. PSTN networks usually sample at 8kHz. It is possible to upsample but the frequency content above 4kHz is gone, and performance will be suboptimal. It is possible to train acoustic models for telephony applications for enterprise customers. Engage with Picovoice Consulting to find the best solution that works for you.

Picovoice software expects a 16kHz sampling rate, so you will need to downsample 48kHz or 44.1kHz audio. Typically, operating systems or sound cards (audio codecs) provide such functionality; otherwise, you will need to implement it yourself.
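
If you do need to implement downsampling yourself, the idea can be sketched as follows: low-pass filter the audio to remove content above the new Nyquist frequency, then keep every Nth sample. This is a minimal pure-Python illustration, assuming an integer ratio such as 48kHz to 16kHz (factor 3); the function names, the 63-tap Hamming-windowed filter, and the cutoff choice are assumptions made for this example, not Picovoice code. A non-integer ratio such as 44.1kHz to 16kHz needs a rational resampler, which this sketch does not cover.

```python
import math


def lowpass_kernel(cutoff_ratio, num_taps=63):
    """Windowed-sinc FIR low-pass filter.

    cutoff_ratio is the cutoff frequency divided by the input sampling rate
    (between 0 and 0.5). num_taps should be odd so the filter is centered.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff_ratio
        else:
            sinc = math.sin(2 * math.pi * cutoff_ratio * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hamming)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz


def downsample(samples, factor=3, num_taps=63):
    """Low-pass filter, then keep every `factor`-th sample (e.g. 48 kHz -> 16 kHz)."""
    # Cutoff slightly below the new Nyquist frequency (0.5 / factor of the input rate).
    kernel = lowpass_kernel(0.45 / factor, num_taps)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(samples):  # treat audio outside the buffer as silence
                acc += tap * samples[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    rate = 48000
    tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(4800)]
    resampled = downsample(tone, factor=3)
    print(len(tone), "samples at 48 kHz ->", len(resampled), "samples at 16 kHz")
```

Skipping the low-pass step and simply dropping samples would fold frequencies above 8kHz back into the speech band (aliasing), which is why the filter comes first.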

Picovoice software expects a 16kHz sampling rate because it balances audio quality against file size, which is why it is widely used in voice command and speech recognition technologies. First, at 16kHz, audio files are small enough to store and transmit while offering reasonable audio quality. Second, the human voice’s most critical frequencies lie between 300Hz and 3400Hz. The Nyquist-Shannon sampling theorem states that a sampling rate of at least twice the highest frequency is required for accurate signal representation. 16kHz is more than twice 3400Hz (2 × 3400Hz = 6800Hz) and is therefore sufficient for processing the human voice. That’s why 16kHz has become a standard in applications using human speech and voice.
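
The Nyquist relationship described above comes down to simple arithmetic, sketched here with two hypothetical helper functions (the names are made up for this example). It prints the highest frequency each common sampling rate can represent.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2


def captures(sample_rate_hz, frequency_hz):
    """True if frequency_hz lies below the Nyquist frequency of the sampling rate."""
    return frequency_hz < nyquist_hz(sample_rate_hz)


if __name__ == "__main__":
    for rate in (8000, 16000, 44100, 48000):
        print(f"{rate} Hz sampling -> content up to {nyquist_hz(rate):.0f} Hz")
    print("16 kHz captures 3400 Hz:", captures(16000, 3400))
```

The same arithmetic explains the telephony caveat: audio sampled at 8kHz carries nothing above 4kHz, so upsampling it to 16kHz cannot restore the missing band.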

Several factors affect the performance of voice AI engines: the quality of audio data, the environment (noise, echo, reverberation), the tech stack, and the design.

Custom Models & Support

You can leverage the self-service Picovoice Console to fine-tune voice AI models or engage with Picovoice Consulting for further improvement.

See how to fine-tune models on the Picovoice Console:

Custom speech recognition models are created for specific tasks, specific use cases, and sometimes for specific environments. General-purpose models are jacks-of-all-trades and masters-of-none. For example, if you need a medical dictation app, you need a fine-tuned speech-to-text to be able to capture the jargon correctly. If you’re building a sales enablement app, just like you train your salesforce to learn about your product names, you should adapt the general speech recognition model accordingly.

You can engage with Picovoice Consulting to discuss the opportunity.

Picovoice voice AI engines support the most popular and widely-used hardware and software out-of-the-box - from web, mobile, desktop, and on-prem to private cloud. However, there may be certain chipsets we do not currently support. (There are so many of them, yet only so much time and money, making it impossible to support everything.) You can engage with Picovoice Consulting and get any Picovoice voice AI engine ported to the platform of your choice.

Picovoice voice AI engines support the most popular and widely used SDKs. If you need another SDK, you can check our open-source SDKs and build it yourself or contact Picovoice Consulting. Picovoice Consulting experts can create a public or private library for the SDK of your choice and maintain it.

Picovoice engines have lexicons of hundreds of thousands of words. However, there might be some special words we missed. You can add a custom word to Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text on the self-service Picovoice Console. To add a new word to the Porcupine Wake Word or Rhino Speech-to-Intent lexicon, reach out to your Picovoice contact if you’re a Picovoice customer. If you’re not a customer, why don’t you become one?

You can create a GitHub issue under the relevant repository/demo.

Enterprises face several challenges while building PoCs. Finding talent experienced in machine learning is one of the biggest challenges to start with. We learned this the hard way and experience it every day. On top of that, executives and clients may have unrealistic deadlines.

Experts at Picovoice Consulting help enterprises build PoCs and develop their AI strategy, working hand-in-hand with them and offering the guidance they need.

Data Security & Privacy

Picovoice voice AI engines process data in your environment, whether it’s public or private cloud, on-prem, web, mobile, desktop, or embedded.

Picovoice is private by design and has no access to user data. Thus, Picovoice doesn’t retain user data, as it never tracks or stores it in the first place.

Yes. Enterprises using Picovoice don’t need to share their user data with Picovoice or any other third party to run voice AI models, making the Picovoice voice AI platform intrinsically HIPAA-compliant.

Yes. Enterprises using Picovoice don’t need to share their user data with Picovoice or any other third party to run voice AI models, making the Picovoice voice AI platform intrinsically GDPR-compliant.

Yes. Enterprises using Picovoice don’t need to share their user data with Picovoice or any other third party to run voice AI models, making the Picovoice voice AI platform intrinsically CCPA-compliant.

Building with Picovoice

Yes, you can use voice AI with local LLMs and create private, accurate, and reliable AI agents.

The answer is “it depends”. Voice AI is complex technology, and building products for production requires diligent work. It depends on your use case, other tools, and the tech stack used, along with your hardware and software choices. Given the variables, it can be challenging. You can experiment with different scenarios using Picovoice’s free resources or engage with experts from Picovoice Consulting to find the best approach to deploying language models for production.

Yes! Picovoice engines are modular and work with other Picovoice products or competing products. Check the Picovoice Blog or GitHub to find more information, tutorials, and demos.

FAQ

Feature

1. What does Picovoice on-device Voice AI Platform offer?

2. What should I use to transcribe real-time conversations such as live events, conferences, and meetings, or enable note-taking and voice typing?

3. What should I use to convert audio and video files such as recordings of interviews, meetings, calls, podcasts, and voicemails into text?

4. What should I use to achieve crisp and clear conversations by removing background noise and enhancing speech?

5. What should I use to diarize speakers in conversations to make transcripts readable and analyzable?

6. What should I use to identify and verify speakers, and personalize experiences simply by recognizing the user’s voice?

7. What should I use to convert written text into spoken audio output?

8. What should I use to add voice to an LLM-powered application to build an AI agent?

9. What should I use to detect wake words, recognize always-listening commands, and monitor conversations for specific keywords?

10. What should I use to add custom voice commands to software, create voicebots and IVRs, and navigate menus?

11. What should I use to activate software when someone starts or stops speaking?

12. What should I do to detect and clean silence in audio and video data?

13. What should I use to make audio and video libraries discoverable to search for keywords, including proper nouns, and slang even without knowing the exact spelling?

14. What should I use to record and process audio files to use Picovoice voice AI engines?

15. How can I train custom voice AI models?

Usage

1. What are the hardware and software platforms supported by Picovoice voice AI engines?

2. Do Picovoice voice AI engines run in the cloud?

3. Do Picovoice voice AI engines run on-prem?

4. Do Picovoice voice AI engines run in serverless environments?

5. Do Picovoice voice AI engines run on mobile devices?

6. Do Picovoice voice AI engines run within web browsers?

7. Do Picovoice voice AI engines run on embedded devices?

8. Do Picovoice voice AI engines need a GPU?

9. Which SDKs are supported by Picovoice Voice AI?

10. Why do Picovoice Voice AI engines need an AccessKey (i.e., internet connectivity) if engines process data offline?

11. How does Picovoice track Voice AI engine usage?

12. When does my Voice AI engine usage reset?

13. How does Picovoice track Voice AI model training?

14. When does my Voice AI model training reset?

15. Can I use Picovoice Voice AI engines for research, non-commercial or commercial purposes?

Technical Questions

1. Is Picovoice open-source?

2. How accurate is Picovoice?

3. How are Picovoice’s small voice AI models more accurate than large, cloud-dependent AI models?

4. How fast is Picovoice?

5. Which languages does Picovoice support?

6. Does Picovoice technology work across various accents and dialects?

7. Can I use Picovoice software for telephony applications?

8. My audio source is 48kHz/44.1kHz. Does Picovoice software support that?

9. What’s the 16kHz sampling rate?

10. What are the other factors that affect the performance of voice AI engines?
