The end-to-end platform for embedding private voice AI into any software in a few lines of code
Picovoice’s end-to-end, on-device voice AI platform consists of voice AI engines and models that empower enterprises to design, develop, and ship voice products without sacrificing user privacy or experience.
The platform features a complete set of modular voice AI engines, delivered as cross-platform SDKs, and a no-code platform for instantly training bespoke voice AI models that boost accuracy and efficiency.
Design with no limits on top of a modular platform. Create use-case-specific voice AI models in seconds.
Develop voice features with a few lines of code using intuitive and cross-platform SDKs.
Deliver voice AI everywhere: on-device, mobile, web browsers, on-premise, or cloud.
Measure adoption, learn, and iterate. Continuously re-design and re-train to optimize engagement.
Building accurate, responsive, and private voice technology is difficult.
We learned the hard way, so you don’t have to.
Picovoice heavily invests in R&D to offer superior voice AI that surpasses even Big Tech in accuracy and efficiency. Picovoice researchers do not follow recent frameworks and techniques but build them.
Picovoice empowers developers to prototype, build, and evangelize with no strings attached. Builders can start for free, with no time-limited trial, credit card, or endless sales meetings.
Picovoice returns control to enterprises as voice data never leaves the premises. Enterprises enjoy high accuracy without compromising privacy and reliability.
Everything to design, develop, and ship voice products: a complete set of modular voice AI engines delivered as cross-platform SDKs and a no-code platform to instantly train bespoke voice AI models to boost accuracy and efficiency.
Secure and flexible deployment with embedded, mobile, web browsers, on-premise, and cloud options. Expert help to choose the best deployment and platform for unique needs.
Performant out-of-the-box voice AI models, proven by open-source benchmarks. Purpose-built models for specialized applications, use cases, domains, and industries.
Easy-to-follow docs covering 99% of questions and an active GitHub community addressing technical issues. Dedicated support through the Support Add-on and Enterprise Plans.
The on-device voice AI platform offers everything that developers need to design, develop, and ship voice products: a complete set of modular voice AI engines delivered as cross-platform SDKs and a no-code platform to instantly train bespoke voice AI models to boost accuracy and efficiency.
We recommend Cheetah Streaming Speech-to-Text to transcribe real-time conversations, such as live events, conferences, and meetings, or to enable note-taking and voice typing.
Please note that every use case is unique and the nuances may affect the performance of your product. If you’re a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you’re a Free Plan user, you can purchase an Enterprise Support Add-on to discuss your specific use case.
We recommend Leopard Speech-to-Text to convert audio and video files such as recordings of interviews, meetings, or calls, podcasts, and voicemails into text.
We recommend Koala Noise Suppression to achieve crisp and clear conversations by removing background noise and enhancing speech.
We recommend Falcon Speaker Diarization to label who spoke when, making transcripts readable and analyzable.
We recommend Eagle Speaker Recognition to identify and verify speakers and personalize experiences simply by recognizing the user’s voice.
We recommend Orca Streaming Text-to-Speech to convert written text into spoken audio output.
We recommend Orca Streaming Text-to-Speech to convert streaming LLM text output into voice.
We recommend Porcupine Wake Word to detect wake words (Alexa), recognize always-listening commands (turn the lights on), and monitor conversations for specific keywords (product name).
We recommend Rhino Speech-to-Intent to add custom voice commands to software (set the brightness at 60%), create voicebots and IVRs, and navigate menus (2022 Hyundai IONIQ 5 AWD).
We recommend Cobra Voice Activity Detection to detect when someone starts or stops speaking and trigger actions accordingly.
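As a rough illustration of the start/stop idea, the sketch below turns a sequence of per-frame voice probabilities into "start" and "stop" events. It assumes the voice activity detector reports a probability between 0 and 1 for each audio frame; the function name `speech_events`, the 0.5 threshold, and the sample values are hypothetical and are not part of any Picovoice SDK.

```python
from typing import Iterable, List, Tuple


def speech_events(
    probabilities: Iterable[float],
    threshold: float = 0.5,  # assumed cut-off; tune it for your audio
) -> List[Tuple[str, int]]:
    """Turn per-frame voice probabilities into ("start" | "stop", frame_index) events."""
    events: List[Tuple[str, int]] = []
    speaking = False
    for index, probability in enumerate(probabilities):
        is_voice = probability >= threshold
        if is_voice and not speaking:
            events.append(("start", index))  # silence -> speech transition
        elif not is_voice and speaking:
            events.append(("stop", index))   # speech -> silence transition
        speaking = is_voice
    return events


# Hypothetical probabilities for eight consecutive audio frames.
frames = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.95]
print(speech_events(frames))
```

In a real product, the events would likely be debounced (for example, by requiring several consecutive frames above or below the threshold) so that brief pauses do not trigger actions.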
We recommend Cobra Voice Activity Detection to detect and remove silence in audio and video data.
We recommend Octopus Speech-to-Index to make audio and video libraries searchable by keyword, including proper nouns and slang, even without knowing the exact spelling.
We recommend Picovoice Voice Recorders to record and process audio, creating the audio streams that Picovoice voice AI engines consume.
You can train voice AI models on the Picovoice Console. Picovoice Console is a no-code platform with a web-based type-and-train interface. You can create a Picovoice Console account immediately and start building without engaging with the Picovoice team.
Before signing up for the Console, you can watch our tutorials to learn how to train custom voice AI models:
Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) in the cloud.
Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) on-prem.
Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) in serverless environments.
Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) on mobile devices.
Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) within web browsers.
Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) on embedded devices.
No, Picovoice voice AI engines do not require a GPU.
The Picovoice on-device voice AI platform supports all modern SDKs: Android, Angular, C, .NET, Flutter, Go, iOS, Java, Node.js, Python, React, React Native, Rust, Unity, Vue, and Web. If you need another SDK, you can check our open-source SDKs and build it yourself or contact Picovoice Consulting. Picovoice Consulting experts can create a public or private library for the SDK of your choice and maintain it.
Picovoice uses an AccessKey, and therefore requires internet connectivity, to offer its services according to your plan limits. Picovoice engines call home servers to validate the AccessKey and check your plan limits.
Picovoice tracks usage as the amount of data processed (in hours or characters) or as the number of users, depending on the engine and project setup.
Your usage automatically resets every 30 days.
Picovoice tracks the usage accumulated in the last 30 days. You can see the real-time consumption on your Picovoice Console Profile.
Downloaded models are counted toward your monthly model allowance. Once you hit download, your training usage will increase.
Picovoice tracks the training usage accumulated in the last 30 days. You can see the total number of models you trained in the last 30 days on your Picovoice Console Profile.
Yes, you can use Picovoice Voice AI engines for research, non-commercial, and commercial purposes as long as you are within your plan limits and compliant with the Picovoice Terms of Use.
Picovoice voice AI SDKs, voice recorders, and benchmarks are open-source and free to use.
To enable data-driven decision-making and communicate its engines’ accuracy, Picovoice publishes open-source benchmarks for each engine. You can reproduce them or run them with your data.
Picovoice researchers continuously improve techniques and frameworks used to train algorithms. Picovoice applies transfer learning, hardware-aware training, and neural compression principles, resulting in efficient models competing with cloud-dependent AI models.
It depends on your tech stack and design. Given the number of engines Picovoice offers and the platforms it supports, it’s hard to communicate one number. We encourage developers to do their own tests and evaluations in their real environments.
Picovoice currently supports seventeen languages: English, Arabic, Dutch, Farsi, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, Swedish, and Vietnamese. Please check the product page if you’re looking for engine-specific information. If you have an opportunity requiring another language, engage with Picovoice Consulting to get a custom model trained for you!
Yes, Picovoice technology works well across accents and dialects. The best way to learn about it is to test Picovoice engines with your dataset. Picovoice offers a Free Plan that allows enterprises to evaluate and become familiar with the technology, as well as a Developer Plan to run thorough tests before committing to an Enterprise Plan.
Picovoice aims to provide realistic benchmarks by leveraging various accents and noise. Yet, we encourage developers to test the engines in their real-world environments.
Picovoice engines expect audio with a 16kHz sampling rate. PSTN networks usually sample at 8kHz. It is possible to upsample but the frequency content above 4kHz is gone, and performance will be suboptimal. It is possible to train acoustic models for telephony applications for enterprise customers. Engage with Picovoice Consulting to find the best solution that works for you.
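To make the upsampling caveat concrete, here is a minimal sketch that doubles an 8kHz stream to 16kHz with linear interpolation. The function name `upsample_2x_linear` and the sample values are made up for illustration, and this is not how any Picovoice engine resamples audio. Every inserted sample is derived from its neighbours, so the output has the expected rate but no speech information above 4kHz is recovered.

```python
from typing import List, Sequence


def upsample_2x_linear(samples: Sequence[int]) -> List[int]:
    """Double the sampling rate by inserting the midpoint between neighbouring samples."""
    if not samples:
        return []
    output: List[int] = []
    for current, following in zip(samples, samples[1:]):
        output.append(current)
        output.append((current + following) // 2)  # interpolated, not newly measured
    # The last sample has no successor, so it is simply repeated.
    output.extend([samples[-1], samples[-1]])
    return output


telephony_pcm = [0, 100, 200, 100, 0, -100]  # made-up 16-bit samples at 8 kHz
wideband_pcm = upsample_2x_linear(telephony_pcm)
print(len(telephony_pcm), len(wideband_pcm))
```

The result can be fed to software that expects 16kHz input, but accuracy will still reflect the narrowband source.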
Picovoice software expects a 16kHz sampling rate. If your audio is recorded at a higher rate, you will need to downsample it. Typically, operating systems or sound cards (audio codecs) provide such functionality; otherwise, you will need to implement it.
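If you do end up implementing downsampling yourself, the sketch below shows the basic idea for an integer ratio such as 48kHz to 16kHz. It averages each group of samples, which acts as a crude low-pass filter before decimation. The function name `downsample_by_averaging` and the sample values are hypothetical; a production resampler would normally use a proper anti-aliasing filter and handle non-integer ratios such as 44.1kHz to 16kHz.

```python
from typing import List, Sequence


def downsample_by_averaging(samples: Sequence[int], factor: int) -> List[int]:
    """Reduce the sampling rate by an integer factor, averaging each group of samples."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    return [
        sum(samples[start:start + factor]) // factor
        for start in range(0, usable, factor)
    ]


source_rate, target_rate = 48_000, 16_000
factor = source_rate // target_rate             # 3 input samples per output sample
pcm_48k = [0, 30, 60, 90, 60, 30, 0, -30, -60]  # made-up samples at 48 kHz
pcm_16k = downsample_by_averaging(pcm_48k, factor)
print(pcm_16k)
```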
Picovoice software expects a 16kHz sampling rate for two reasons. First, 16kHz strikes a balance between quality and file size, which is why it is widely used in voice command and speech recognition technologies: audio files are small enough to store and transmit while offering reasonable audio quality. Second, the human voice's most critical frequencies lie between 300Hz and 3400Hz. The Nyquist-Shannon sampling theorem states that a sampling rate of at least twice the highest frequency is required for accurate signal representation. 16kHz is more than twice 3400Hz and therefore sufficient for processing the human voice. That’s why 16kHz has become a standard in applications using human speech and voice.
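The arithmetic behind both points can be checked in a few lines. The sketch below computes the highest representable frequency for a sampling rate and the uncompressed data rate. The 16-bit, single-channel PCM format and the function names are assumptions made for this example.

```python
def nyquist_frequency_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def pcm_bytes_per_second(
    sample_rate_hz: int,
    bits_per_sample: int = 16,  # assumed sample width
    channels: int = 1,          # assumed mono audio
) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample // 8 * channels


VOICE_BAND_UPPER_HZ = 3_400  # upper edge of the voice band quoted above

for rate in (16_000, 48_000):
    covers_voice_band = nyquist_frequency_hz(rate) >= VOICE_BAND_UPPER_HZ
    print(rate, nyquist_frequency_hz(rate), covers_voice_band, pcm_bytes_per_second(rate))
```

Both rates cover the voice band, but under these assumptions 48kHz audio takes three times the storage and bandwidth of 16kHz audio.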
Several factors affect the performance of voice AI engines: the quality of the audio data, the environment (noise, echo, reverberation), the tech stack, and the design.
You can leverage the self-service Picovoice Console to fine-tune voice AI models or engage with Picovoice Consulting for further improvement.
See how to fine-tune models on the Picovoice Console:
Custom speech recognition models are created for specific tasks, specific use cases, and sometimes for specific environments. General-purpose models are jacks-of-all-trades and masters-of-none. For example, if you need a medical dictation app, you need a fine-tuned speech-to-text model to capture the jargon correctly. If you’re building a sales enablement app, just as you train your salesforce to learn your product names, you should adapt the general speech recognition model accordingly.
You can engage with Picovoice Consulting to discuss the opportunity.
Picovoice voice AI engines support the most popular and widely-used hardware and software out-of-the-box - from web, mobile, desktop, and on-prem to private cloud. However, there may be certain chipsets we do not currently support. (There are so many of them, yet only so much time and money, making it impossible to support everything.) You can engage with Picovoice Consulting and get any Picovoice voice AI engine ported to the platform of your choice.
Picovoice voice AI engines support the most popular and widely used SDKs. If you need another SDK, you can check our open-source SDKs and build it yourself or contact Picovoice Consulting. Picovoice Consulting experts can create a public or private library for the SDK of your choice and maintain it.
Picovoice currently supports seventeen languages: English, Arabic, Dutch, Farsi, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, Swedish, and Vietnamese. Please check the product page if you’re looking for engine-specific information. If you have an opportunity requiring another language, engage with Picovoice Consulting to get a custom model trained for you!
Picovoice engines have lexicons of hundreds of thousands of words. However, there might be some special words we missed. You can add a custom word to Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text on the self-service Picovoice Console. To add a new word to the Porcupine Wake Word or Rhino Speech-to-Intent lexicon, reach out to your Picovoice contact if you’re a Picovoice customer. If you’re not a customer, why don’t you become one?
You can create a GitHub issue under the relevant repository/demo.
Enterprises face several challenges while building PoCs. Finding talent experienced in machine learning is one of the biggest challenges to start with. We learned this the hard way, and experience it every day. On top of that, executives and clients may have unrealistic deadlines.
Experts at Picovoice Consulting help enterprises build PoCs, develop their AI strategy, and work with them hand-in-hand offering the guidance they need.
Picovoice voice AI engines process data in your environment, whether it’s public or private cloud, on-prem, web, mobile, desktop, or embedded.
Picovoice is private by design and has no access to user data. Thus, Picovoice doesn’t retain user data, as it never tracks or stores it in the first place.
Yes. Enterprises using Picovoice don’t need to share their user data with Picovoice or any other 3rd party to run voice AI models, making the Picovoice voice AI platform intrinsically HIPAA-compliant.
Yes. Enterprises using Picovoice don’t need to share their user data with Picovoice or any other 3rd party to run voice AI models, making the Picovoice voice AI platform intrinsically GDPR-compliant.
Yes. Enterprises using Picovoice don’t need to share their user data with Picovoice or any other 3rd party to run voice AI models, making the Picovoice voice AI platform intrinsically CCPA-compliant.
Yes, you can use voice AI with local LLMs and create private, accurate, and reliable AI agents.
The answer is “it depends”. Voice AI is complex technology, and building products for production requires diligent work. It depends on your use case, other tools, and the tech stack used, along with hardware and software choices. Given the variables, it can be challenging. You can experiment with different scenarios using Picovoice’s free resources or engage with experts from Picovoice Consulting to find the best approach to deploying language models for production.