Picovoice AI Frequently Asked Questions
Find answers to frequently asked questions on the Picovoice on-device Voice AI and local LLM platforms, Console, and Pricing. For software-specific questions, please refer to the dedicated FAQs at the bottom of each product page:
Local LLM:
- picoLLM
On-device Voice AI:
- Leopard Speech-to-Text
- Cheetah Streaming Speech-to-Text
- Koala Noise Suppression
- Eagle Speaker Recognition
- Falcon Speaker Diarization
- Orca Text-to-Speech
- Porcupine Wake Word
- Rhino Speech-to-Intent
- Cobra Voice Activity Detection
FAQ
Picovoice sells its proprietary voice AI and LLM technology to enable enterprises to build AI-powered products in a few lines of code without sacrificing privacy or accuracy.
Picovoice's subscription model:
- Offers access to support, updates, and upgrades during the engagement.
- Helps enterprises manage their working capital effectively.
- Automates usage tracking, resulting in efficiency gains and cost savings.
You can find more information about Picovoice's introductory packages on the pricing page.
Picovoice offers highly accurate and lightweight on-device AI engines using deep neural networks trained in real-world environments.
Picovoice's proprietary algorithms are developed by Picovoice researchers using transfer learning and hardware-aware training principles. Transfer learning enables zero-shot learning and removes the need for extensive data collection and training per model, resulting in dramatically simplified product development, reduced time-to-market, and more accurate voice models compared to traditional methods that rely on data gathering. Hardware-aware training optimizes on-device engines and models for the target platform, resulting in resource- and power-efficient models that meet even stringent power consumption requirements.
Picovoice offers several types of support options:
- Enterprise Plan Customers: Can customize the level of support to fit the unique needs of their organization.
- Foundation Plan Customers: Get six (6) hours of email support with a 3-day SLA.
- Enterprise Prospects: Can get dedicated support by booking a meeting with the Product and Engineering team.
- Free Plan Account Owners: Can create GitHub issues to report bugs.
The Free Plan is for personal and non-commercial projects only. Any commercial project requires a paid plan. Examples of commercial use include client projects, MVPs, and internal testing and evaluation that involve founders, employees, contractors, or consultants writing code.
Picovoice offers a Free Trial for enterprise developers. No credit card is required. You can sign up on Picovoice Console.
Visit the dashboard or your profile page on Picovoice Console.
Usage tracking depends on the engine:
- Audio processed (per second): Cheetah, Leopard, Koala, Eagle, Falcon
- Text data (per character): Orca
- Tokens (per token): picoLLM Inference
- Monthly active users: Porcupine, Rhino, Cobra
A "user" refers to things that activate engines. It is not necessarily an account owner or end-user.
Usage resets every 30 days. You can view real-time consumption on your Picovoice Console Profile.
Once you download a model, it's counted toward your monthly model download usage.
Model download usage resets every 30 days. You can view your usage on your Picovoice Console Profile.
No, you cannot reset your AccessKey. Do not share it with third parties.
No, usage is reset automatically every 30 days.
No, creating multiple accounts violates the Terms of Use. You may be asked to pay fees or have access terminated. To get higher usage, upgrade to a paid plan.
No, the Free Trial is a one-time offer. No credit card is required. Make sure to upgrade before the trial ends.
No, the Free Trial is a one-time offer. If your organization wants extended use, you must upgrade.
Most answers are available on the Picovoice website. For additional help:
The Picovoice on-device voice AI platform offers everything that developers need to design, develop, and ship voice products: a complete set of modular voice AI engines delivered as cross-platform SDKs and a no-code platform to instantly train bespoke voice AI models to boost accuracy and efficiency.
We recommend Cheetah Streaming Speech-to-Text for real-time conversations, such as live events, conferences, and meetings, or to enable note-taking and voice typing.
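For illustration, here is a minimal Python sketch of live microphone transcription. It assumes the pvcheetah and pvrecorder packages and a valid AccessKey from Picovoice Console (the placeholder below must be replaced); treat it as a starting point, not production code.

```python
import pvcheetah
from pvrecorder import PvRecorder

ACCESS_KEY = 'YOUR_ACCESS_KEY'  # obtained from Picovoice Console

cheetah = pvcheetah.create(access_key=ACCESS_KEY)
recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()
try:
    while True:
        # process one 16kHz audio frame and print the partial transcript as it arrives
        partial_transcript, is_endpoint = cheetah.process(recorder.read())
        print(partial_transcript, end='', flush=True)
        if is_endpoint:
            # flush the remaining audio at the end of an utterance
            print(cheetah.flush())
except KeyboardInterrupt:
    pass
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()
```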
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Leopard Speech-to-Text to convert audio and video files, such as recordings of interviews, meetings, or calls, podcasts, and voicemails, into text.
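Below is a minimal Python sketch for transcribing a recording, assuming the pvleopard package, a valid AccessKey, and a placeholder file path:

```python
import pvleopard

ACCESS_KEY = 'YOUR_ACCESS_KEY'  # obtained from Picovoice Console

leopard = pvleopard.create(access_key=ACCESS_KEY)
try:
    # transcribe a recording (e.g., a meeting or podcast); the path is a placeholder
    transcript, words = leopard.process_file('recording.wav')
    print(transcript)
    for word in words:
        # each word comes with timing and confidence metadata
        print(f'{word.word}\t{word.start_sec:.2f}s-{word.end_sec:.2f}s\t{word.confidence:.2f}')
finally:
    leopard.delete()
```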
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Koala Noise Suppression to achieve crisp and clear conversations by removing background noise and enhancing speech.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Falcon Speaker Diarization to determine who spoke when, making transcripts readable and analyzable.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Eagle Speaker Recognition to identify and verify speakers and personalize experiences simply by recognizing the user's voice.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Orca Streaming Text-to-Speech to convert written text into spoken audio output.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend the picoLLM On-device LLM Platform to run LLMs locally and Orca Streaming Text-to-Speech to convert streaming LLM text output into voice.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Porcupine Wake Word to detect wake words (e.g., "Alexa"), always-listening voice commands (e.g., "turn the lights on"), and specific keywords in conversations (e.g., a product name).
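A rough Python sketch of always-listening keyword detection follows, assuming the pvporcupine and pvrecorder packages and a valid AccessKey; the built-in keyword is used only for illustration, and custom wake words are trained on Picovoice Console.

```python
import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = 'YOUR_ACCESS_KEY'  # obtained from Picovoice Console

# 'picovoice' is a built-in keyword; custom wake words are trained on Picovoice Console
porcupine = pvporcupine.create(access_key=ACCESS_KEY, keywords=['picovoice'])
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()
try:
    while True:
        # process() returns the index of the detected keyword, or -1 if none was detected
        if porcupine.process(recorder.read()) >= 0:
            print('Wake word detected')
except KeyboardInterrupt:
    pass
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```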
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Cobra Voice Activity Detection to detect when someone starts or stops speaking and trigger action accordingly.
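A minimal Python sketch of triggering on the start and end of speech, assuming the pvcobra and pvrecorder packages and a valid AccessKey; the probability threshold is an assumption to tune for your environment:

```python
import pvcobra
from pvrecorder import PvRecorder

ACCESS_KEY = 'YOUR_ACCESS_KEY'  # obtained from Picovoice Console
THRESHOLD = 0.7                 # voice-probability threshold; tune for your environment

cobra = pvcobra.create(access_key=ACCESS_KEY)
recorder = PvRecorder(frame_length=cobra.frame_length)
recorder.start()
is_speaking = False
try:
    while True:
        # process() returns the probability of voice activity in the frame
        probability = cobra.process(recorder.read())
        if probability >= THRESHOLD and not is_speaking:
            is_speaking = True
            print('speech started')  # trigger your action here
        elif probability < THRESHOLD and is_speaking:
            is_speaking = False
            print('speech stopped')  # trigger your action here
except KeyboardInterrupt:
    pass
finally:
    recorder.stop()
    recorder.delete()
    cobra.delete()
```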
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Cobra Voice Activity Detection to detect and remove silence in audio and video data.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
We recommend Picovoice Voice Recorders to record and process audio and create the audio streams consumed by Picovoice voice AI engines.
Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact to get dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and get your technical questions answered by the experts.
You can download the quantized open-weight, publicly available Llama, Mistral, Mixtral, Phi, and Gemma models compressed by picoLLM Compression from Picovoice Console. For use-case-specific, custom LLM quantization requests, please reach out to your Picovoice contact to work with the large language model experts who developed picoLLM, Picovoice's novel LLM quantization algorithm.
picoLLM comes with an inference engine that runs X-bit quantized LLMs; a minimal usage sketch follows the list below. The picoLLM Inference engine:
- runs on-device LLMs across Linux, macOS, Windows, Android, iOS, Raspberry Pi, Chrome, Safari, Edge, and Firefox.
- supports CPU and GPU out-of-the-box and has the architecture to tap into other forms of accelerated computing.
- works with any LLM architecture.
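For illustration, a minimal Python sketch of on-device text generation, assuming the picollm package, a valid AccessKey, and a model file downloaded from Picovoice Console (the path is a placeholder):

```python
import picollm

ACCESS_KEY = 'YOUR_ACCESS_KEY'     # obtained from Picovoice Console
MODEL_PATH = 'path/to/model.pllm'  # placeholder; a picoLLM model file downloaded from Console

pllm = picollm.create(access_key=ACCESS_KEY, model_path=MODEL_PATH)
try:
    # generation runs fully on-device; neither the prompt nor the completion leaves the machine
    res = pllm.generate('Summarize the benefits of on-device LLM inference.')
    print(res.completion)
finally:
    pllm.release()
```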
Yes, picoLLM offers quantized Llama models to run locally on-device for free. Quantized Llama language models can be downloaded from Picovoice Console and deployed locally across platforms within your plan limits.
Yes, picoLLM offers quantized Mistral models to run locally on-device for free. Quantized Mistral language models can be downloaded from Picovoice Console and deployed locally across platforms within your plan limits.
Yes, picoLLM offers quantized Microsoft Phi models to run locally on-device for free. Quantized Microsoft Phi language models can be downloaded from Picovoice Console and deployed locally across platforms within your plan limits.
Yes, picoLLM offers quantized Gemma models to run locally on-device for free. Quantized Gemma models can be downloaded from Picovoice Console and deployed locally across platforms within your plan limits.
- Desktop & Server: Linux, Windows & macOS
- Mobile: Android & iOS
- Web Browsers: Chrome, Safari, Edge and Firefox
- Single Board Computers: Raspberry Pi
- Cloud Providers: AWS, Azure, Google, IBM, Oracle, and others.
Yes. You can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, LLM Inference) in the cloud.
Yes. You can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, LLM Inference) on-prem.
Yes. You can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, LLM Inference) in serverless environments.
Yes. You can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, LLM Inference) on mobile devices.
Yes. You can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, LLM Inference) within web browsers.
Yes. You can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, LLM Inference) on embedded devices.
No. Picovoice voice AI engines do not require a GPU. However, you can run picoLLM Inference on a GPU for better performance.
The Picovoice on-device Voice AI platform supports a wide range of modern SDKs, including Android, C, .NET, Flutter, iOS, Java, Node.js, Python, React, React Native, and Web. For details on available SDKs for each engine, please refer to the respective platform or documentation page.
If your preferred SDK isn't currently supported, Picovoice Consulting can develop and maintain it for you as part of the Enterprise Plan offering.
Picovoice voice AI SDKs, voice recorders, and benchmarks are open-source and free to use.
To enable data-driven decision-making and communicate its engines' accuracy, Picovoice publishes open-source benchmarks for each engine. You can reproduce them or run them with your data.
We compared the accuracy of the picoLLM Compression algorithm against popular quantization techniques. Ceteris paribus - at a given size and model - picoLLM offers better accuracy than popular quantization techniques such as AWQ, GPTQ, LLM.int8(), and SqueezeLLM. You can check the open-source compression benchmark to compare the performance of picoLLM Compression against GPTQ.
Please note that there is no single widely used framework to evaluate LLM accuracy, as LLMs are relatively new and capable of performing various tasks. One metric can be more important for a certain task, and irrelevant to others. Taking "accuracy" metrics at face value and comparing two figures calculated in different settings may lead to wrong conclusions.
Also, picoLLM Compression's value add is retaining the original quality while making LLMs available across platforms, i.e., offering the most efficient models without sacrificing accuracy, not offering the most accurate model.
We highly encourage enterprises to compare the accuracy against the original models, e.g., llama-2 70B vs. pico.llama-2 70B at different sizes.
The secret sauce behind Picovoice's lightweight yet accurate models is end-to-end optimization. Most edge voice AI models use post-training optimization of pre-trained models. Since these models were not designed for edge deployment in the first place, potential optimizations are restricted.
Furthermore, they depend on open-source runtimes like PyTorch or TensorFlow, which again restrict performance improvements. As a result, achieving cloud-level accuracy on the edge remains a challenge.
By owning the entire data pipeline and training process, Picovoice enables full end-to-end optimization. Furthermore, Picovoice researchers continuously improve techniques and frameworks used to train algorithms. Picovoice applies transfer learning, hardware-aware training, and neural compression principles, resulting in efficient models competing with cloud-dependent AI models.
It depends on your tech stack and design. Given the number of engines Picovoice offers and the platforms it supports, it's hard to communicate one number. We encourage developers to do their own tests and evaluations in their real environments.
The smaller the model and the more powerful the system, the faster a language model runs.
Speed tests (tokens/second) are generally done in controlled environments and, unsurprisingly, in favor of the model/vendor. Several factors affect speed: hardware (GPU, CPU, RAM, motherboard), software (background processes and programs), the language model itself and its original size, and so on.
At Picovoice, our communication has always been fact-based and scientific. Since speed tests are easy to manipulate and it's impossible to create a reproducible framework, we do not publish speed metrics. We strongly suggest everyone run their own tests in their own environment.
Picovoice on-device voice AI models currently support: English, French, German, Italian, Japanese, Korean, Chinese, Portuguese, and Spanish. Please check the product page if you're looking for engine-specific information. If you have an opportunity requiring another language, engage with Picovoice Consulting to get a custom model trained for you!
Yes, Picovoice technology works well across accents and dialects. The best way to learn about it is to test Picovoice technology with your dataset. Picovoice offers a Free Trial that allows enterprises to evaluate and become familiar with the technology before committing to a paid plan.
Picovoice engines expect audio with a 16kHz sampling rate. PSTN networks usually sample at 8kHz. It is possible to upsample, but the frequency content above 4kHz is already lost, so performance will be suboptimal.
It is possible to train acoustic models for telephony applications for enterprise customers. Engage with Picovoice Consulting to find the best solution that works for you.
Picovoice software expects a 16kHz sampling rate. You will need to downsample. Typically, operating systems or sound cards (Audio codecs) provide such functionality; otherwise, you will need to implement it.
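A minimal Python sketch of downsampling a 44.1kHz or 48kHz recording to 16kHz before passing it to a Picovoice engine; it assumes the numpy, scipy, and soundfile packages and a placeholder input file.

```python
from math import gcd

import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

TARGET_RATE = 16000

audio, source_rate = sf.read('input_48k.wav', dtype='float32')  # placeholder input file
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # mix multi-channel audio down to mono

# polyphase resampling applies an anti-aliasing filter before decimation
factor = gcd(TARGET_RATE, source_rate)
pcm_16k = resample_poly(audio, up=TARGET_RATE // factor, down=source_rate // factor)

# Picovoice engines expect 16-bit linear PCM samples
pcm_16k_int16 = np.clip(pcm_16k * 32767, -32768, 32767).astype(np.int16)
```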
Picovoice software expects a 16kHz sampling rate because it strikes a balance between quality and file size and is widely used in voice command and speech recognition technologies.
First, at 16kHz, audio files are small enough to store and transmit while offering reasonable audio quality. Second, the most critical frequencies of the human voice lie between 300Hz and 3400Hz. The Nyquist-Shannon sampling theorem states that a sampling rate of at least twice the highest frequency is required for accurate signal representation; 16kHz is more than twice 3400Hz and is therefore sufficient for processing the human voice. That's why 16kHz has become a standard in applications using human speech and voice.
Several factors affect the performance of voice AI engines: the quality of the audio data, the environment (noise, echo, reverberation), the tech stack, and the design.
There are several advantages of running quantized models:
- Reduced Model Size: Quantization decreases the model size of large language models, resulting in:
- Smaller download size: Quantized LLMs require less time and bandwidth to download. For example, a mobile app bundling an overly large model may not be approved for the App Store.
- Smaller storage size: Quantized LLMs occupy less storage space. For example, an Android app using a small language model will take up less storage space, improving the usability of your application and the experience of users.
- Less memory usage: Quantized LLMs use less RAM, which speeds up LLM inference and your application and frees up memory for other parts of your application to use, resulting in better performance and stability.
- Reduced Latency: Total latency consists of compute latency and network latency.
- Reduced Compute Latency: Compute latency is the time between a machine receiving a request and returning a response. LLMs require powerful infrastructure to run with minimal compute latency; otherwise, responses may take minutes, hours, or even days. Reduced computational requirements allow quantized LLMs to respond faster given the same resources (reduced latency) or to achieve the same latency using fewer resources.
- Zero Network Latency: Network latency (delay or lag) is the time that data takes to travel across the network. Since quantized LLMs can run where the data is generated rather than requiring data to be sent to a 3rd-party cloud, there is no data transfer and hence zero network latency.
Quantization can be used to reduce the size of models and latency, potentially at the expense of some accuracy. Choosing the right quantized model is important to ensure small to no accuracy loss. Our Deep Learning Researchers explain why picoLLM Compression is different from other quantization techniques.
Quantization techniques, such as AWQ, GPTQ, LLM.int8(), and SqueezeLLM are developed by researchers for research. picoLLM is developed by researchers for production to enable enterprise-grade applications.
At any given size, picoLLM retains more of the original quality. In other words, picoLLM compresses models more efficiently than the others, offering efficient models without sacrificing accuracy compared to these techniques.
Read more from our deep learning research team about our approach to LLM quantization.
picoLLM Inference is specifically developed for the picoLLM platform.
Existing inference engines can handle models with a known bit distribution (4-bit or 8-bit) across model weights. picoLLM-compressed weights contain 1-, 2-, 3-, 4-, 5-, 6-, 7-, and 8-bit quantized parameters to retain intelligence while minimizing model size. Hence, existing inference engines built for pre-defined bit distributions cannot match the dynamic nature of picoLLM.
Read more from our engineering team, who explain why and how we developed the picoLLM Inference engine.
There are three major issues with the existing LLM inference engines.
- They are not versatile. They only support certain platforms or model types.
- They are not ready-to-use, requiring machine learning knowledge.
- They cannot handle X-bit quantization, as this innovative approach is unique to picoLLM Compression.
Hugging Face Transformers works with transformer models only. TensorFlow Serving works with TensorFlow models only and has a steep learning curve to get started. TorchServe is designed for PyTorch and integrates well with AWS. NVIDIA Triton Inference Server is designed for NVIDIA GPUs only. OpenVINO is optimized for Intel hardware.
In reality, your software can and will be run on different platforms. That's why we had to develop picoLLM Inference. It's the only ready-to-use and hardware-agnostic engine.
You can leverage the self-service Picovoice Console to fine-tune voice AI models or engage with Picovoice Consulting for further improvement.
See how to fine-tune models on the Picovoice Console:
Custom speech recognition models are created for specific tasks, specific use cases, and sometimes for specific environments. General-purpose models are jacks-of-all-trades and masters-of-none.
For example, if you need a medical dictation app, you need a fine-tuned speech-to-text model that captures medical jargon correctly. If you're building a sales enablement app, just as you train your salesforce to learn your product names, you should adapt the general speech recognition model accordingly.
At the moment, custom language model training is available through picoLLM GYM for selected enterprise customers. Please engage with your account manager if you're already a Picovoice customer. If you're not a customer, become one!
Custom LLMs are created for specific tasks and specific use cases. General-purpose large language models are jacks-of-all-trades and masters-of-none. In other words, they can help a student with their homework, but not a knowledge worker with company-specific information.
General-purpose LLMs are offered by foundation model providers, such as OpenAI, Google, Meta, Microsoft, Cohere, Anthropic, Mistral, Databricks, and so on. They're good at developing products such as chatbots, translation services, and content creation apps. Developers building hobby projects, one-size-fits-all applications, or with no access to training datasets, can choose general-purpose LLMs.
Custom LLMs can offer distinctive feature sets and increased domain expertise, resulting in unmatched precision and relevance. Hence, custom LLMs have become popular in enterprise applications in several industries, including healthcare, law, and finance. They're used in various applications, such as medical diagnosis, legal document analysis, and financial risk assessment. Unlike general-purpose LLMs, custom LLMs are not ready to use; they require special training that leverages domain-specific data to perform better in certain use cases.
If you think they're a better fit, you should. Especially in the beginning, using an API can be a better approach to understand what LLMs can achieve, as long as control over data, model, infrastructure, and inference cost is not a concern. The drawbacks of closed-source models become apparent when enterprises want control over their specific use case. If customizability, privacy, ownership, reliability, or inference cost at scale is a concern, then you should be more cautious about choosing a closed-source model.
- Customizability: Each vendor has different criteria and processes for developing custom models. To send an inquiry to OpenAI, one has to acknowledge that it may take months to train a custom model and that pricing starts at $2-3 million.
- Privacy: The default business model for closed-source models is to run inference in the cloud. Hence, it requires enterprises to send their user data and confidential information to the cloud.
- Ownership: You never have ownership of a closed-source model. If your LLM is critical for the success of your product, or in other words, if you view your LLM as an asset rather than a simple tool, it should be owned and controlled by you.
- Reliability: You are at the mercy of closed-source model providers. When their API goes down or has an increase in traffic, the performance of your software, hence user experience and productivity, is negatively affected.
- Cost at scale: Cloud computing at scale is costly. That's why cloud repatriation has become popular among large enterprises. Large language model APIs are no different, if not more costly, given the size of the models. If your growth estimate involves high-volume inference, do your math carefully.
Picovoice Consulting works with Enterprise Plan customers to compress their custom or fine-tuned LLMs with picoLLM Compression so they can run on the picoLLM Inference engine.
Enterprise Plan customers can engage with Picovoice Consulting to discuss custom development needs.
Picovoice voice AI engines support the most popular and widely-used hardware and software out-of-the-box - from web, mobile, desktop, and on-prem to private cloud. However, there are so many platforms, yet only so much time and money, making it impossible to support everything.
You can engage with Picovoice Consulting and get any Picovoice voice AI engine ported to the platform of your choice once you become an Enterprise Plan customer.
Picovoice supports the most popular and widely used SDKs. If you need another SDK, you can check our open-source SDKs and build it yourself, or contact Picovoice Consulting once you become an Enterprise Plan customer. Picovoice Consulting experts can create a public or private library for the SDK of your choice and maintain it.
Picovoice engines have hundreds of thousands of words in their lexicons. However, there might be some special words we missed. You can add a custom word to Leopard Speech-to-Text and Cheetah Streaming Speech-to-Text on the self-service Picovoice Console. For Porcupine Wake Word and Rhino Speech-to-Intent, Enterprise Plan customers can engage Picovoice Consulting.
You can create a GitHub issue under the relevant repository/demo.
Enterprises face several challenges while building PoCs. Finding talented and experienced individuals in machine learning is one of the biggest challenges to start with. We learned this the hard way, and experience it every day. On top of it, executives and clients may have unrealistic deadlines.
Experts at Picovoice Consulting help enterprises build PoCs, develop their AI strategy, and work with them hand-in-hand, offering the guidance they need.
Picovoice on-device AI engines process data in your environment, whether it's public or private cloud, on-prem, web, mobile, desktop, or embedded.
Picovoice is private by design and has no access to user data. Thus, Picovoice doesn't retain user data as it never tracks or stores it in the first place.
Yes. Enterprises using Picovoice don't need to share their user data with Picovoice or any other 3rd party to run voice AI models, making the Picovoice on-device voice AI platform intrinsically HIPAA-compliant.
Yes. Enterprises using Picovoice don't need to share their user data with Picovoice or any other 3rd party to run voice AI models, making the Picovoice on-device voice AI platform intrinsically GDPR-compliant.
Yes. Enterprises using Picovoice don't need to share their user data with Picovoice or any other 3rd party to run voice AI models, making the Picovoice on-device voice AI platform intrinsically CCPA-compliant.
Yes, you can use voice AI with local LLMs and create private, accurate, and reliable AI agents. Check Picovoice Blog or GitHub to find more information, tutorials, and demos. Some examples are:
The answer is "it depends". Voice AI is complex technology, and building products for production requires diligent work. It depends on your use case, other tools, and the tech stack used, along with hardware and software choices. Given the variables, it can be challenging.
You can experiment with different scenarios leveraging Picovoice's Free resources or engage with experts from Picovoice Consulting to find the best approach to deploying language models for production.
Yes! Picovoice engines are modular and work with other Picovoice products or competitive products. Check Picovoice Blog or GitHub to find more information, tutorials, and demos. The examples below use Porcupine Wake Word, Cheetah Streaming Speech-to-Text, picoLLM, and Orca Streaming Text-to-Speech together:
Enterprises face several challenges while building PoCs. Finding talented and experienced individuals in machine learning is one of the biggest challenges to start with. We learned this the hard way, and experience it every day. On top of it, executives and clients may have unrealistic deadlines.
Experts at Picovoice Consulting help enterprises build PoCs, develop their AI strategy, and work with them hand-in-hand, offering the guidance they need.