Picovoice AI Frequently Asked Questions
Find answers to frequently asked questions on Picovoice Technology, Picovoice Console Access and Pricing. Don’t forget to check out platform FAQs: Leopard (Speech-to-Text Engine), Cheetah (Real-time Speech-to-Text Engine), Octopus (Speech-to-Index Engine), Porcupine (Wake Word Engine), Rhino (Speech-to-Intent Engine), Picovoice Platform, and Cobra (Voice Activity Detection).
What’s Picovoice's business model?
Picovoice sells its proprietary voice recognition technology on a subscription basis, enabling organizations to build voice interfaces, transcription and search engines. Picovoice software is priced based on consumption tiers: the number of active users or second-based consumption, depending on the package. More information can be found on the pricing page or in the Picovoice Console Access and Pricing section below.
What are the benefits of a subscription-based model over legacy perpetual licensing?
- Offers access to support, updates and upgrades during the engagement. Enterprises benefit from advances in voice recognition and in other software, such as operating systems, without worrying about support or being stuck with old technology
- Lets enterprises pay only when voice features are used rather than deployed
- Helps enterprises manage their working capital effectively over years instead of an upfront lump sum payment
- Gives developers and enterprises flexibility to test Picovoice technology under the Free Tier even without engaging with Picovoice
- Automates usage tracking, replacing manual reporting and resulting in efficiency gains
How does Picovoice process voice data?
Picovoice processes voice data locally on device without sending data to a 3rd party cloud. Therefore voice experiences built with Picovoice are not jeopardized by unpredictable network delays due to poor connectivity or latency. Picovoice offers full privacy and cost-effectiveness at scale by eliminating the cloud-related costs, including hidden ones.
Does Picovoice work with everyone’s voice (universal) or does it only work with my voice (personal)?
Picovoice engines are universal and trained in real-world environments to work with a variety of accents and people’s voices.
How does Picovoice train voice AI models?
Picovoice’s unique approach to speech recognition offers affordable and accessible voice AI to anyone and any device, resulting in an immersive experience. Picovoice’s proprietary algorithm is based on the principles of transfer learning and hardware-aware training. Transfer learning enables zero-shot learning and removes the need for extensive data collection and training per model, resulting in dramatically simplified product development, reduced time-to-market and more accurate voice models compared to traditional methods that rely on data gathering. Hardware-aware training optimizes voice models for the target platform, resulting in resource- and power-efficient models even under stringent power consumption requirements.
Which use cases do you support?
Picovoice empowers enterprises from various industries with different use cases. Even organizations from the same industry apply Picovoice technology to solve different problems. You can check the Voice Search, Voice Command and Control, Search by Voice and Speech Analytics use case pages, or read our blog posts. You can think of Picovoice as Lego bricks: what you do with them depends on your business requirements or imagination.
Should I provide recordings to train voice AI models?
The short answer is no. Picovoice’s unique approach to speech recognition sets it apart from other vendors: it doesn’t require data gathering to train models. Voice AI models can be trained instantly on the Picovoice Console’s type-and-train user interface and downloaded immediately.
How do you collect user data?
Picovoice does not track, store or collect user data.
How do I get technical support while developing my voice product?
Picovoice provides GitHub community support for Free Tier users, email support for Starter Tier customers, and dedicated support for Enterprise Tier customers. You can find answers to your questions on Picovoice docs, GitHub, and the blog. If you cannot, feel free to create a GitHub issue under the relevant repo.
Which Picovoice engine should I use?
Developers face myriad choices when building voice products. If you need to
- Perform batch audio transcription, such as post-interview transcription, podcast analysis or social media listening, then you should use Leopard Speech-to-Text (file-based transcription)
- Transcribe in real time, such as for meetings, note-taking, and voice typing, then you should use Cheetah Speech-to-Text (streaming transcription)
- Search keywords or phrases within a large body of voice data, such as meetings, lectures, or call center recordings, then you should use Octopus Speech-to-Index
- Recognize a single phrase or several predefined phrases, in an always-listening fashion, then you should use Porcupine Wake Word
- Recognize complex voice commands within a confined and well-defined domain with a limited vocabulary and limited variations of spoken forms, then you should use Rhino Speech-to-Intent
- Detect the presence of human speech to trigger further actions, such as transferring a telemarketing call to an agent once a prospect picks up, or triggering speech-to-text in a transcription application for people with hearing problems, then you should use Cobra Voice Activity Detection
You can read our strategy guide on how to select the best voice technology, or check the Voice Search, Voice Command and Control, Search by Voice and Speech Analytics use case pages.
Which languages does Picovoice support?
Picovoice currently supports eight languages: English, German, French, Italian, Japanese, Korean, Portuguese and Spanish to build voice interfaces. Please check platform pages for more information. If you have a business opportunity requiring another language, contact sales and tell us more.
Does Picovoice technology work across various accents and dialects?
Yes, it works well across various accents and dialects. However, Picovoice recommends you try the engines of your interest and preferably evaluate with an accented dataset of your choice to see if it meets your requirements. You can sign up to the Picovoice Console with the Free Tier to test in your target environment and with your target users. You can also check open-source benchmark results for the engine of your interest: speech-to-text benchmark, voice search benchmark, wake word benchmark, natural language understanding benchmark and voice activity detection benchmark.
Does Picovoice software work in the presence of noise and reverberation?
The short answer is yes. Picovoice software is designed to function robustly in the presence of noise and reverberation. We have benchmarked and published the performance results under various noisy conditions. You can check open-source benchmark results for the engine of your interest: Leopard, Octopus, Porcupine, Rhino, and Cobra. The end-to-end performance depends on the type and amount of noise and reverberation. We highly recommend testing the software with the freely available models in your target environment and application.
Does Picovoice technology work in far-field applications?
Most likely. However, the performance of Picovoice technology depends on many factors including the distance, ambient noise level, reverberation (echo), quality of the microphone, and audio frontend used (if any). It is recommended to try out our technology using the freely-available sample models in your environment. Additionally, we often publish open-source benchmarks of our technology in noisy environments. If the target environment is noisy and/or reverberant and the user is a few meters away from the microphone, a multi-microphone audio frontend can be beneficial. You can check open-source benchmark results for the engine of your interest: speech-to-text benchmark, voice search benchmark, wake word benchmark, natural language understanding benchmark and voice activity detection benchmark.
Can I use Picovoice software for telephony applications?
Picovoice engines expect audio with a 16kHz sampling rate. PSTN networks usually sample at 8kHz. Upsampling to 16kHz is possible, but audio sampled at 8kHz carries no frequency content above 4kHz, so performance will be suboptimal. For enterprise customers, it is possible to train acoustic models for telephony applications. Contact sales and tell us more about your business requirements.
My audio source is 48kHz/44.1kHz. Does Picovoice software support that?
Picovoice software expects a 16kHz sampling rate, so you will need to downsample. Typically, operating systems or sound cards (audio codecs) provide such functionality; otherwise, you will need to implement it.
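If you do end up implementing the conversion yourself, the sketch below shows one common approach for the 48kHz case: low-pass filter the signal below the new 8kHz Nyquist limit, then keep every third sample. It is an illustration only, not part of any Picovoice SDK. The function names, the 7kHz cutoff and the 63-tap filter length are assumptions chosen for the example.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc (Hamming) low-pass FIR coefficients with unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def downsample_48k_to_16k(samples, num_taps=63):
    """Filter out content near and above 8 kHz, then keep every 3rd sample."""
    taps = lowpass_taps(num_taps, cutoff_hz=7000, sample_rate_hz=48000)
    half = num_taps // 2
    out = []
    for center in range(0, len(samples), 3):
        acc = 0.0
        for k, tap in enumerate(taps):
            i = center + k - half
            if 0 <= i < len(samples):  # samples outside the input are treated as silence
                acc += tap * samples[i]
        out.append(acc)
    return out

# Usage: 0.1 s of a 1 kHz tone sampled at 48 kHz
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
resampled = downsample_48k_to_16k(tone)
print(len(resampled))  # 1600 samples, i.e. 0.1 s at 16 kHz
```

A 44.1kHz source is not an integer multiple of 16kHz (the ratio is 160/441), so it needs a rational-ratio resampler. In that case an established resampling library or the platform's audio stack is usually the better choice.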
Which SDKs are available to build my voice products?
The full list of Picovoice SDKs and tutorials can be found on docs, and demos can be found on the Picovoice Blog or Medium page.
Can I use Picovoice technology for voice recognition on small devices like Raspberry Pi?
Yes! You can even run speech-to-text locally on Raspberry Pi and process voice data offline. The full list of platforms that Picovoice supports, along with tutorials for these platforms, can be found on docs.
Can I use Picovoice technology for voice recognition within web browsers?
Yes! Check out Picovoice platform pages for Leopard Speech-to-Text, Octopus Speech-to-Index, Porcupine Wake Word, Rhino Speech-to-Intent and Cobra Voice Activity Detection and try demos. Each demo runs within your web browser without sending the voice data to any 3rd parties.
Does Picovoice offer AEC, noise suppression, or microphone array beamforming?
No. But we do have partners who provide such algorithms. Please add this to your enquiry when reaching out and we can connect you.
How do I train voice AI models?
Voice models are trained on the Picovoice Console. Picovoice Console has a web-based type-and-train interface: you type the intents, expressions and slots much as you would type in a document, or add custom vocabulary to boost the accuracy of specific words, and then click the train button, all without writing a single line of code. Picovoice Console trains platform-optimized AI models instantly and makes them available to download. Before downloading, you can test the created models within your web browser; again, no coding is required. You can create a Picovoice Console account immediately and start building without engaging with the Picovoice team.
How many models can I train?
Free Tier users can train 3 (three) custom branded wake words with Porcupine, 10 (ten) contexts with Rhino (Speech-to-Intent Engine) and 100 Speech-to-Text (Leopard or Cheetah) models every 30 days. Starter Tier users can train 10 (ten) custom branded wake words with Porcupine and 100 (one hundred) contexts with Rhino (Speech-to-Intent Engine) every 30 days. Enterprise customers can increase their model allowances depending on their business requirements. Trained models do not expire; every 30 days, users get additional allowances.
Can I use only one of your products?
Yes. We bundle our products Porcupine, Rhino and Cobra - all you need to create voice experiences - and Cheetah, Leopard and Octopus - all you need for transcription and voice search. Pricing for creating voice experiences is based on the number of users that you want to enable. Using all three or only one is up to you. The same applies to transcription and voice search: you get access to all three engines, and whether you transcribe audio inputs in real time or index files to find phrases within audio files is up to you. Picovoice offers best-in-class technology throughout your journey to support you in building the best voice product for your users. Which engines you use to offer the best product to your users is your call. Check the pricing page for more information.
I need a little bit more than what you offer with the Free Tier. Can you increase my Free Tier allowances?
No. Both the Free Tier and Starter Tier user allowances are clearly and transparently communicated on the pricing page. We treat all Picovoice Console accounts equally and offer the same allowances to all Free Tier users to be fair and transparent.
How do you track the number of monthly active users for the Voice Assistant Package that includes Porcupine, Rhino and Cobra?
First and foremost, Picovoice doesn’t track, record or store anything related to individuals. For Picovoice, an active user is a "thing" that activates and uses Picovoice engines. Because Picovoice doesn’t track anything related to humans, different people who activate Picovoice engines through the same user ("thing") are counted as one user. Unlimited voice interactions are offered for every user ("thing"). If one person activates Picovoice engines through different things, each new thing is counted as a new user.
Picovoice starts tracking only when engines are activated, not when they’re deployed. If a user never activates the engine, it’s never counted as a user. If a user activates engines and stops using them, after 30 days that user will no longer be counted. Account allowances and real-time usage are shown on the Picovoice Console profile.
If your product is
- Hardware device, such as a smart speaker, elevator, camera, or kiosk, every device is counted as a user, regardless of the volume of voice data or number of individuals.
- Mobile application, every install ID is counted as a user, regardless of the volume of voice data or number of individuals. [When a mobile application is removed and reinstalled, it’s downloaded with a new install ID, so it’s counted as a new user. If an application is updated without being deleted, there’s no new install ID, so it’s not counted as a new user.]
- Web application, every domain from the same browser is counted as one user, regardless of the volume of voice data or number of individuals. [When a web application is accessed through a different browser, or incognito window from the same device, it’s counted as a new user.]
When hardware or software settings are reset, they will be counted as a new device.
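The counting rules above can be illustrated with a small toy model. This is not Picovoice's implementation; the class, method names and identifiers such as "kiosk-1" are hypothetical, and the exact boundary of the 30-day window is an assumption. It only shows the idea that each distinct "thing" is counted once, however often it is used, and drops out of the count after 30 days without activation.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=30)  # assumed window, based on the 30 days stated above

class ActiveUserCounter:
    """Toy model: counts distinct 'things' (device, install ID, browser and domain)."""

    def __init__(self):
        self._last_seen = {}  # thing_id -> datetime of the most recent activation

    def record_activation(self, thing_id, when):
        # Repeated activations by the same thing only refresh its timestamp.
        previous = self._last_seen.get(thing_id)
        if previous is None or when > previous:
            self._last_seen[thing_id] = when

    def active_users(self, now):
        # A thing counts as active if it was activated within the window.
        return sum(1 for seen in self._last_seen.values() if now - seen < ACTIVE_WINDOW)

# Usage: two things, one of them activated twice
counter = ActiveUserCounter()
start = datetime(2024, 1, 1)
counter.record_activation("kiosk-1", start)
counter.record_activation("kiosk-1", start + timedelta(days=1))
counter.record_activation("install-abc", start)
print(counter.active_users(start + timedelta(days=2)))  # 2
```

Things that are deployed but never activated never appear in the model, which mirrors the statement that tracking starts at activation rather than deployment.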
How do you track the consumption for the Transcription & Search package that includes Leopard, Cheetah and Octopus?
Consumption is measured in seconds, based on the length of the audio transcribed or indexed. Transcribing a 30-second-long video in real time, or transcribing or indexing a 30-second-long audio recording, is counted as 30 seconds. Picovoice doesn’t track the transfer of transcribed text files or indexed files among different machines. One account owner can transcribe and/or index audio files on multiple machines; in other words, there is no user limit. Account allowances and real-time usage are shown on the Picovoice Console profile.
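To estimate consumption before processing a batch of files, you can add up their durations. The sketch below does this for WAV files using Python's standard `wave` module. The helper names are hypothetical, and the result is only an estimate of what the description above implies, not the figure the Picovoice Console will report.

```python
import wave

def audio_seconds(wav_path):
    """Duration of a WAV file in seconds: number of frames divided by sample rate."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def total_consumption_seconds(wav_paths):
    """Estimated consumption: each file counts for its own length."""
    return sum(audio_seconds(path) for path in wav_paths)
```

For example, passing the paths of two 30-second recordings to `total_consumption_seconds` would give an estimate of 60 seconds.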
Can I reset my allowances without waiting for 30 days?
No, you cannot reset your allowances. You will need to wait until they reset.
Can I reset my AccessKey on Picovoice Console?
No, your AccessKey cannot be reset. Please do not share it with 3rd parties.
Can you reset my allowances?
No. To be fair to all Picovoice Console users, we do not change Free Tier user allowances. That includes increasing, decreasing or resetting them.