Porcupine (Wake Word Engine) FAQ

Which Picovoice speech product should I use?

If you need to recognize a single phrase or a number of predefined phrases (dozens or fewer) in an always-listening fashion, then you should use Porcupine (wake word engine). If you need to recognize complex voice commands within a confined and well-defined domain with a limited vocabulary and limited variations of spoken forms (1000s or fewer), then you should use Rhino (speech-to-intent engine). If you need to transcribe free-form speech in an open domain, then you should use Cheetah (speech-to-text engine).

What are the benefits of implementing voice interfaces on-device, instead of using cloud services?

Privacy, minimal latency, improved reliability, runtime efficiency, and cost savings, to name a few. More detail is available here.

Does Picovoice technology work in far-field applications?

It depends on many factors, including the distance, ambient noise level, reverberation (echo), quality of the microphone, and the audio frontend used (if any). We recommend trying out our technology in your environment using the freely-available sample models. Additionally, we often publish open-source benchmarks of our technology in noisy environments 1 2 3. If the target environment is noisy and/or reverberant and the user is a few meters away from the microphone, a multi-microphone audio frontend can be beneficial.

Does Picovoice software work in my target environment and noise conditions?

It depends on a variety of factors. You should test it out yourself with the free samples made available on Picovoice GitHub pages. If it does not work, we can fine-tune it for your target environment.

Does Picovoice software work in the presence of noise and reverberation?

Picovoice software is designed to function robustly in the presence of noise and reverberation. We have benchmarked and published the performance results under various noisy conditions 1 2 3. The end-to-end performance depends on the type and amount of noise and reverberation. We highly recommend testing out the software using freely-available models in your target environment and application.

Can I use Picovoice software for telephony applications?

Picovoice software expects audio sampled at 16000 Hz. PSTN networks usually sample at 8000 Hz. It is possible to upsample, but the frequency content above 4000 Hz is then missing and performance will be suboptimal. It is possible to train acoustic models for telephony applications if the commercial opportunity justifies it.

My audio source is 48 kHz/44.1 kHz. Does Picovoice software support that?

Picovoice software expects a 16000 Hz sampling rate. You will need to resample (downsample). Typically, operating systems or sound cards (audio codecs) provide such functionality; otherwise, you will need to implement it.
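For the 48 kHz case the ratio to 16 kHz is exactly 3:1, so resampling reduces to low-pass filtering followed by keeping every third sample. The following is a minimal sketch in pure Python, not part of any Picovoice SDK; the function names, the tap count, and the 7 kHz cutoff are illustrative assumptions. A 44.1 kHz source has a non-integer ratio to 16 kHz and needs a rational resampler, which this sketch does not cover.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    cutoff is expressed as a fraction of the input sample rate (0 to 0.5).
    num_taps should be odd so the filter has a single center tap.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def downsample_48k_to_16k(samples, num_taps=95):
    """Low-pass below the new Nyquist frequency (8 kHz), then keep every third sample."""
    taps = lowpass_taps(num_taps, cutoff=7000 / 48000)  # assumed 7 kHz cutoff
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), 3):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # samples beyond the edges count as zero
                acc += tap * samples[j]
        out.append(acc)
    return out
```

Filtering before discarding samples matters: without it, content above 8 kHz in the source would fold back into the 0 to 8 kHz band as aliasing. In production code, a resampler from the operating system, audio codec, or a signal-processing library is usually a better choice than a hand-written filter.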

Can Picovoice help with building my voice-enabled product?

Our core business is software licensing. That being said, we do have a wide variety of expertise internally in voice, software, and hardware. We consider such requests on a case-by-case basis and assist clients who can guarantee a certain minimum licensing volume.

If I am using GitHub to evaluate the software, do you provide technical support?

Prior to commercial engagement, basic support solely pertaining to software issues or bugs is provided via GitHub issues by the open-source community or a member of our team. We do not offer any free support with integration or support with any platform (operating system or hardware) that is not officially supported via GitHub.

Why does Picovoice have GitHub repositories?

To facilitate performance evaluation for commercial prospects, and to enable the open-source community to use the technology for personal and non-commercial applications.

What is the engagement process?

You may use what is available on GitHub while respecting its governing license terms, without engaging with us. This facilitates initial performance evaluation. Subsequently, you may acquire a development license to get access to custom speech models or use the software for development and internal evaluation within a company; the development license is for building a proof-of-concept or prototype. When ready to commercialize your product, you need to acquire a commercial license.

Does Picovoice offer AEC, VAD, noise suppression, or microphone array beamforming?

No. But we do have partners who provide such algorithms. Please add this to your inquiry when reaching out and we can help to connect you.

Can you build a voice-enabled app for me?

We do not provide software development services, so most likely the answer is no. However, via a professional services agreement we can help with proofs-of-concept (these will typically be rudimentary apps focused on voice user interface or building the audio pipeline), evaluations on a specific domain/task, integration of SDKs in your app, training of custom acoustic and language models, and porting to custom hardware platforms.

How do I evaluate Porcupine software performance?

We have benchmarked the performance of Porcupine software rigorously and published the results here. We have also open-sourced the code and audio files used for benchmarking on the same repository to make it possible to reproduce the results. You can also use the code with your own audio files (noise sources collected from your target environment or utterances of your own wake word) to benchmark the performance. Additionally, we have made a set of sample wake words freely available on this GitHub repository on all platforms to facilitate evaluation, testing, and integration.

Can Porcupine wake word detection software detect non-English keywords?

It depends. If English speakers can easily pronounce the non-English wake word, then we can most likely generate it for you. We recommend sending us a few audio samples including the utterance of the requested wake word so that our engineering team can review and provide feedback on feasibility.

What is Porcupine’s wake word detection accuracy?

We have benchmarked Porcupine extensively, compared its accuracy against alternatives, and published the results here. Porcupine can achieve 91%+ accuracy (detection rate) with less than 1 false alarm in 10 hours in the presence of ambient noise with 10 dB SNR at microphone level.

Can Porcupine detect the wake word if the speaker is yelling/shouting in anger, excitement, or pain?

Porcupine does not have a profile to recognize emotionally-coloured utterances such as yelling, dragging, mumbling, etc. The speaker needs to vocalize the phrase reasonably clearly.

Does Porcupine’s detection accuracy depend on the choice of wake word?

Generally speaking, yes. However, it is difficult to quantify the cause and effect accurately. We have published a guide here to help you pick a wake word that would achieve optimal performance. You will need to avoid using short phrases, and make sure your wake word includes diverse sounds and at least six phonemes. Long phrases are also not recommended due to the poor user experience.

Is there a guideline for picking a wake word?

We have published a guide here to help you pick a wake word that would achieve optimal performance.

How much CPU and memory does Picovoice wake word detection software consume?

We offer several trims for our wake word detection model. The standard model, which is recommended on most platforms, uses roughly 1.5 MB of read-only memory (ROM/flash) and 5% of a single core on a Raspberry Pi 3.

What should I set the sensitivity value to?

You should pick a sensitivity parameter that suits your application requirements. A higher sensitivity value gives a lower miss rate at the expense of a higher false alarm rate. If your application places tighter requirements on false alarms, but can tolerate misses, then you should lower the sensitivity value.

What is an ROC curve?

The accuracy of a binary classifier (any decision-making algorithm with a “yes” or “no” output) can be measured by two parameters: false rejection rate (FRR) and false acceptance rate (FAR). A wake word detector is a binary classifier. Hence, we use these metrics to benchmark it.

The detection threshold of binary classifiers can be tuned to balance FRR and FAR. A lower detection threshold yields higher sensitivity. A highly sensitive classifier has a high FAR and a low FRR (i.e. it accepts almost everything). A receiver operating characteristic (ROC) curve plots FRR values against the corresponding FAR values for varying sensitivity values.
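As a rough illustration of how the points on such a curve are obtained, the sketch below sweeps a detection threshold over two sets of scores and computes FAR and FRR at each setting. The scores are made up, and the function and variable names are hypothetical; this is not the Porcupine benchmark code. FAR is computed here as a fraction of negative examples, whereas wake word benchmarks commonly report false alarms per hour of audio.

```python
def roc_points(positive_scores, negative_scores, thresholds):
    """Return (threshold, FAR, FRR) tuples for a score-based binary classifier.

    A sample is accepted when its score is at or above the threshold.
    """
    points = []
    for t in thresholds:
        false_accepts = sum(s >= t for s in negative_scores)
        false_rejects = sum(s < t for s in positive_scores)
        far = false_accepts / len(negative_scores)
        frr = false_rejects / len(positive_scores)
        points.append((t, far, frr))
    return points


# Made-up detector scores, for illustration only.
wake_word_scores = [0.92, 0.85, 0.77, 0.64, 0.58, 0.41]    # wake word was spoken
other_audio_scores = [0.71, 0.52, 0.33, 0.25, 0.18, 0.07]  # wake word was not spoken

for threshold, far, frr in roc_points(wake_word_scores, other_audio_scores, [0.2, 0.5, 0.8]):
    print(f"threshold={threshold:.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

With these scores, the lowest threshold (highest sensitivity) gives FAR 0.67 and FRR 0.00, the middle one gives FAR 0.33 and FRR 0.17, and the highest gives FAR 0.00 and FRR 0.67, which is the trade-off the curve visualizes.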

To learn more about ROC curves and benchmarking a wake word detector, you may read the blog post here and the Porcupine benchmark published here.

If I use Porcupine wake word detection in my mobile application, does it function when the app is running in the background?

Developers have been able to successfully run Porcupine wake word detection software on iOS and Android in background mode. However, this feature is controlled by the operating system, and we cannot guarantee that this will be possible in future releases of iOS or Android. Please check iOS and Android guidelines, technical documentation, and terms of service before choosing to run Porcupine wake word detection in the background. We recommend using the sample demo applications made available on this repository to test this capability in your end application before acquiring a development or commercial license.

Which platforms does Porcupine wake word detection support?

Porcupine wake word detection software is supported on Raspberry Pi (all models), BeagleBone, Android, iOS, Linux (x86_64), macOS, Windows, and modern web browsers (excluding Internet Explorer). Additionally, we have support for various ARM Cortex-A and ARM Cortex-M (M4/M7) MCUs by NXP and STMicro.

What is required to support additional languages?

Porcupine is architected to work with any language, and there are no technical limitations on supporting most languages. However, supporting a new language requires significant effort and investment. The undertaking is a business decision which depends on our current priorities, pipeline, and the scale of commercial opportunity for which the language support is required.

Does Porcupine wake word detection software work with everyone’s voice (universal) or does it only work with my voice (personal)?

Porcupine wake word detection software is universal and trained to work with a variety of accents and people’s voices.

Does Porcupine wake word detection work with children’s voices?

Porcupine may not work well with very young children as their voices are different from adult voices. We have made the software available for free evaluation with a set of sample wake words. We recommend that you test the engine with speech of children within your target age range before acquiring a development or commercial license.

Do users need to pause and remain silent before saying the wake word?

By default, no. But if that is a requirement, we can customize the software (as part of our professional services for you) to require silence either before or after the wake word.

If my wake phrase is made of two words (e.g., “Hey Siri”), does the software detect if the user inserts silence/pause in between each word?

By default, the engine ignores silence in between the words. However, if that is a requirement, we can customize the software (as part of our standard professional services) to require silence between each word.

Our marketing team is having difficulty choosing a wake word. Can you help?

Yes, we can help you with the process of choosing the right wake word for your brand. We also offer the option for revision if you change your mind after the purchase of a development license.

Does Porcupine wake word detection work with accents?

Yes, it generally works well with accents. However, it’s impossible to quantify this objectively. We recommend you try the engine for yourself and perhaps evaluate with an accented dataset of your choice to see if it meets your requirements.

How does Picovoice wake word detection software work when UK and US wake word pronunciations sometimes differ?

For words that have different pronunciations in UK and US English, like “tomato”, we recommend listening for both pronunciations simultaneously with two separate wake word model files, each targeting a distinct pronunciation.
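On the application side, the two models can then be treated as one logical wake word. The sketch below assumes the engine reports the index of the model that fired, or -1 when nothing was detected; the file names and the helper function are hypothetical and only illustrate the mapping.

```python
# Hypothetical model file names, one per pronunciation of the same word.
KEYWORD_MODELS = ["tomato_us.ppn", "tomato_uk.ppn"]

# Every model index maps to the same logical wake word.
WAKE_WORD_FOR_INDEX = {i: "tomato" for i in range(len(KEYWORD_MODELS))}


def logical_wake_word(keyword_index):
    """Map the index of the model that fired to a wake word name.

    Returns None when keyword_index is negative (assumed to mean "no detection").
    """
    if keyword_index < 0:
        return None
    return WAKE_WORD_FOR_INDEX[keyword_index]
```

The rest of the application only sees "tomato", regardless of which pronunciation triggered the detection.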

How many wake words can Porcupine detect simultaneously?

There is no technical limit on the number of wake words the software can listen to simultaneously.

How much additional memory and CPU is needed for detecting additional wake words or trigger phrases?

Listening to additional wake words does not increase the CPU usage. However, it requires 1 KB of memory per additional wake word model.
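To put that figure in perspective, the arithmetic below compares it with the roughly 1.5 MB quoted earlier for the standard model. This is only a back-of-the-envelope sketch: it assumes both figures can be added together (the text does not say whether the 1 KB is ROM or RAM), takes 1 MB as 1024 KB, and uses hypothetical names.

```python
BASE_MODEL_KB = 1.5 * 1024  # standard model, assuming 1 MB = 1024 KB
PER_EXTRA_MODEL_KB = 1.0    # per additional wake word model


def total_footprint_kb(extra_wake_words):
    """Approximate memory footprint in KB with the given number of extra wake words."""
    return BASE_MODEL_KB + PER_EXTRA_MODEL_KB * extra_wake_words


def relative_increase(extra_wake_words):
    """Extra memory as a fraction of the base model size."""
    return PER_EXTRA_MODEL_KB * extra_wake_words / BASE_MODEL_KB


print(total_footprint_kb(20))          # 1556.0
print(f"{relative_increase(20):.1%}")  # 1.3%
```

Under these assumptions, even 20 additional wake words add only about 1.3% to the footprint.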

Is the Picovoice “Alexa” wake word verified by Amazon?

Amazon Alexa Certification requirements are different for near, mid, and far-field applications (AVS, AMA, etc.). Also, the certification is typically performed on the end hardware, and the outcome depends on many design choices such as microphone, enclosure acoustics, audio front end, and wake word. Picovoice can assist with new product introduction (NPI) and Alexa certification under our technical support package.

Does Picovoice wake word detection software work with Google Assistant?

Yes. However, your product may have to go through a certification procedure with Google. Please check Google’s guidelines and terms of service for related information.

Can you use Picovoice wake word detection software with Cortana, IBM Watson, or Samsung Bixby?

Yes, Picovoice can generate any third-party wake words at your request. However, you are responsible for any necessary integration with such platforms and potential areas of compliance.

What’s the power consumption of the Picovoice wake word detection engine?

The absolute power consumption (in watts) depends on numerous factors such as processor architecture, vendor, fabrication technology, and system-level power management design. If your design requires low power consumption in the (sub-)milliwatt range for always-listening wake word detection, you will likely need to consider an MCU (ARM Cortex-M) or DSP implementation.

Can Porcupine distinguish words with similar pronunciations?

Rigidly rejecting words with similar pronunciations has several side effects, such as rejecting accented pronunciations and a higher rejection rate in noisy conditions. By lowering the detection sensitivity, you can achieve lower false acceptance of words with similar pronunciations at the cost of a higher miss rate.

How can I run Picovoice software on my ARM-based MPU running a Yocto-customized embedded Linux?

As part of our standard professional services, we can port our software to custom platforms for a one-time engineering fee and prepaid license royalties. We review these on a case-by-case basis and provide a quotation based on the complexity and type of the platform. Please note that the port must be performed in-house by our engineering team, since it requires direct access to our IP, proprietary technology, and toolchains. We would also require at least one development board running your target OS to perform this task.

What is your software licensing model?

The software published on this repository is available under Apache 2.0. If you need custom wake word models on a specific platform for commercial development (building a PoC, prototyping, or product development), you need to acquire a development license. To install and use Picovoice software on commercial products with custom wake word models, you need to acquire a commercial license. If you are developing a product within a company and working towards commercialization, please reach out to us to acquire the appropriate license by filling out this form.

Can I use wake word models generated by the Picovoice Console in a commercial product?

The Picovoice Console and the keyword files it generates can only be used for non-commercial and evaluation purposes. If you are developing a commercial product, you must acquire a development license. To acquire a development license, fill out this form.
