Products built on cloud Speech-to-Text APIs send voice data to a 3rd party cloud. Only on-device or on-prem speech recognition keeps that data in-house. It’s easy to lose control over the privacy of your data once it is shared with multiple enterprises. The situation is like keeping a secret: it’s not a secret if more than one person knows about it. Likewise, your data is not secure if more than one enterprise has access to it.
Leaving the security and privacy of your data in the hands of a 3rd party is not ideal. Regulations have become stricter in response to unethical uses of consumer data, but stricter rules do not mean you have complete control over your data. Many Voice AI vendors, such as IBM, use customer data to train their models, which means they are storing your data.
“By default, all Watson services log requests and their results.” (IBM)
Picovoice processes voice data on the device, so Picovoice has no access to it. Your information cannot be tracked, recorded, or stored by Picovoice. Enterprises can deploy Picovoice technology on mobile devices, websites, desktops, or servers, so voice data does not have to leave users’ devices or the enterprise’s own infrastructure. Enterprises therefore keep control over the data and decide who is allowed to access it. However, not every vendor offers on-device or on-prem speech recognition.
If on-device speech recognition is not an option, the Picovoice team has prepared a checklist of questions to review before sending data to a 3rd party cloud Voice AI provider. Read the fine print carefully and talk to the vendor before integrating an API or uploading data.
Check the audio file content before uploading
- Is there any PII (personally identifiable information)?
- Is there any confidential or restricted information?
It is important to protect confidential information and to know who has access to it.
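As a rough pre-upload screen, you can flag obvious PII with a few regular expressions. This assumes you already have a text version of the recording, such as notes or a locally produced draft transcript. The patterns and the `find_pii` helper below are illustrative assumptions, not part of any vendor's tooling. Regexes miss names, addresses, and context-dependent identifiers, so treat this as a first filter rather than a substitute for a manual review.

```python
import re

# Hypothetical, deliberately simple patterns; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # North American style numbers, e.g. 415-555-0123 or (415) 555 0123.
    "phone": re.compile(r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"),
    # US Social Security number format, e.g. 123-45-6789.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every pattern match in `text`, grouped by PII category."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


if __name__ == "__main__":
    sample = "Call me at 415-555-0123 or email jane.doe@example.com."
    print(find_pii(sample))
```

A non-empty result is a signal to redact the material or to keep it off the 3rd party cloud altogether.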
When transferring audio files to vendors
- How secure is the connection?
- How does the platform receiving the upload ensure security? Is it certified (e.g., SOC 2, ISO/IEC 27001)?
An insecure connection can be intercepted, putting the data at additional risk.
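On your side of the upload, one concrete safeguard is to refuse unverified or outdated TLS connections. The minimal sketch below uses Python's standard `ssl` and `socket` modules. It treats TLS 1.2 as the minimum acceptable version, which is a policy choice of this example and not a requirement stated by any vendor. Both helper functions (`make_upload_context`, `negotiated_tls_version`) are hypothetical names.

```python
import socket
import ssl


def make_upload_context(min_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Build a client-side TLS context for sending audio to a vendor.

    create_default_context() already verifies the server certificate against the
    system trust store and checks the hostname; we make both settings explicit and
    reject any protocol version older than `min_version`.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    context.minimum_version = min_version
    return context


def negotiated_tls_version(host: str, port: int = 443) -> str:
    """Connect to `host` with the strict context and return the negotiated version, e.g. 'TLSv1.3'."""
    context = make_upload_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, calling `negotiated_tls_version("api.vendor.example")` (a placeholder hostname) raises `ssl.SSLError` in two cases: the certificate cannot be verified, or the server only offers protocols older than TLS 1.2. Passing the same context to your HTTP client keeps those checks in place for the actual upload.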
When receiving the transcript
- How is the transcript received? What is the authentication method?
- What is the encryption method, and how often is it updated?
As with uploading audio files, the transcript should be delivered over a secure channel so the data is not at risk of being intercepted.
Data access
- Who has access to your data (both the audio recording and the transcript), within the vendor and externally (e.g., partners)?
- What is the authentication method to access data?
Once the data has been uploaded to a 3rd party cloud Voice AI provider, make sure you know who has, or could gain, access to it.
Data storage
- How long does the vendor (and partners) keep the data (both the audio recording and the transcription), and where is the data stored?
- Is the stored data encrypted? If so, how is it encrypted?
- Why is the data being stored, and how is it used?
- What is the process for having the data deleted?
As with data access, you should know where, how, and why your data will be stored. That way you understand the entire data storage process and its associated risks.
Vendor reliability
- Does the vendor have a conflict of interest? That is, does it operate, or aspire to operate, in the same verticals as you? For example, major cloud providers are investing in CCaaS (Contact Center as a Service). Would your data give them a competitive edge?
- Has the vendor had any past vulnerabilities or data leaks? How did it handle them?
- What are the consequences if the vendor uses your data unethically or illegally?
Knowing your vendor’s reputation and history can help you decide which 3rd party to trust with your data. Picking the right vendor can reduce the risk of data leaks.
Disclaimer: The information provided on this page is for general informational purposes only, and is not legal advice.
The Picovoice Consulting team helps companies select and implement the right AI models for their use cases.