Products built on cloud speech-to-text APIs send voice data to a third-party cloud. Only on-device or on-prem processing avoids this. Control becomes harder to keep as the number of parties increases. It’s like keeping a secret: it’s not a secret if more than one person knows. Likewise, your data is not secure if more than one enterprise has access.

Leaving the security and privacy of your data at a third party’s mercy is not a good strategy. Regulations have indeed become stricter in response to the unethical use of data. However, stricter regulations do not mean you keep full control over your data. Many voice AI vendors, such as IBM, use customer data to train their models, which means they store your data.

“By default, all Watson services log requests and their results.” (IBM)

Picovoice processes voice data on the device, which means Picovoice itself has no access to it. It cannot track, record or store voice data, and hence cannot collect user information from it. Enterprises can deploy Picovoice technology on mobile devices, websites, desktops or servers so that the data does not have to leave users’ devices or the enterprise’s own infrastructure. Thus, enterprises keep control over the data and decide who can access it. However, not every vendor offers on-device or on-prem speech recognition.

Hence, we prepared a list of questions to ask before sending data to a third-party cloud voice AI provider. Read the fine print carefully and talk to the vendor before integrating an API or hitting the upload button.

Audio file content:

  • Is there any PII (personally identifiable information)?
  • Is there any confidential or restricted information?
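
As a rough first pass on the PII question above, a script can flag obvious patterns before anything is uploaded. The sketch below assumes you already have a transcript produced locally (for example, with an on-device engine), so the check itself does not send data anywhere. The pattern set and the `find_pii` helper are illustrative assumptions, not a complete PII detector: names, addresses, account numbers and spoken-out digits would all slip through.

```python
import re

# Illustrative patterns only (assumption): real PII detection needs far
# broader coverage than email addresses and phone-like digit runs.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(transcript: str) -> dict:
    """Return a mapping of pattern name -> list of matches in the transcript."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(transcript)
        if matches:
            hits[name] = matches
    return hits
```

An empty result does not prove the recording is free of PII. It only means none of these simple patterns matched, so a manual review is still advisable for sensitive recordings.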

Transferring audio files to vendors:

  • How secure is the connection?
  • How does the platform receiving the upload ensure security? Is it certified?
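
One way to start answering the connection question is to check which TLS version and cipher a vendor endpoint actually negotiates. The following is a minimal sketch using only the Python standard library. The helper names `strict_tls_context` and `negotiated_tls` are hypothetical, and requiring TLS 1.2 or newer is an assumed policy, not a vendor requirement.

```python
import socket
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and hostnames
    and refuses anything older than TLS 1.2 (assumed policy)."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls(host: str, port: int = 443, timeout: float = 5.0) -> tuple:
    """Connect to host:port and return (TLS version, cipher name).

    Raises ssl.SSLError if the handshake fails, for example because the
    certificate is invalid or the server only offers older protocols.
    """
    ctx = strict_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling `negotiated_tls("api.example.com")` with the vendor’s real hostname in place of this placeholder reports what the connection uses in transit. It says nothing about how the vendor handles data after receipt, which is why the certification question still needs to be put to the vendor.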

Receiving the transcript:

  • How do you receive the transcript? What’s the authentication method?
  • What’s the encryption method, and how often is it updated?

Data Access:

  • Who has access to your data (both audio recordings and transcripts), within the vendor and externally (e.g. partners)?
  • What’s the authentication method to access data?

Data Storage:

  • How long do the vendor and its partners keep the data (both audio recordings and transcripts), and where?
  • Is the stored data encrypted, and how?
  • Why is the data stored, and how is it used?
  • What is the process to get your data deleted?

Vendor Reliability:

  • Does the vendor have a conflict of interest, e.g. does it operate, or aspire to operate, in the same verticals as you? For example, cloud providers are investing in CCaaS (contact center as a service). Would your data give them a competitive edge?
  • Has the vendor experienced any vulnerabilities or data leaks before? How did it handle them?
  • What consequences does the vendor face for unethical or illegal use of your data?

Disclaimer: The information provided on this page is for general informational purposes only, and is not legal advice.