Mobile apps are an ideal use case for speech recognition, whether for hands-free dictation, voice interfaces for mobile games, or generating subtitles for video and audio messages.
Apple's iPhone is powered by iOS, Apple's popular flagship operating system, while the iPad and Apple Watch run the closely related iPadOS and watchOS.
iOS features its own speech recognition API, but it can be clumsy and verbose to integrate. Crucially, not all of the languages it supports offer on-device recognition. Even when on-device recognition is available, the API may still stream audio to Apple's servers, which introduces privacy concerns and latency.
Picovoice's on-device Speech-to-Text technology does not have these downsides and integrates seamlessly into iOS apps.
In addition to iOS, Picovoice's Speech-to-Text engines run in a wide array of environments, such as Windows and modern web browsers (via WebAssembly).
There are two main approaches to Speech-to-Text transcription: real-time and batch.
Real-time Speech-to-Text systems produce text as the user speaks, mirroring how people mentally convert speech into words during a conversation. The downside of this approach is that it can introduce errors from acoustic or semantic ambiguities, which often become apparent only after a sentence is finished. It is therefore important to weigh this drawback when deciding whether an application truly needs real-time transcription.
Real-time Speech-to-Text, Online Automatic Speech Recognition, and Streaming Speech-to-Text all refer to the same core technology.
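Streaming engines consume audio incrementally, in small fixed-size frames, rather than as a complete file. The sketch below shows one way to slice a buffer of 16-bit mono PCM samples into such frames. It is illustrative only: the 512-sample frame length in the example is an assumption, since a real engine reports its own required frame size. Zero-padding the final short frame is one choice among several.

```python
from typing import Iterator, List


def split_into_frames(pcm: List[int], frame_length: int) -> Iterator[List[int]]:
    """Yield consecutive fixed-size frames from a buffer of 16-bit PCM samples.

    A streaming engine is fed one frame at a time, so it can emit text while
    audio is still arriving. The last, shorter frame is zero-padded so every
    frame has exactly `frame_length` samples (an assumption of this sketch;
    dropping the remainder is another option).
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    for start in range(0, len(pcm), frame_length):
        frame = pcm[start:start + frame_length]
        if len(frame) < frame_length:
            frame = frame + [0] * (frame_length - len(frame))
        yield frame


# Hypothetical example: 1,200 samples split into 512-sample frames.
frames = list(split_into_frames(list(range(1200)), 512))
print(len(frames), [len(f) for f in frames])  # 3 [512, 512, 512]
```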
For iOS devices, Picovoice provides Cheetah Streaming Speech-to-Text, which performs all speech recognition in real time, directly on the device. This approach avoids network-related delays and minimizes the latency between the user's speech and the transcription output.
Cheetah is available through software development kits (SDKs) for several platforms, each with code snippets and quick-start guides. In Python, for example:
```python
import pvcheetah
o = pvcheetah.create(access_key)  # access_key: your Picovoice AccessKey
partial_transcript, is_endpoint = o.process(get_next_audio_frame())
```
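Building on the snippet above, a fuller streaming loop accumulates the partial transcripts and finalizes an utterance whenever the engine reports an endpoint. This is a minimal sketch, not Picovoice's reference implementation. It assumes the engine exposes `process(pcm)` returning `(partial_transcript, is_endpoint)`, as in the snippet above, plus a `flush()` method that returns any remaining text. Cheetah's Python SDK provides `flush()`, but check the SDK documentation for your version. The helper name `transcribe_stream` is hypothetical.

```python
from typing import Iterable, List, Protocol, Tuple


class StreamingEngine(Protocol):
    """The subset of the engine interface this sketch relies on."""

    def process(self, pcm: List[int]) -> Tuple[str, bool]: ...

    def flush(self) -> str: ...


def transcribe_stream(engine: StreamingEngine, frames: Iterable[List[int]]) -> List[str]:
    """Feed frames to a streaming engine and return one transcript per utterance."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial_transcript, is_endpoint = engine.process(frame)
        current += partial_transcript  # text becomes available while the user speaks
        if is_endpoint:
            # An endpoint (a pause after speech) finalizes the current utterance.
            current += engine.flush()
            utterances.append(current)
            current = ""
    # Finalize whatever remains if the audio ends without an endpoint.
    current += engine.flush()
    if current:
        utterances.append(current)
    return utterances
```

With the Python SDK, `engine` would be the object returned by `pvcheetah.create(access_key)`. Each frame must contain exactly `engine.frame_length` samples recorded at `engine.sample_rate`, and the engine should be released with `engine.delete()` when it is no longer needed.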
Batch Speech-to-Text waits for the spoken phrase to finish before providing a transcription. Compared to real-time approaches, this method offers higher accuracy and runtime efficiency. Because it has the complete utterance, it can look ahead at upcoming words and use the full context to refine its output, improving precision at both the linguistic and acoustic levels. It also removes the need to alternate between listening and transcribing, which further improves overall efficiency.
For iOS devices, Picovoice offers Leopard Speech-to-Text, a state-of-the-art engine for batch transcription. Like Cheetah, Leopard processes all audio on the device, providing privacy by design and simplifying compliance with regulations such as HIPAA and GDPR. To further improve accuracy, users can add custom vocabulary and boost specific phrases via the Picovoice Console.
Leopard is likewise available through SDKs for several platforms, each with code snippets and quick-start guides. In Python, for example:
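```python
import pvleopard
o = pvleopard.create(access_key)  # access_key: your Picovoice AccessKey
transcript, words = o.process_file(path)  # path: an audio file on disk
```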
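Leopard returns word-level results alongside the transcript, so its output can be turned into subtitles, one of the use cases mentioned at the start. The following minimal sketch groups words into SRT subtitle cues. It assumes each entry in `words` exposes `word`, `start_sec` and `end_sec` attributes, so verify this against your SDK version. A small `Word` stand-in is defined so the code runs on its own. The names `words_to_srt` and `srt_timestamp`, and the default of seven words per cue, are illustrative choices rather than part of the SDK.

```python
from typing import List, NamedTuple, Sequence


class Word(NamedTuple):
    # Stand-in for the word objects returned by Leopard's process_file;
    # assumes they expose `word`, `start_sec` and `end_sec`.
    word: str
    start_sec: float
    end_sec: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Sequence[Word], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues, at most max_words each."""
    cues: List[str] = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0].start_sec)
        end = srt_timestamp(chunk[-1].end_sec)
        text = " ".join(w.word for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

For example, calling `words_to_srt(words)` on the `words` returned by `o.process_file(path)` produces text that can be saved as an `.srt` file.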