Mobile apps are an ideal use case for Speech Recognition, whether for hands-free dictation, voice interfaces for mobile games, or generating subtitles for video and audio messages.
Apple devices, such as the iPhone, iPad and Apple Watch, are powered by iOS, Apple's popular flagship operating system. iOS features its own Speech Recognition API, but it can be clumsy and verbose to integrate. Crucially, not all of the languages it supports offer on-device recognition, and even those that do may stream audio to Apple's servers, introducing privacy concerns and latency.
Fortunately, Picovoice's Speech-to-Text technology does not have these downsides and integrates seamlessly into the iOS ecosystem.
In addition to iOS, Picovoice's Speech-to-Text engines are compatible with a wide array of environments, such as Android, Linux, macOS, Windows, and modern web browsers (via WebAssembly).
With Speech-to-Text transcription, there are two main approaches: Real-Time and Batch.
Real-Time Speech-to-Text
Real-time Speech-to-Text systems produce text output as the user speaks, mirroring how humans listen and convert speech into text mentally during conversations. A downside of this method is that it can produce errors caused by acoustic or semantic ambiguities, which often only become apparent after a sentence is finished. It is therefore important to weigh this drawback when deciding whether an application needs real-time transcription.
Real-Time Speech-to-Text, Online Automatic Speech Recognition, and Streaming Speech-to-Text all refer to the same core technology.
For iOS devices, Picovoice provides Cheetah Streaming Speech-to-Text, a technology that performs all voice recognition in real time, directly on the device. This approach avoids network-related delays and minimizes the latency between the user's speech input and the transcription output.
Below is the list of software development kits (SDKs) supported by Cheetah, along with corresponding code snippets and quick-start guides.
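import pvcheetah

# access_key is your Picovoice AccessKey; get_next_audio_frame() is a placeholder for your audio source
o = pvcheetah.create(access_key)
partial_transcript, is_endpoint = o.process(get_next_audio_frame())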
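The snippet above processes a single audio frame. A minimal sketch of the loop around it might look like the following. It assumes an engine object whose `process(frame)` returns `(partial_transcript, is_endpoint)` as shown above, and which also offers a `flush()` method returning any text still buffered (Cheetah's Python SDK documents both, but check the version you install). The `StreamingEngine` protocol, the `transcribe_stream` helper, and the `frames` iterable are hypothetical names introduced only for illustration.

```python
from typing import Iterable, Iterator, Protocol, Sequence, Tuple


class StreamingEngine(Protocol):
    """Assumed interface: the process/flush calls described in the lead-in."""

    def process(self, pcm: Sequence[int]) -> Tuple[str, bool]: ...

    def flush(self) -> str: ...


def transcribe_stream(engine: StreamingEngine,
                      frames: Iterable[Sequence[int]]) -> Iterator[str]:
    """Yield one finalized utterance each time the engine reports an endpoint."""
    pieces = []
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        if partial:
            pieces.append(partial)
        if is_endpoint:
            # flush() returns text the engine was still holding back.
            pieces.append(engine.flush())
            utterance = "".join(pieces).strip()
            pieces.clear()
            if utterance:
                yield utterance
    # Audio ended without an endpoint: flush whatever is left.
    pieces.append(engine.flush())
    tail = "".join(pieces).strip()
    if tail:
        yield tail
```

In an app, `frames` would typically come from a microphone recorder that delivers frames of the length and sample rate the engine expects, and each yielded utterance could be appended to the on-screen transcript.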
Batch Speech-to-Text
Unlike real-time transcription, Batch Speech-to-Text waits until the spoken phrase is complete before providing a transcription. Compared to real-time approaches, this method offers higher accuracy and runtime efficiency. Because the whole utterance is available, it can use the words that follow as context, adjusting its output for better linguistic and acoustic precision. It also removes the need to switch between listening and transcribing, which improves overall efficiency.
For iOS-based devices, Picovoice offers Leopard Speech-to-Text, a state-of-the-art technology for batch transcription tasks. Like Cheetah, Leopard processes all voice audio data on device, ensuring privacy by design and compliance with regulations such as HIPAA and GDPR. To further improve accuracy, users can add custom vocabulary and boost specific phrases via the Picovoice Console.
Below is the list of SDKs supported by Leopard, along with corresponding code snippets and quick-start guides.
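import pvleopard

# access_key is your Picovoice AccessKey; path points to the audio file to transcribe
o = pvleopard.create(access_key)
transcript, words = o.process_file(path)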