Speech to Text Transcription in React Native Tutorial


Build compliant, low-latency AI applications using React Native without sending user data to third-party servers.

React Native has emerged as a powerful framework for building cross-platform mobile applications, enabling developers to craft engaging user experiences that seamlessly run on both Android and iOS devices. As the demand for voice-enabled features continues to rise, integrating accurate and efficient Speech-to-Text technology has become a key challenge for many mobile developers. While a quick search on npm will yield a sizable list of plugins and libraries, it will soon become apparent that not all speech-to-text solutions are created equal.

Several speech-to-text solutions are currently available for React Native apps. Some rely on cloud-based processing and therefore bring potentially high network latency, privacy concerns, and recurring costs. Others require complex setup and a laundry list of dependencies, making development cumbersome. Accuracy and performance also vary significantly between solutions, directly affecting the user experience. To address these challenges, Picovoice offers two options that process audio entirely on-device: the Cheetah Streaming Speech-to-Text engine for real-time transcription and the Leopard Speech-to-Text engine for batch processing.

Picovoice's Speech-to-Text engines are compatible with a wide array of environments, such as Android, iOS, Linux, macOS, Windows, and modern web browsers (via WebAssembly).

Real-Time Speech-to-Text

The main benefit of real-time speech-to-text is immediate output: text appears while the user is still speaking. This enables fluid, interactive experiences such as voice assistants and live captioning, and it mirrors the natural flow of human conversation. The trade-off is accuracy. A streaming engine must emit text before it has heard the rest of the utterance, so it has less context to work with, and background noise, multiple speakers, or speaker idiosyncrasies can introduce errors. Weigh these accuracy implications before choosing a real-time approach for a given project.

Real-Time Speech-to-Text, Online Automatic Speech Recognition, and Streaming Speech-to-Text all refer to the same core technology.

For React Native applications, Picovoice provides Cheetah Streaming Speech-to-Text, a unique technology that performs all voice recognition directly on the device. This approach eliminates network-related delays and minimizes the latency between the user's speech input and the transcription output.

To use Cheetah in a React Native project, install the @picovoice/cheetah-react-native package:

npm install @picovoice/cheetah-react-native

Create a custom language model using the Picovoice Console or download the default model. In the Android subproject, add the model file to the assets folder (android/app/src/main/assets). In the iOS subproject, add the model file to the Copy Bundle Resources build phase.

Initialize the Cheetah Streaming Speech-to-Text engine and start transcribing audio:

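import {Cheetah} from '@picovoice/cheetah-react-native';

const cheetah = await Cheetah.create(
  "${ACCESS_KEY}", // AccessKey obtained from Picovoice Console
  "${MODEL_FILENAME}"); // model file added to the app bundle above

// getAudioFrame() is a placeholder for your audio source: each frame should hold
// cheetah.frameLength 16-bit samples recorded at cheetah.sampleRate
const partialResult = await cheetah.process(getAudioFrame());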
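The single `process` call above handles one frame of audio. The sketch below shows how a full streaming loop might look, assuming that `process` resolves to an object with `transcript` and `isEndpoint` fields, that `flush()` resolves to an object holding any remaining `transcript`, and that each frame must contain exactly `cheetah.frameLength` 16-bit samples. Confirm these names against the Cheetah React Native API reference for your SDK version. `splitIntoFrames` and `transcribeSamples` are hypothetical helpers, not part of the SDK.

```javascript
// Hypothetical helper: split a buffer of 16-bit PCM samples (plain array or
// Int16Array) into consecutive frames of exactly `frameLength` samples.
// A trailing partial frame is dropped; a live app would carry it over
// to the next buffer instead.
function splitIntoFrames(samples, frameLength) {
  const frames = [];
  for (let i = 0; i + frameLength <= samples.length; i += frameLength) {
    frames.push(Array.from(samples.slice(i, i + frameLength)));
  }
  return frames;
}

// Hypothetical streaming loop. `cheetah` is any object exposing frameLength,
// process(frame) -> {transcript, isEndpoint}, and flush() -> {transcript}
// (assumed to match the Cheetah instance returned by Cheetah.create).
// Returns one string per detected utterance.
async function transcribeSamples(cheetah, samples) {
  const utterances = [];
  let current = "";
  for (const frame of splitIntoFrames(samples, cheetah.frameLength)) {
    const partial = await cheetah.process(frame);
    current += partial.transcript; // partial text arrives frame by frame
    if (partial.isEndpoint) {
      // An endpoint marks the end of an utterance: collect the buffered text
      current += (await cheetah.flush()).transcript;
      utterances.push(current);
      current = "";
    }
  }
  // The audio ran out: flush whatever the engine still holds
  current += (await cheetah.flush()).transcript;
  if (current.length > 0) {
    utterances.push(current);
  }
  return utterances;
}
```

Passing the engine in as a parameter keeps the loop testable with a stub object. In an app you would pass the instance returned by `Cheetah.create` and feed it microphone audio rather than a prerecorded buffer.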

For a more in-depth example, refer to the Cheetah React Native SDK quick start guide.

Batch Speech-to-Text

In contrast to real-time transcription, Batch Speech-to-Text waits for the entire spoken phrase before producing a transcription. Because it can analyze the whole phrase at once, batch processing typically achieves higher accuracy, making it suitable for applications where precision is paramount, such as medical transcription or legal documentation. It also tends to be more computationally efficient, since it does not require continuous real-time analysis, which can help conserve battery life on mobile devices. The drawback is a delay before the transcription is available, so batch processing is less suitable for applications that require immediate, real-time interaction. Developers should therefore evaluate the specific needs of their application before opting for a Batch Speech-to-Text approach.

For React Native applications, Picovoice offers the Leopard Speech-to-Text SDK, designed for batch transcription. Leopard processes all audio on the device itself, so voice data never leaves the user's phone, which helps applications meet the privacy requirements of regulations such as HIPAA and GDPR. Developers can also use the Picovoice Console to add custom vocabulary and boost specific phrases.

To use Leopard in a React Native project, install the @picovoice/leopard-react-native package:

npm install @picovoice/leopard-react-native

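As with Cheetah, create or download a Leopard model file and add it to the Android assets folder and the iOS Copy Bundle Resources build phase. Then initialize the Leopard Speech-to-Text engine and transcribe an audio file: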

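import {Leopard} from '@picovoice/leopard-react-native';

const leopard = await Leopard.create(
  "${ACCESS_KEY}", // AccessKey obtained from Picovoice Console
  "${MODEL_FILENAME}"); // model file added to the app bundle

// transcript is the full text; words holds per-word timing and confidence
const {transcript, words} = await leopard.processFile("${AUDIO_FILE_PATH}");
await leopard.delete(); // release native resources when finished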
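Leopard's `words` array carries per-word timing in addition to the plain transcript. As an illustration, the sketch below turns it into SubRip (SRT) captions. It assumes each element exposes `word`, `startSec`, and `endSec`; confirm the field names against the Leopard React Native API reference for your SDK version. `groupWords`, `toTimestamp`, and `toSrt` are hypothetical helpers, and the 0.5-second pause threshold and eight-word limit are arbitrary defaults.

```javascript
// Hypothetical helper: group word metadata into caption segments. A new
// segment starts after a pause longer than maxGapSec or once a segment
// already holds maxWords words.
function groupWords(words, maxGapSec = 0.5, maxWords = 8) {
  const segments = [];
  let current = null;
  for (const w of words) {
    const startNew =
      current === null ||
      w.startSec - current.endSec > maxGapSec ||
      current.words.length >= maxWords;
    if (startNew) {
      current = {startSec: w.startSec, endSec: w.endSec, words: []};
      segments.push(current);
    }
    current.words.push(w.word);
    current.endSec = w.endSec;
  }
  return segments.map((s) => ({
    startSec: s.startSec,
    endSec: s.endSec,
    text: s.words.join(" "),
  }));
}

// Format seconds as an SRT timestamp (HH:MM:SS,mmm).
function toTimestamp(sec) {
  const ms = Math.round(sec * 1000);
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Build SRT text: numbered blocks separated by blank lines.
function toSrt(words) {
  return groupWords(words)
    .map((seg, i) =>
      `${i + 1}\n${toTimestamp(seg.startSec)} --> ${toTimestamp(seg.endSec)}\n${seg.text}\n`)
    .join("\n");
}
```

In an app you might call `toSrt(words)` on the result of `processFile` and save the string as a `.srt` file alongside the recording.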

For a more in-depth example, refer to the Leopard React Native SDK quick start guide.
