On-device speech-to-speech translation on Android combines Speech-to-Text (STT), ML Kit Translation, and Text-to-Speech (TTS) to convert spoken words in one language into spoken words in another language—all without cloud connectivity. This approach eliminates the three major challenges of traditional cloud-based voice translation: network latency that disrupts conversation flow, privacy concerns from sending audio to external servers, and complete app failure without internet connection.
For Android developers, implementing ML Kit speech-to-speech translation (voice-to-voice translation) unlocks critical use cases: travel apps for tourists, multilingual video conferencing, and more. This ML Kit Android tutorial shows you how to build fully on-device speech-to-speech translation that runs speech processing and translation offline, protects user privacy, and delivers instant results.
ML Kit Speech-to-Speech Translation: Three Core Components
- Speech-to-Text (STT): Converts spoken phrases into text
- ML Kit Translation: Converts text from source language to target language
- Text-to-Speech (TTS): Converts translated text into synthesized speech
ML Kit Android Voice Translation Stack
This tutorial combines ML Kit with on-device speech processing SDKs:
- Google ML Kit Translation – On-device text translation (50+ languages)
- Cheetah Streaming Speech-to-Text – Real-time Android speech transcription
- Orca Text-to-Speech – Natural-sounding speech synthesis for Android
- VoiceProcessor – Audio capture for Android
What You'll Build in This ML Kit Tutorial
This step-by-step guide teaches you to:
- Capture real-time audio using VoiceProcessor
- Transcribe speech to text with Cheetah Streaming Speech-to-Text
- Translate text using ML Kit's on-device translation API
- Synthesize speech from translated text with Orca Text-to-Speech
- Play back synthesized audio with AudioTrack
This tutorial is designed for Android developers with basic Kotlin/Java experience who want to add ML Kit speech-to-speech translation to their apps.
How to Build ML Kit Android Speech-to-Speech Translation with Kotlin
Here is a preview of what you'll have built by the end:
Prerequisites for ML Kit Android Development
Before you begin this ML Kit tutorial, ensure you have:
- Android Studio
- Android device or emulator running Android 7.0 (API level 24) or higher
- USB debugging enabled on your Android device
ML Kit requires a minimum of Android API level 23, while Cheetah Streaming Speech-to-Text and Orca Text-to-Speech support API level 21 and above.
For this ML Kit Android tutorial, we'll use API level 24 as the minimum supported version, as it's the default recommended setting for new Kotlin projects and provides access to newer APIs.
Step 1: ML Kit Android Project Setup
This tutorial demonstrates a project built with Kotlin and Jetpack Compose, targeting Android 15 (API level 35) with a minimum supported version of Android 7.0 (API level 24).
Configure Android Permissions for ML Kit Speech-to-Speech
Add these permissions to your AndroidManifest.xml:
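A minimal sketch of the required entries (RECORD_AUDIO for microphone capture, INTERNET for AccessKey authentication and ML Kit model downloads):

```xml
<!-- Microphone access for VoiceProcessor audio capture -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<!-- Needed to authenticate the AccessKey and download ML Kit language models -->
<uses-permission android:name="android.permission.INTERNET" />
```

RECORD_AUDIO is a dangerous permission, so it must also be requested from the user at runtime.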
- Cheetah Streaming Speech-to-Text and Orca Text-to-Speech require internet connectivity only during initialization to authenticate your AccessKey. All transcription and speech synthesis run entirely on-device.
- Google ML Kit Translation downloads language models on first use and caches them locally for offline use.
Step 2: Add ML Kit Android Dependencies
2a. Install ML Kit and Speech Processing Libraries
In app/build.gradle.kts, add the required dependencies:
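As a sketch, assuming version catalog aliases like the ones defined in the next snippet (the alias names are illustrative):

```kotlin
dependencies {
    // Real-time audio capture
    implementation(libs.android.voice.processor)
    // Cheetah Streaming Speech-to-Text
    implementation(libs.cheetah.android)
    // Orca Text-to-Speech
    implementation(libs.orca.android)
    // ML Kit on-device translation
    implementation(libs.mlkit.translate)
}
```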
Then in gradle/libs.versions.toml (replace {LATEST_VERSION} with current versions):
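A matching version catalog sketch; the alias names must line up with whatever your dependencies block references, and the group/artifact coordinates should be verified against each SDK's documentation:

```toml
[versions]
androidVoiceProcessor = "{LATEST_VERSION}"
cheetahAndroid = "{LATEST_VERSION}"
orcaAndroid = "{LATEST_VERSION}"
mlkitTranslate = "{LATEST_VERSION}"

[libraries]
android-voice-processor = { group = "ai.picovoice", name = "android-voice-processor", version.ref = "androidVoiceProcessor" }
cheetah-android = { group = "ai.picovoice", name = "cheetah-android", version.ref = "cheetahAndroid" }
orca-android = { group = "ai.picovoice", name = "orca-android", version.ref = "orcaAndroid" }
mlkit-translate = { group = "com.google.mlkit", name = "translate", version.ref = "mlkitTranslate" }
```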
Execute a Gradle sync.
At the time of writing, the latest versions are:
- androidVoiceProcessor: 1.0.2
- cheetahAndroid: 3.0.0
- orcaAndroid: 2.0.0
- mlkitTranslate: 17.0.3
2b. Add Language Model Files for Speech Processing
Both Cheetah and Orca use model files (.pv) for different languages.
- Download Cheetah models from the Cheetah GitHub repository
- Download Orca models from the Orca GitHub repository
- Place the models in your Android project under:
{ANDROID_APP}/src/main/assets
For this ML Kit example, we'll use:
- cheetah_params_fast.pv for English speech recognition
- orca_params_es_female.pv for Spanish speech synthesis
Step 3: Implement Android Speech-to-Text with Cheetah
3a. Initialize Cheetah for Real-Time Transcription
Use Cheetah.Builder to create a Cheetah instance:
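A minimal initialization sketch, assuming the cheetah_params_fast.pv model from Step 2b and a {ACCESS_KEY} placeholder for your Picovoice AccessKey:

```kotlin
val cheetah = Cheetah.Builder()
    .setAccessKey("{ACCESS_KEY}")               // Picovoice AccessKey from the Picovoice Console
    .setModelPath("cheetah_params_fast.pv")     // model file placed under src/main/assets
    .setEndpointDuration(1f)                    // seconds of silence that ends an utterance
    .setEnableAutomaticPunctuation(true)
    .build(applicationContext)
```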
- setEndpointDuration: Controls how long Cheetah waits to detect the end of speech. Once it detects a pause longer than this duration, it considers the utterance finished.
- setEnableAutomaticPunctuation: Enables automatic punctuation in transcripts.
3b. Set Up Audio Capture with VoiceProcessor
VoiceProcessor is a library for real-time audio capture on Android. It provides audio frames to listeners at a specified sample rate and frame length.
Access the singleton instance:
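For example:

```kotlin
val voiceProcessor = VoiceProcessor.getInstance()
```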
3c. Create Frame Listener for Speech Processing
Create a listener to process audio frames and generate transcripts:
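A sketch of such a listener; the onUtteranceFinished() handler is a hypothetical hook for the translation step that follows:

```kotlin
var transcript = ""

val frameListener = VoiceProcessorFrameListener { frame ->
    try {
        // Feed the audio frame to Cheetah and collect the partial transcript
        val partialResult = cheetah.process(frame)
        transcript += partialResult.transcript

        if (partialResult.isEndpoint) {
            // End of utterance detected: flush any remaining buffered audio
            val finalResult = cheetah.flush()
            transcript += finalResult.transcript

            onUtteranceFinished(transcript)   // hypothetical: translate and speak the utterance
            transcript = ""
        }
    } catch (e: CheetahException) {
        Log.e("SpeechToSpeech", "Cheetah processing failed", e)
    }
}
```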
Key points:
- process(): Processes each audio frame and returns partial transcripts
- isEndpoint: Indicates when Cheetah has detected the end of an utterance
- flush(): Processes any remaining buffered audio
3d. Start Capturing Audio
Add the frame listener and start capturing audio:
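For example:

```kotlin
voiceProcessor.addFrameListener(frameListener)
// Cheetah dictates the frame length and sample rate it expects
voiceProcessor.start(cheetah.frameLength, cheetah.sampleRate)
```

Make sure the RECORD_AUDIO runtime permission has been granted before calling start().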
Cheetah specifies the required frameLength and sampleRate after initialization. Use these values when starting VoiceProcessor.
3e. Stop Audio Capture
When done listening:
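For example (removing the listener keeps it from firing on the next capture session):

```kotlin
voiceProcessor.stop()
voiceProcessor.removeFrameListener(frameListener)
```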
Step 4: Implement ML Kit Translation in Android
4a. Configure ML Kit Translation API
Set up the ML Kit translator with source and target language codes:
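A sketch for English-to-Spanish translation, matching the models chosen in Step 2b:

```kotlin
val options = TranslatorOptions.Builder()
    .setSourceLanguage(TranslateLanguage.ENGLISH)
    .setTargetLanguage(TranslateLanguage.SPANISH)
    .build()

val translator = Translation.getClient(options)
```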
4b. Download ML Kit Language Models
Before translation, ensure the required language models are downloaded:
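For example:

```kotlin
val conditions = DownloadConditions.Builder()
    .requireWifi()   // optional: restrict the one-time download to Wi-Fi
    .build()

translator.downloadModelIfNeeded(conditions)
    .addOnSuccessListener {
        // Models are cached locally; translation now works offline
    }
    .addOnFailureListener { e ->
        Log.e("SpeechToSpeech", "ML Kit model download failed", e)
    }
```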
ML Kit caches downloaded models locally. After the first download, ML Kit translation works offline for those language pairs.
4c. Translate Text with ML Kit
Translate the recognized speech using ML Kit:
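A sketch that passes the finished transcript to ML Kit; speak() is a hypothetical hook for the Orca synthesis step below:

```kotlin
translator.translate(transcript)
    .addOnSuccessListener { translatedText ->
        speak(translatedText)   // hypothetical: synthesize and play the translation
    }
    .addOnFailureListener { e ->
        Log.e("SpeechToSpeech", "ML Kit translation failed", e)
    }
```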
Step 5: Implement Text-to-Speech with Orca
5a. Initialize Orca TTS Engine
Use Orca.Builder to create an Orca instance:
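A minimal initialization sketch, using the Spanish voice model from Step 2b:

```kotlin
val orca = Orca.Builder()
    .setAccessKey("{ACCESS_KEY}")
    .setModelPath("orca_params_es_female.pv")   // Spanish voice model from src/main/assets
    .build(applicationContext)
```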
5b. Synthesize Translated Text to Speech
Synthesize the translated text:
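A sketch of single-call synthesis; the exact return shape varies by SDK version, so treat the OrcaSynthesizeParams usage and the pcm accessor below as assumptions to verify against the Orca documentation:

```kotlin
// Assumes synthesize() returns the full clip's 16-bit PCM samples for the given text
val result = orca.synthesize(translatedText, OrcaSynthesizeParams.Builder().build())
val pcm: ShortArray = result.pcm
```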
orca.synthesize() processes complete text and returns all PCM audio data in one call.
Step 6: Play Synthesized Speech with AudioTrack
6a. Configure AudioTrack for Speech Playback
Orca outputs mono, 16-bit PCM, with a sample rate matching orca.sampleRate:
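A configuration sketch using AudioTrack.Builder (streaming transfer mode is the default):

```kotlin
val audioTrack = AudioTrack.Builder()
    .setAudioAttributes(
        AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_MEDIA)
            .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
            .build()
    )
    .setAudioFormat(
        AudioFormat.Builder()
            .setEncoding(AudioFormat.ENCODING_PCM_16BIT)    // 16-bit PCM, matching Orca's output
            .setSampleRate(orca.sampleRate)                 // Orca reports its output sample rate
            .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)   // mono output
            .build()
    )
    .setBufferSizeInBytes(
        AudioTrack.getMinBufferSize(
            orca.sampleRate,
            AudioFormat.CHANNEL_OUT_MONO,
            AudioFormat.ENCODING_PCM_16BIT
        )
    )
    .build()
```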
6b. Play Audio Output
Write the entire PCM buffer and play:
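With the streaming AudioTrack configured above, starting playback first lets the blocking write() drain as the audio plays:

```kotlin
audioTrack.play()
audioTrack.write(pcm, 0, pcm.size)   // blocking write of the whole synthesized clip
```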
Step 7: Clean Up Resources
When done, always clean up resources:
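For example:

```kotlin
voiceProcessor.stop()
cheetah.delete()       // releases Cheetah's native resources
orca.delete()          // releases Orca's native resources
translator.close()     // releases the ML Kit translator and its models
audioTrack.release()
```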
Complete ML Kit Android Speech-to-Speech Example Code
Below is a complete example application demonstrating the full ML Kit speech-to-speech translation pipeline. Before building and running, update the package name and replace {ACCESS_KEY} with your Picovoice AccessKey:
For complete Android demo applications, see:
- VoiceProcessor Android demo on GitHub
- Cheetah Streaming Speech-to-Text Android demo on GitHub
- Orca Text-to-Speech Android demo on GitHub
This ML Kit tutorial uses the Google ML Kit Translation, Cheetah Streaming Speech-to-Text, Orca Text-to-Speech, and VoiceProcessor packages.
Explore our documentation for more details:
- VoiceProcessor Android Quick Start
- VoiceProcessor Android API
- Cheetah Streaming Speech-to-Text Android Quick Start
- Cheetah Streaming Speech-to-Text Android API
- Orca Text-to-Speech Android Quick Start
- Orca Text-to-Speech Android API
ML Kit Android Troubleshooting
- Initialization fails: Ensure model files exist in assets and the AccessKey is valid.
- No audio input: Verify microphone permissions are granted at runtime and that the device has a working microphone.
- Transcription fails or returns unexpected results: Ensure you're using the correct frameLength and sampleRate from Cheetah after initialization.
- ML Kit translation fails: Ensure internet connectivity for first-time model download. After download, ML Kit translation works offline.
- No audio output or distorted sound: Verify device volume and that the AudioTrack configuration matches Orca's output format.
Next Steps: Production-Ready ML Kit Speech-to-Speech Translation
Optimize ML Kit Voice Translation for Production
- Dynamic language selection: Allow users to select source and target languages dynamically. Load appropriate Cheetah/Orca models and configure ML Kit accordingly.
- Audio focus management: Request audio focus when speaking translations and release it when done.
- Error recovery: Implement retry logic for ML Kit translation failures and gracefully handle network issues.
- Endpoint tuning: Adjust Cheetah's endpoint detection sensitivity based on your use case using setEndpointDuration().
Advanced ML Kit Integration Ideas
Enhance your ML Kit speech-to-speech translation app by adding:
- Bidirectional translation: Support translation in both directions with language switching
- Multiple language support: Integrate language detection to automatically identify the source language
- Custom vocabulary: Use Cheetah's custom model training for domain-specific terminology
ML Kit Android Speech-to-Speech FAQ
How do I support different source and target languages?
Download the appropriate Cheetah model for your source language and Orca model for your target language from their respective GitHub repositories. Update the ML Kit TranslatorOptions with the corresponding language codes. You can dynamically switch between language pairs by reinitializing the components with different models.
Can I swap ML Kit for another translation service?
Yes. The architecture separates STT, translation, and TTS into independent components. You can replace ML Kit with any translation API by modifying the translateAndSpeak() function to call your preferred service.
Does ML Kit speech-to-speech translation work offline?
Yes, after the initialization step. ML Kit requires internet connectivity to download language models on first use, but caches them locally. Once downloaded, ML Kit translation works completely offline. Similarly, Cheetah and Orca only need internet during initialization for authentication; all processing happens on-device afterward.
What is the minimum Android version required?
At the time of writing, ML Kit requires Android API level 23 (Android 6.0) as the minimum, while Cheetah and Orca support API level 21. This tutorial uses API level 24 (Android 7.0) as the recommended baseline for new projects, providing access to newer Android features while maintaining broad device compatibility.