Speech-to-text

EV Sum 2 integrates Android’s Speech Recognition API to provide voice input capabilities with Spanish (Chile) locale and specialized normalization for email and password fields.

Overview

The speech-to-text feature enables hands-free input throughout the app. Recognition is configured for Chilean Spanish (es-CL), and custom normalization logic converts spoken Spanish words into the characters expected in email and password fields.

Architecture

The feature consists of two main components:

SpeechController (services/SpeechController.kt:11) - Manages speech recognition lifecycle
Speech Normalization (domain/validators/SpeechNormalization.kt) - Converts spoken words to text

SpeechController implementation

The SpeechController class wraps Android’s SpeechRecognizer:

class SpeechController(
    context: Context,
    private val locale: Locale = Locale.forLanguageTag("es-CL")
) {
    private val recognizer: SpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(context)

    private val intent: Intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
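        // EXTRA_LANGUAGE expects an IETF language tag string such as "es-CL", not a Locale object
        putExtra(RecognizerIntent.EXTRA_LANGUAGE, locale.toLanguageTag())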
        putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
        putExtra(RecognizerIntent.EXTRA_PROMPT, "Habla ahora...")
    }

    fun start() {
        recognizer.startListening(intent)
    }

    fun stop() {
        recognizer.stopListening()
    }

    fun destroy() {
        recognizer.destroy()
    }
}

Setting up listeners

The controller provides a flexible listener system for handling speech events:

speechController.setListener(
    onReady = {
        // Recognition is ready to receive speech
        isListening = true
    },
    onPartial = { partialText ->
        // Receive partial results in real-time
        displayText = partialText
    },
    onFinal = { finalText ->
        // Receive final recognition result
        confirmedText = finalText
    },
    onError = { errorCode ->
        // Handle recognition errors
        handleError(errorCode)
    },
    onEnd = {
        // Recognition has ended
        isListening = false
    }
)
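
The SpeechController listing above does not show setListener itself. The following is a minimal sketch of how such a method could store and dispatch the five callbacks, assuming each one defaults to a no-op. The class name SpeechCallbacks and the dispatch* functions are hypothetical; in the real controller the dispatch calls would presumably be made from an Android RecognitionListener attached to the recognizer.

```kotlin
// Hypothetical holder for the five callbacks accepted by setListener.
class SpeechCallbacks {
    private var onReady: () -> Unit = {}
    private var onPartial: (String) -> Unit = {}
    private var onFinal: (String) -> Unit = {}
    private var onError: (Int) -> Unit = {}
    private var onEnd: () -> Unit = {}

    // Named parameters with no-op defaults let callers register only what they need.
    fun setListener(
        onReady: () -> Unit = {},
        onPartial: (String) -> Unit = {},
        onFinal: (String) -> Unit = {},
        onError: (Int) -> Unit = {},
        onEnd: () -> Unit = {}
    ) {
        this.onReady = onReady
        this.onPartial = onPartial
        this.onFinal = onFinal
        this.onError = onError
        this.onEnd = onEnd
    }

    fun dispatchReady() = onReady()

    // Recognizers return a ranked list of candidates; forward the top one, if any.
    fun dispatchPartial(candidates: List<String>?) {
        candidates?.firstOrNull()?.let(onPartial)
    }

    fun dispatchFinal(candidates: List<String>?) {
        candidates?.firstOrNull()?.let(onFinal)
    }

    fun dispatchError(code: Int) = onError(code)

    fun dispatchEnd() = onEnd()
}
```

Keeping the callbacks as plain function values means UI code never has to implement the full listener interface, only the events it cares about.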

Usage example

Create controller

Initialize the speech controller with context:

val context = LocalContext.current
val speechController = remember { SpeechController(context) }

Set up listener

Configure event handlers for speech recognition:

LaunchedEffect(Unit) {
    speechController.setListener(
        onReady = { isListening = true },
        onPartial = { partial -> text = partial },
        onFinal = { final -> text = final },
        onError = { code -> error = getFriendlyErrorMessage(code) },
        onEnd = { isListening = false }
    )
}

Request permission

Request microphone permission before starting:

val micPermissionLauncher = rememberLauncherForActivityResult(
    contract = ActivityResultContracts.RequestPermission()
) { granted ->
    if (granted) {
        speechController.start()
    }
}

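// Request permission from an event handler (for example, a button's onClick);
// calling launch() directly during composition is not supported
micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)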

Clean up

Destroy the controller when done:

DisposableEffect(Unit) {
    onDispose { speechController.destroy() }
}

Spanish normalization

The app includes specialized normalization functions that convert spoken Spanish words into appropriate text formats.

Email normalization

The normalizeEmailFromSpeech function converts Spanish words to email-valid characters:

fun normalizeEmailFromSpeech(input: String): String {
    var s = input.lowercase()

    val wordMap = mapOf(
        "arroba" to "@",
        "punto" to ".",
        "guion bajo" to "_",
        "guionbajo" to "_",
        "guion" to "-",
        "i latina" to "i",
        "ye" to "y",
        "y griega" to "y"
    )

    wordMap.forEach { (word, replacement) ->
        s = s.replace(word, replacement)
    }

    val numberMap = mapOf(
        "cero" to "0",
        "uno" to "1",
        "dos" to "2",
        // ... all digits
    )

    numberMap.forEach { (word, digit) ->
        s = s.replace(word, digit)
    }

    // Detect spelling mode and handle accordingly
    val tokens = s.split(" ").filter { it.isNotBlank() }
    val singleCharRatio = tokens.count { it.length == 1 } / tokens.size.toDouble()
    val isSpellingMode = singleCharRatio >= 0.6

    val normalized = tokens.joinToString("") { token ->
        when {
            token == "y" && isSpellingMode -> "i"
            else -> token
        }
    }

    return normalized
        .replace(" ", "")
        .replace(Regex("[^a-z0-9@._-]"), "")
        .trim()
}
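
Because String.replace matches substrings, the mapping above also rewrites words that merely contain a mapped word: "bruno" contains "uno" and "reyes" contains "ye". The sketch below shows one possible alternative that maps whole tokens only. It is not the app's implementation: the names normalizeEmailTokens, PHRASES and TOKEN_MAP are hypothetical, the digit words are written out in full for illustration, and the spelling-mode step is omitted to keep the example short.

```kotlin
// Multi-word phrases are collapsed first so they survive tokenization.
private val PHRASES = listOf(
    "guion bajo" to "_",
    "i latina" to "i",
    "y griega" to "y"
)

// Single-word mappings, applied only when a whole token matches.
private val TOKEN_MAP = mapOf(
    "arroba" to "@", "punto" to ".", "guionbajo" to "_", "guion" to "-", "ye" to "y",
    "cero" to "0", "uno" to "1", "dos" to "2", "tres" to "3", "cuatro" to "4",
    "cinco" to "5", "seis" to "6", "siete" to "7", "ocho" to "8", "nueve" to "9"
)

fun normalizeEmailTokens(input: String): String {
    var s = input.lowercase()
    // \b anchors restrict each phrase to word boundaries.
    for ((phrase, replacement) in PHRASES) {
        s = s.replace(Regex("\\b${Regex.escape(phrase)}\\b"), replacement)
    }
    return s.split(Regex("\\s+"))
        .filter { it.isNotBlank() }
        .joinToString("") { token -> TOKEN_MAP[token] ?: token }
        .replace(Regex("[^a-z0-9@._-]"), "") // keep only email-safe characters
}

fun main() {
    println(normalizeEmailTokens("bruno punto reyes arroba gmail punto com"))
    // prints "bruno.reyes@gmail.com"
}
```

The trade-off is that a mapped word is only recognized when the recognizer emits it as a separate word, which is the usual case for dictated addresses.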

Word mapping examples

Email characters

Spanish Word	Character
arroba	@
punto	.
guion bajo	_
guion	-

Letters

Spanish Word	Letter
i latina	i
ye	y
y griega	y

Numbers

Spanish Word	Digit
cero	0
uno	1
dos	2
tres	3
cuatro	4
cinco	5

Example transformations

Input:  "juan punto lopez arroba gmail punto com"
Output: "juan.lopez@gmail.com"
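
The spelling-mode threshold used by normalizeEmailFromSpeech can be checked in isolation. This sketch extracts the ratio calculation into two helper functions (singleCharRatio and isSpellingMode are hypothetical names, and the empty-input guard is an addition) and applies it to text that has already had its word mappings replaced.

```kotlin
// Fraction of whitespace-separated tokens that are exactly one character long.
fun singleCharRatio(text: String): Double {
    val tokens = text.split(" ").filter { it.isNotBlank() }
    if (tokens.isEmpty()) return 0.0 // avoid dividing zero by zero
    return tokens.count { it.length == 1 } / tokens.size.toDouble()
}

// Same 0.6 threshold as in normalizeEmailFromSpeech.
fun isSpellingMode(text: String, threshold: Double = 0.6): Boolean =
    singleCharRatio(text) >= threshold

fun main() {
    // 3 of 7 tokens are single characters (about 0.43): treated as whole words
    println(isSpellingMode("juan . lopez @ gmail . com")) // prints "false"
    // 6 of 8 tokens are single characters (0.75): treated as spelled out
    println(isSpellingMode("j u a n @ gmail . com"))      // prints "true"
}
```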

Password normalization

The normalizePasswordFromSpeech function is similar but handles passwords:

fun normalizePasswordFromSpeech(input: String): String {
    var s = input.lowercase()

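    // Apply word mappings (same wordMap as in normalizeEmailFromSpeech;
    // this excerpt assumes the map is visible here, e.g. declared at file level)
    wordMap.forEach { (word, replacement) ->
        s = s.replace(word, replacement)
    }

    // Apply number mappings (same numberMap, under the same assumption)
    numberMap.forEach { (word, digit) ->
        s = s.replace(word, digit)
    }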
    }

    val tokens = s.split(" ").filter { it.isNotBlank() }

    val normalized = tokens.joinToString("") { token ->
        when (token) {
            "y" -> "i"
            else -> token
        }
    }

    return normalized
        .replace(Regex("[^a-z0-9@._-]"), "")
        .trim()
}

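Password normalization doesn’t use spelling mode detection since passwords are typically spoken as continuous words rather than individual characters; every standalone "y" token is converted to "i" unconditionally. Note that the input is lowercased and the final filter keeps only lowercase letters, digits, and the characters @ . _ -, so uppercase letters and other symbols cannot be dictated.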
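
Both normalizers apply the same two maps, and the order of entries matters because replacements run in insertion order. The sketch below assumes the maps are declared once at file level and applied through a shared helper; applySpeechMappings is a hypothetical name, and the digit words beyond "dos" are filled in here for illustration only.

```kotlin
// mapOf preserves insertion order: "guion bajo" must be replaced before
// "guion", otherwise the underscore phrase would turn into "- bajo".
val wordMap = mapOf(
    "arroba" to "@",
    "punto" to ".",
    "guion bajo" to "_",
    "guionbajo" to "_",
    "guion" to "-",
    "i latina" to "i",
    "ye" to "y",
    "y griega" to "y"
)

val numberMap = mapOf(
    "cero" to "0", "uno" to "1", "dos" to "2", "tres" to "3", "cuatro" to "4",
    "cinco" to "5", "seis" to "6", "siete" to "7", "ocho" to "8", "nueve" to "9"
)

// Shared first stage for the email and password normalizers.
fun applySpeechMappings(input: String): String {
    var s = input.lowercase()
    wordMap.forEach { (word, replacement) -> s = s.replace(word, replacement) }
    numberMap.forEach { (word, digit) -> s = s.replace(word, digit) }
    return s
}

fun main() {
    println(applySpeechMappings("Ana guion bajo dos")) // prints "ana _ 2"
    println(applySpeechMappings("a guion b"))          // prints "a - b"
}
```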

Error handling

Implement user-friendly error messages for common speech recognition errors:

fun getFriendlyErrorMessage(code: Int): String {
    // Use the named constants from android.speech.SpeechRecognizer rather than raw numbers
    return when (code) {
        SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Could not connect to voice service. Check your internet."
        SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS -> "Microphone permission denied."
        SpeechRecognizer.ERROR_NETWORK -> "Network error. Try again."
        SpeechRecognizer.ERROR_SPEECH_TIMEOUT -> "We couldn't hear you well, try speaking louder."
        SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "Microphone is being used by another app."
        else -> "There was a problem with dictation. Try again."
    }
}

Common error codes

ERROR_NETWORK_TIMEOUT (1)

Network timeout - cannot connect to Google’s speech recognition service. Solution: Check the internet connection and try again.

ERROR_INSUFFICIENT_PERMISSIONS (9)

Microphone permission not granted. Solution: Request the RECORD_AUDIO permission.

ERROR_NETWORK (2)

Network error during recognition. Solution: Verify internet connectivity.

ERROR_SPEECH_TIMEOUT (6)

No speech input detected. Solution: Ask the user to speak louder or closer to the microphone.

ERROR_RECOGNIZER_BUSY (8)

Speech recognizer is busy. Solution: Wait and try again, or check if another app is using the microphone.
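
The suggested solutions fall into a few groups, which an app could use to decide what to do next rather than only what to display. The sketch below is one hypothetical way to encode that grouping: RecoveryAction and recoveryFor are illustrative names, and the constants are redeclared with the values defined in android.speech.SpeechRecognizer only so the example runs without the Android SDK.

```kotlin
// Values mirror android.speech.SpeechRecognizer; redeclared for a standalone sketch.
const val ERROR_NETWORK_TIMEOUT = 1
const val ERROR_NETWORK = 2
const val ERROR_SPEECH_TIMEOUT = 6
const val ERROR_RECOGNIZER_BUSY = 8
const val ERROR_INSUFFICIENT_PERMISSIONS = 9

// Hypothetical grouping of the solutions listed for each error.
enum class RecoveryAction {
    CHECK_CONNECTION,
    REQUEST_PERMISSION,
    ASK_TO_SPEAK_AGAIN,
    RETRY_LATER,
    GENERIC_RETRY
}

fun recoveryFor(code: Int): RecoveryAction = when (code) {
    ERROR_NETWORK_TIMEOUT, ERROR_NETWORK -> RecoveryAction.CHECK_CONNECTION
    ERROR_INSUFFICIENT_PERMISSIONS -> RecoveryAction.REQUEST_PERMISSION
    ERROR_SPEECH_TIMEOUT -> RecoveryAction.ASK_TO_SPEAK_AGAIN
    ERROR_RECOGNIZER_BUSY -> RecoveryAction.RETRY_LATER
    else -> RecoveryAction.GENERIC_RETRY
}

fun main() {
    println(recoveryFor(ERROR_NETWORK))                  // prints "CHECK_CONNECTION"
    println(recoveryFor(ERROR_INSUFFICIENT_PERMISSIONS)) // prints "REQUEST_PERMISSION"
}
```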

Permissions

Add the microphone permission to your AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

Request the permission at runtime:

val micPermissionLauncher = rememberLauncherForActivityResult(
    contract = ActivityResultContracts.RequestPermission()
) { granted ->
    if (granted) {
        speechController.start()
    } else {
        showError("Microphone permission required")
    }
}

// Call from an event handler such as onClick, not directly during composition
micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)

Integration example

Here’s how the login screen integrates speech recognition (ui/auth/LoginScreen.kt:84):

val speechController = remember { SpeechController(context) }
var dictationTarget by remember { mutableStateOf(DictationTarget.EMAIL) }
var isListening by remember { mutableStateOf(false) }

// Set up listener
LaunchedEffect(Unit) {
    speechController.setListener(
        onReady = {
            isListening = true
            speechError = null
        },
        onPartial = { partial ->
            when (dictationTarget) {
                DictationTarget.EMAIL -> email = normalizeEmailFromSpeech(partial)
                DictationTarget.PASSWORD -> password = normalizePasswordFromSpeech(partial)
            }
        },
        onFinal = { final ->
            isListening = false
            when (dictationTarget) {
                DictationTarget.EMAIL -> email = normalizeEmailFromSpeech(final)
                DictationTarget.PASSWORD -> password = normalizePasswordFromSpeech(final)
            }
        },
        onError = { code ->
            isListening = false
            speechError = getFriendlyErrorMessage(code)
        },
        onEnd = {
            isListening = false
        }
    )
}

// Voice input button
Button(
    onClick = {
        if (isListening) {
            speechController.stop()
        } else {
            showDictationDialog = true
        }
    }
) {
    Icon(
        imageVector = if (isListening) Icons.Default.Mic else Icons.Default.MicNone,
        contentDescription = null // the adjacent Text already labels the button
    )
    Text(if (isListening) "Listening..." else "Voice")
}

Best practices

Cleanup

Always call destroy() when the controller is no longer needed to release resources.

Partial results

Enable partial results for real-time feedback to improve user experience.

Error handling

Provide clear, actionable error messages for common issues.

Visual feedback

Show clear indicators when listening is active.

Speech recognition typically requires an internet connection, because the default recognizer on most devices relies on Google’s cloud-based recognition service.

Related features

Authentication: Voice input for login credentials

Text-to-Speech: Convert text back to speech
