EV Sum 2 integrates Android’s Speech Recognition API to provide voice input capabilities with Spanish (Chile) locale and specialized normalization for email and password fields.

Overview

The speech-to-text feature enables hands-free input throughout the app.
Recognition is configured for Chilean Spanish (es-CL), and custom normalization logic converts spoken Spanish words such as "arroba" or "punto" into the characters expected in email and password fields.

Architecture

The feature consists of two main components:
  • SpeechController (services/SpeechController.kt:11) - Manages speech recognition lifecycle
  • Speech Normalization (domain/validators/SpeechNormalization.kt) - Converts spoken words to text

SpeechController implementation

The SpeechController class wraps Android’s SpeechRecognizer:
class SpeechController(
    context: Context,
    private val locale: Locale = Locale.forLanguageTag("es-CL")
) {
    private val recognizer: SpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(context)

    private val intent: Intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
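        // EXTRA_LANGUAGE expects a BCP 47 language tag string (for example "es-CL"), not a Locale object
        putExtra(RecognizerIntent.EXTRA_LANGUAGE, locale.toLanguageTag())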
        putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
        putExtra(RecognizerIntent.EXTRA_PROMPT, "Habla ahora...")
    }

    fun start() {
        recognizer.startListening(intent)
    }

    fun stop() {
        recognizer.stopListening()
    }

    fun destroy() {
        recognizer.destroy()
    }
}

Setting up listeners

The controller provides a flexible listener system for handling speech events:
speechController.setListener(
    onReady = {
        // Recognition is ready to receive speech
        isListening = true
    },
    onPartial = { partialText ->
        // Receive partial results in real-time
        displayText = partialText
    },
    onFinal = { finalText ->
        // Receive final recognition result
        confirmedText = finalText
    },
    onError = { errorCode ->
        // Handle recognition errors
        handleError(errorCode)
    },
    onEnd = {
        // Recognition has ended
        isListening = false
    }
)
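
The SpeechController listing above does not include setListener. The following is a minimal sketch of how such a method could bridge to Android's RecognitionListener. The class name CallbackSpeechController and its internals are assumptions made for illustration; the app's actual implementation may differ.

```kotlin
import android.content.Context
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.SpeechRecognizer

// Hypothetical sketch: not the app's actual SpeechController source.
class CallbackSpeechController(context: Context) {
    private val recognizer: SpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(context)

    fun setListener(
        onReady: () -> Unit = {},
        onPartial: (String) -> Unit = {},
        onFinal: (String) -> Unit = {},
        onError: (Int) -> Unit = {},
        onEnd: () -> Unit = {}
    ) {
        recognizer.setRecognitionListener(object : RecognitionListener {
            override fun onReadyForSpeech(params: Bundle?) = onReady()
            override fun onEndOfSpeech() = onEnd()

            // Explicit invoke() so the call targets the lambda parameter,
            // not this override with the same name.
            override fun onError(error: Int) = onError.invoke(error)

            override fun onPartialResults(partialResults: Bundle?) {
                firstResult(partialResults)?.let(onPartial)
            }

            override fun onResults(results: Bundle?) {
                firstResult(results)?.let(onFinal)
            }

            // Callbacks this sketch does not use.
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
    }

    // The recognizer returns candidate transcriptions; take the first one.
    private fun firstResult(bundle: Bundle?): String? =
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.firstOrNull()
}
```

Default empty lambdas let callers register only the events they care about.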

Usage example

1. Create controller

Initialize the speech controller with context:
val context = LocalContext.current
val speechController = remember { SpeechController(context) }

2. Set up listener

Configure event handlers for speech recognition:
LaunchedEffect(Unit) {
    speechController.setListener(
        onReady = { isListening = true },
        onPartial = { partial -> text = partial },
        onFinal = { final -> text = final },
        onError = { code -> error = getFriendlyErrorMessage(code) },
        onEnd = { isListening = false }
    )
}

3. Request permission

Request microphone permission before starting:
val micPermissionLauncher = rememberLauncherForActivityResult(
    contract = ActivityResultContracts.RequestPermission()
) { granted ->
    if (granted) {
        speechController.start()
    }
}

// Request permission
micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)

4. Clean up

Destroy the controller when done:
DisposableEffect(Unit) {
    onDispose { speechController.destroy() }
}

Spanish normalization

The app includes specialized normalization functions that convert spoken Spanish words into appropriate text formats.

Email normalization

The normalizeEmailFromSpeech function converts Spanish words to email-valid characters:
fun normalizeEmailFromSpeech(input: String): String {
    var s = input.lowercase()

    val wordMap = mapOf(
        "arroba" to "@",
        "punto" to ".",
        "guion bajo" to "_",
        "guionbajo" to "_",
        "guion" to "-",
        "i latina" to "i",
        "ye" to "y",
        "y griega" to "y"
    )

    wordMap.forEach { (word, replacement) ->
        s = s.replace(word, replacement)
    }

    val numberMap = mapOf(
        "cero" to "0",
        "uno" to "1",
        "dos" to "2",
        // ... all digits
    )

    numberMap.forEach { (word, digit) ->
        s = s.replace(word, digit)
    }

    // Detect spelling mode and handle accordingly
    val tokens = s.split(" ").filter { it.isNotBlank() }
    val singleCharRatio = tokens.count { it.length == 1 } / tokens.size.toDouble()
    val isSpellingMode = singleCharRatio >= 0.6

    val normalized = tokens.joinToString("") { token ->
        when {
            token == "y" && isSpellingMode -> "i"
            else -> token
        }
    }

    return normalized
        .replace(" ", "")
        .replace(Regex("[^a-z0-9@._-]"), "")
        .trim()
}
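
To make the spelling-mode check in the function above concrete, here is a small standalone sketch of the same ratio test. The function name isSpellingMode and the empty-input guard are assumptions for illustration; the 0.6 threshold comes from the listing. Note that symbols such as "@" and "." are single-character tokens too, so they count toward the ratio.

```kotlin
// Hypothetical standalone version of the spelling-mode heuristic.
fun isSpellingMode(text: String, threshold: Double = 0.6): Boolean {
    val tokens = text.split(" ").filter { it.isNotBlank() }
    if (tokens.isEmpty()) return false // nothing spoken, so not spelling
    val singleCharRatio = tokens.count { it.length == 1 } / tokens.size.toDouble()
    return singleCharRatio >= threshold
}

fun main() {
    // 6 of 8 tokens are single characters (0.75), so this counts as spelling.
    println(isSpellingMode("j u a n @ gmail . com"))      // prints "true"
    // 3 of 7 tokens are single characters (about 0.43), so it does not.
    println(isSpellingMode("juan . lopez @ gmail . com")) // prints "false"
}
```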

Word mapping examples

Spanish word    Character
arroba          @
punto           .
guion bajo      _
guion           -

Example transformations

Input:  "juan punto lopez arroba gmail punto com"
Output: "juan.lopez@gmail.com"
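
Because the function uses plain String.replace, a mapped word is also replaced when it appears inside a longer word: "bruno" contains "uno" and becomes "br1". One possible mitigation, sketched here as an assumption rather than the app's actual behavior, is to replace whole words only. replaceWholeWords is a hypothetical helper name.

```kotlin
// Hypothetical helper: replaces mapped words only when they stand alone.
fun replaceWholeWords(input: String, mappings: Map<String, String>): String {
    var s = input
    for ((word, replacement) in mappings) {
        // \b anchors the match at word boundaries, so "uno" no longer matches inside "bruno".
        val wholeWord = Regex("\\b${Regex.escape(word)}\\b")
        s = s.replace(wholeWord) { replacement }
    }
    return s
}

fun main() {
    val mappings = mapOf("arroba" to "@", "punto" to ".", "uno" to "1")
    val spoken = "bruno arroba gmail punto com"

    println(spoken.replace("uno", "1"))          // prints "br1 arroba gmail punto com"
    println(replaceWholeWords(spoken, mappings)) // prints "bruno @ gmail . com"
}
```

Insertion order still matters with this approach: multi-word keys such as "guion bajo" need to come before "guion", as they do in the original map.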

Password normalization

The normalizePasswordFromSpeech function is similar but handles passwords:
fun normalizePasswordFromSpeech(input: String): String {
    var s = input.lowercase()

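    // Apply word mappings (the same wordMap and numberMap as in normalizeEmailFromSpeech,
    // assumed here to be shared declarations that are in scope for this function)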
    wordMap.forEach { (word, replacement) ->
        s = s.replace(word, replacement)
    }

    // Apply number mappings
    numberMap.forEach { (word, digit) ->
        s = s.replace(word, digit)
    }

    val tokens = s.split(" ").filter { it.isNotBlank() }

    val normalized = tokens.joinToString("") { token ->
        when (token) {
            "y" -> "i"
            else -> token
        }
    }

    return normalized
        .replace(Regex("[^a-z0-9@._-]"), "")
        .trim()
}

Password normalization does not use spelling mode detection, on the assumption that passwords are typically spoken as continuous words rather than individual characters. As a result, every standalone "y" token is converted to "i".
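
One consequence of the final character filter is worth seeing in isolation: input is lowercased and anything outside a-z, 0-9, and the characters @ . _ - is dropped, so uppercase letters and other symbols cannot survive dictation. The sketch below reuses the regex from the listing; applyCharacterFilter is a hypothetical name used only for this illustration.

```kotlin
// Same character class as in the normalization functions above.
private val disallowed = Regex("[^a-z0-9@._-]")

// Hypothetical helper that isolates the lowercase-and-filter step.
fun applyCharacterFilter(spoken: String): String =
    spoken.lowercase().replace(disallowed, "")

fun main() {
    // Uppercase is folded, and the space and "!" are removed.
    println(applyCharacterFilter("Clave Segura!")) // prints "clavesegura"
    // Allowed punctuation passes through unchanged.
    println(applyCharacterFilter("mi_clave-2024")) // prints "mi_clave-2024"
}
```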

Error handling

Implement user-friendly error messages for common speech recognition errors:
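fun getFriendlyErrorMessage(code: Int): String {
    // Constants are defined on android.speech.SpeechRecognizer; numeric values are noted in comments.
    return when (code) {
        SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Could not connect to voice service. Check your internet." // 1
        SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS -> "Microphone permission denied." // 9
        SpeechRecognizer.ERROR_NETWORK -> "Network error. Try again." // 2
        SpeechRecognizer.ERROR_SPEECH_TIMEOUT, // 6
        SpeechRecognizer.ERROR_NO_MATCH -> "We couldn't hear you well, try speaking louder." // 7
        SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "Microphone is being used by another app." // 8
        else -> "There was a problem with dictation. Try again."
    }
}

Common error codes

  • ERROR_NETWORK_TIMEOUT (1): Network timeout, cannot connect to Google’s speech recognition service. Solution: Check the internet connection and try again.
  • ERROR_INSUFFICIENT_PERMISSIONS (9): Microphone permission not granted. Solution: Request the RECORD_AUDIO permission.
  • ERROR_NETWORK (2): Network error during recognition. Solution: Verify internet connectivity.
  • ERROR_SPEECH_TIMEOUT (6) and ERROR_NO_MATCH (7): No speech input detected, or nothing recognizable was heard. Solution: Ask the user to speak louder or closer to the microphone.
  • ERROR_RECOGNIZER_BUSY (8): Speech recognizer is busy. Solution: Wait and try again, or check whether another app is using the microphone.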

Permissions

Add the microphone permission to your AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Request the permission at runtime:
val micPermissionLauncher = rememberLauncherForActivityResult(
    contract = ActivityResultContracts.RequestPermission()
) { granted ->
    if (granted) {
        speechController.start()
    } else {
        showError("Microphone permission required")
    }
}

micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
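
Launching the permission request on every tap is not necessary once the user has granted access. A minimal sketch of checking first, assuming the androidx.core dependency is available; hasMicPermission is a hypothetical helper name, not part of the app's source.

```kotlin
import android.Manifest
import android.content.Context
import android.content.pm.PackageManager
import androidx.core.content.ContextCompat

// Hypothetical helper: true when RECORD_AUDIO has already been granted.
fun hasMicPermission(context: Context): Boolean =
    ContextCompat.checkSelfPermission(
        context,
        Manifest.permission.RECORD_AUDIO
    ) == PackageManager.PERMISSION_GRANTED
```

A caller could then start dictation directly when hasMicPermission(context) returns true and fall back to micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO) otherwise.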

Integration example

Here’s how the login screen integrates speech recognition (ui/auth/LoginScreen.kt:84):
val speechController = remember { SpeechController(context) }
var dictationTarget by remember { mutableStateOf(DictationTarget.EMAIL) }
var isListening by remember { mutableStateOf(false) }

// Set up listener
LaunchedEffect(Unit) {
    speechController.setListener(
        onReady = {
            isListening = true
            speechError = null
        },
        onPartial = { partial ->
            when (dictationTarget) {
                DictationTarget.EMAIL -> email = normalizeEmailFromSpeech(partial)
                DictationTarget.PASSWORD -> password = normalizePasswordFromSpeech(partial)
            }
        },
        onFinal = { final ->
            isListening = false
            when (dictationTarget) {
                DictationTarget.EMAIL -> email = normalizeEmailFromSpeech(final)
                DictationTarget.PASSWORD -> password = normalizePasswordFromSpeech(final)
            }
        },
        onError = { code ->
            isListening = false
            speechError = getFriendlyErrorMessage(code)
        },
        onEnd = {
            isListening = false
        }
    )
}

// Voice input button
Button(
    onClick = {
        if (isListening) {
            speechController.stop()
        } else {
            showDictationDialog = true
        }
    }
) {
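    Icon(
        imageVector = if (isListening) Icons.Default.Mic else Icons.Default.MicNone,
        contentDescription = null // decorative: the Text next to it already labels the button
    )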
    Text(if (isListening) "Listening..." else "Voice")
}
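
The onPartial and onFinal handlers above repeat the same when branch. A small sketch of factoring that choice into one place follows. The enum shape is assumed (the real DictationTarget definition is not shown), and normalizeFor is a hypothetical helper that takes the two normalizers as parameters so the example stays self-contained.

```kotlin
// Assumed shape of the enum used by the login screen.
enum class DictationTarget { EMAIL, PASSWORD }

// Hypothetical helper: applies the normalizer that matches the dictated field.
fun normalizeFor(
    target: DictationTarget,
    spoken: String,
    normalizeEmail: (String) -> String,
    normalizePassword: (String) -> String
): String = when (target) {
    DictationTarget.EMAIL -> normalizeEmail(spoken)
    DictationTarget.PASSWORD -> normalizePassword(spoken)
}

fun main() {
    // Stand-in normalizers for the demo; the app would pass
    // ::normalizeEmailFromSpeech and ::normalizePasswordFromSpeech.
    val emailStub: (String) -> String = { it.lowercase().replace(" ", "") }
    val passwordStub: (String) -> String = { it.trim() }

    println(normalizeFor(DictationTarget.EMAIL, "Juan Lopez", emailStub, passwordStub)) // prints "juanlopez"
    println(normalizeFor(DictationTarget.PASSWORD, " abc ", emailStub, passwordStub))   // prints "abc"
}
```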

Best practices

Cleanup

Always call destroy() when the controller is no longer needed to release resources.

Partial results

Enable partial results for real-time feedback to improve user experience.

Error handling

Provide clear, actionable error messages for common issues.

Visual feedback

Show clear indicators when listening is active.
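
Speech recognition typically requires an internet connection, because the default recognizer on most devices relies on Google’s cloud-based recognition service. Offline recognition depends on the device and its installed language packs.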

