Voice Actions

Overview

The Voice Actions module provides server-side functions for processing audio input, transcribing speech to text, and parsing voice commands using Google’s Gemini AI models.

transcribeAudio

Transcribes an audio recording with the Gemini Flash Lite model, using a prompt that instructs the model to leave timestamps and filler words out of the returned text.

export async function transcribeAudio(
  audioDataUrl: string,
  mimeType: string = 'audio/webm'
): Promise<{ text: string; success: boolean; error?: string }>

Parameters

audioDataUrl

string

required

Audio encoded as a Base64 data URL (data:audio/…)

mimeType

string

default:"audio/webm"

MIME type of the audio file
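
Because the data URL already carries a media type, a caller may prefer to derive the mimeType argument from it rather than hard-coding one. The helper below is a minimal sketch of that idea; mimeTypeFromDataUrl is a hypothetical name and is not part of this module.

```typescript
// Hypothetical helper (not part of the Voice Actions module): reads the media
// type from the header of a data URL, e.g. "data:audio/webm;codecs=opus;base64,...".
function mimeTypeFromDataUrl(
  dataUrl: string,
  fallback: string = "audio/webm" // mirrors the documented default
): string {
  // Capture everything between "data:" and the first ";" or ",".
  const match = /^data:([^;,]+)[;,]/.exec(dataUrl);
  return match?.[1] ?? fallback;
}
```

The result can then be passed as the second argument of transcribeAudio, so recordings in other containers (for example audio/mp4) are not labeled as audio/webm by default.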

Response

text

string

Transcribed and cleaned text from the audio

success

boolean

Whether the transcription succeeded

error

string

Error message if transcription failed

Features

Size Validation: Enforces maximum audio size limits (configured in MAX_AUDIO_SIZE_MB)
Automatic Cleaning: Removes timestamps (00:00, 01:23, etc.) and excessive line breaks
Error Handling: Returns structured error responses with logging
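
The size check and the cleaning step listed above could look roughly like the following. This is a sketch under stated assumptions, not the module's actual code: the 10 MB limit, the function names, and the regular expressions are all illustrative, and the real cleaning may be done partly by the model prompt.

```typescript
// Assumed value for illustration; the real limit is MAX_AUDIO_SIZE_MB in config/limits.
const ASSUMED_MAX_AUDIO_SIZE_MB = 10;

/** Estimates the decoded size of a Base64 data URL without decoding it. */
function estimateDataUrlBytes(dataUrl: string): number {
  const commaIndex = dataUrl.indexOf(",");
  const base64 = commaIndex >= 0 ? dataUrl.slice(commaIndex + 1) : dataUrl;
  // Each "=" of padding stands for one byte that is not present.
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return Math.max(0, Math.floor((base64.length * 3) / 4) - padding);
}

/** Returns true when the audio payload fits within the given limit in MB. */
function isWithinSizeLimit(
  dataUrl: string,
  maxMb: number = ASSUMED_MAX_AUDIO_SIZE_MB
): boolean {
  return estimateDataUrlBytes(dataUrl) <= maxMb * 1024 * 1024;
}

/**
 * Removes mm:ss (or hh:mm:ss) timestamps and collapses excessive line breaks.
 * Note: this simple pattern would also strip clock times spoken by the user.
 */
function cleanTranscript(raw: string): string {
  return raw
    .replace(/\b\d{1,2}:\d{2}(?::\d{2})?\b/g, "") // drop timestamps
    .replace(/[ \t]+/g, " ")                        // collapse runs of spaces
    .replace(/ ?\n ?/g, "\n")                       // trim spaces around newlines
    .replace(/\n{3,}/g, "\n\n")                     // at most one blank line
    .trim();
}
```

Estimating the size from the Base64 length avoids allocating a decoded buffer just to reject an oversized upload.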

Example

const result = await transcribeAudio("data:audio/webm;base64,GkXfo59ChoEBQveBAULygQRC...");

if (result.success) {
  console.log("Transcription:", result.text);
} else {
  console.error("Error:", result.error);
}

executeVoiceCommand

Parses a voice transcription into a structured command using AI-powered natural language understanding.

export async function executeVoiceCommand(
  transcript: string,
  options?: { minConfidence?: number; context?: string }
): Promise<
  | { success: true; command: VoiceCommand }
  | { success: false; error: string; code: string; recoverable: boolean }
>

Parameters

transcript

string

required

Transcribed text from voice input

options

object

Optional parsing configuration

options.minConfidence

number

Minimum confidence threshold (0-1) for accepting parsed commands

options.context

string

Additional context to help the AI understand the command

Response (Success)

success

boolean

Returns true on successful parsing

command

VoiceCommand

Parsed command object with action type and parameters

Response (Failure)

success

boolean

Returns false on parsing failure

error

string

Human-readable error message

code

string

Error code: MISSING_API_KEY, PARSING_FAILED, or EXECUTION_ERROR

recoverable

boolean

Whether the error is recoverable (e.g., user can retry)

Features

API Key Validation: Checks for GOOGLE_GENERATIVE_AI_API_KEY before processing
Structured Validation: Uses Zod schemas for command validation
Language Support: Configured for Spanish (es-ES) commands
Confidence Scoring: Filters low-confidence interpretations
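
To illustrate how structured validation and confidence filtering fit together, here is a dependency-free sketch. It uses a hand-written type guard in place of the Zod schema the module actually uses, and it assumes a VoiceCommand shape with a numeric confidence field, a default threshold of 0.5, and PARSING_FAILED as the code for low-confidence results; none of these details are confirmed by this page.

```typescript
// Assumed shape: the page only states that a command carries an action type
// and parameters. The `confidence` field is a guess for illustration.
interface ParsedCommand {
  action: string;
  parameters: Record<string, unknown>;
  confidence: number;
}

type ParseOutcome =
  | { success: true; command: ParsedCommand }
  | { success: false; error: string; code: string; recoverable: boolean };

/** Hand-written stand-in for schema validation (the module uses Zod). */
function isParsedCommand(value: unknown): value is ParsedCommand {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const confidence = v["confidence"];
  return (
    typeof v["action"] === "string" &&
    v["action"].length > 0 &&
    typeof v["parameters"] === "object" &&
    v["parameters"] !== null &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}

/** Accepts a candidate only if it is well-formed and confident enough. */
function acceptCommand(candidate: unknown, minConfidence: number = 0.5): ParseOutcome {
  if (!isParsedCommand(candidate)) {
    return { success: false, error: "Malformed command", code: "PARSING_FAILED", recoverable: true };
  }
  if (candidate.confidence < minConfidence) {
    return { success: false, error: "Confidence below threshold", code: "PARSING_FAILED", recoverable: true };
  }
  return { success: true, command: candidate };
}
```

Treating a low-confidence parse as recoverable lets the caller ask the user to repeat the command instead of acting on a doubtful interpretation.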

Example

const result = await executeVoiceCommand(
  "Crear orden urgente para la UMA",
  { minConfidence: 0.8, context: "work orders" }
);

if (result.success) {
  console.log("Action:", result.command.action);
  console.log("Parameters:", result.command.parameters);
} else {
  console.error(`Error [${result.code}]:`, result.error);
  console.log("Recoverable:", result.recoverable);
}
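
The two functions are naturally chained: audio is transcribed first, and the transcript is then parsed into a command. The sketch below shows one way to wire them together. handleVoiceInput and the TRANSCRIPTION_FAILED code are hypothetical and not part of this module; the two functions are passed in as arguments so the sketch stays self-contained.

```typescript
type TranscribeResult = { text: string; success: boolean; error?: string };

type CommandResult<C> =
  | { success: true; command: C }
  | { success: false; error: string; code: string; recoverable: boolean };

/**
 * Hypothetical glue function: transcribes audio, then parses the transcript.
 * In an application, `transcribe` would be transcribeAudio and `execute`
 * would be executeVoiceCommand.
 */
async function handleVoiceInput<C>(
  audioDataUrl: string,
  transcribe: (audioDataUrl: string) => Promise<TranscribeResult>,
  execute: (transcript: string) => Promise<CommandResult<C>>
): Promise<CommandResult<C>> {
  const transcription = await transcribe(audioDataUrl);

  // Stop early if transcription failed or produced no usable text.
  if (!transcription.success || transcription.text.trim() === "") {
    return {
      success: false,
      error: transcription.error ?? "Empty transcription",
      code: "TRANSCRIPTION_FAILED", // assumed code, not defined by the module
      recoverable: true,
    };
  }

  return execute(transcription.text);
}
```

Returning the same result union for both stages means the caller needs only one error-handling branch, whichever step failed.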

Error Codes

Code	Description	Recoverable
`MISSING_API_KEY`	Google AI API key not configured	No
`PARSING_FAILED`	Could not parse command from transcript	Yes
`EXECUTION_ERROR`	Unexpected error during processing	No
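
The recoverable flag can drive a simple retry decision on the client. The following is a minimal sketch; nextStep and the attempt limit are hypothetical and only illustrate how the table above might be used.

```typescript
interface VoiceFailure {
  success: false;
  error: string;
  code: string; // MISSING_API_KEY, PARSING_FAILED, or EXECUTION_ERROR
  recoverable: boolean;
}

/**
 * Decides whether to ask the user to try again.
 * `attempt` is the number of attempts made so far (1-based);
 * `maxAttempts` is an assumed application-level limit.
 */
function nextStep(
  failure: VoiceFailure,
  attempt: number,
  maxAttempts: number = 3
): "retry" | "abort" {
  // Only recoverable failures (such as PARSING_FAILED) are worth retrying.
  if (failure.recoverable && attempt < maxAttempts) {
    return "retry";
  }
  return "abort";
}
```

Relying on the flag rather than matching on specific codes keeps the caller correct if further error codes are introduced later.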

Configuration

Environment Variables

GOOGLE_GENERATIVE_AI_API_KEY: Required for all voice operations
MAX_AUDIO_SIZE_MB: Maximum audio file size in megabytes (defined in config/limits)
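
A startup check for the required key might look like the sketch below. checkVoiceConfig is a hypothetical helper; it takes the environment as an argument so it can be exercised without touching the real process environment.

```typescript
type ConfigCheck =
  | { ok: true }
  | { ok: false; code: "MISSING_API_KEY"; error: string };

/** Hypothetical helper: verifies that the Google AI key is present and non-empty. */
function checkVoiceConfig(env: Record<string, string | undefined>): ConfigCheck {
  const key = env["GOOGLE_GENERATIVE_AI_API_KEY"];
  if (key === undefined || key.trim() === "") {
    return {
      ok: false,
      code: "MISSING_API_KEY",
      error: "GOOGLE_GENERATIVE_AI_API_KEY is not set",
    };
  }
  return { ok: true };
}
```

In a Node.js server, the function would typically be called once with process.env before any voice operation is attempted.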

Dependencies

@ai-sdk/google: Google AI SDK for Gemini models
ai: Vercel AI SDK for text generation
VoiceCommandParserService: Internal service for command parsing

Models Used

Transcription: gemini-2.5-flash-lite (optimized for speed)
Command Parsing: Configured via VoiceCommandParserService
