
Overview

The voice pipeline provides three modes:
  1. Live mode: Continuous mic → STT → LLM → TTS → speaker loop
  2. Push-to-talk: Capture on demand, transcribe with offline Whisper
  3. Text mode: Direct LLM processing without STT

Live Voice Pipeline (macOS)

rcli_start_listening

Start the live voice pipeline: microphone → STT → LLM → TTS → speaker.
int rcli_start_listening(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle (must be initialized)
Returns (int)
  • 0: Successfully started
  • -1: Failed (engine not ready or already listening)

Example

if (rcli_start_listening(handle) == 0) {
    printf("Listening... Speak now.\n");
} else {
    fprintf(stderr, "Failed to start listening\n");
}
The pipeline runs asynchronously on background threads. Use callbacks to receive transcript updates and state changes.
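
Since results arrive through callbacks, one way to hand final transcripts to the rest of an application is to collect them through the user_data pointer. The sketch below assumes the callback signature used in the complete example later on this page (text, is_final, user_data). TranscriptLog and collect_transcript are hypothetical names for illustration and are not part of the RCLI API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the most recent final transcript. */
typedef struct {
    char last_final[256];  /* truncated copy of the last final transcript */
    int  final_count;      /* number of final transcripts seen so far */
} TranscriptLog;

/* Callback with the (text, is_final, user_data) shape used on this page.
 * Partial results are ignored; final results are copied into the log. */
void collect_transcript(const char* text, int is_final, void* user_data) {
    TranscriptLog* log = user_data;
    if (log == NULL || text == NULL || !is_final) {
        return;
    }
    /* snprintf always NUL-terminates and truncates overly long text. */
    snprintf(log->last_final, sizeof log->last_final, "%s", text);
    log->final_count++;
}
```

Registration would presumably look like rcli_set_transcript_callback(handle, collect_transcript, &log). Because the pipeline runs on background threads, reading the log from another thread likely needs a mutex or similar protection, which this sketch leaves out.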

rcli_stop_listening

Stop the live voice pipeline.
int rcli_stop_listening(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle
Returns (int)
  • Always returns 0

Example

// Start listening
rcli_start_listening(handle);

// ... user speaks ...

// Stop after 30 seconds
sleep(30);
rcli_stop_listening(handle);

Live Mode Pipeline Flow

1. Mic input captured at 16kHz
2. VAD detects speech start/end
3. Zipformer streaming STT transcribes in real-time
4. On speech endpoint:
   - Transcript sent to LLM
   - LLM generates response (with optional tool calls)
   - Response sent to TTS
5. TTS audio plays through speaker
6. Loop continues until rcli_stop_listening()
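
The loop described above can be pictured as a small state machine. The following is only an illustrative model, not the engine's implementation: the state names are borrowed from the state callback in the complete example on this page, while the event names and next_state are hypothetical. The INTERRUPTED state is left out because this page does not describe when it occurs.

```c
/* Illustrative model of the live-mode loop; not the engine's actual code. */
typedef enum { ST_IDLE, ST_LISTENING, ST_PROCESSING, ST_SPEAKING } PipelineState;

/* Hypothetical events corresponding to the numbered steps above. */
typedef enum {
    EV_START,            /* rcli_start_listening() succeeded */
    EV_SPEECH_ENDPOINT,  /* VAD detected the end of an utterance */
    EV_RESPONSE_READY,   /* LLM response handed to TTS */
    EV_PLAYBACK_DONE,    /* TTS audio finished playing */
    EV_STOP              /* rcli_stop_listening() called */
} PipelineEvent;

/* Returns the state that follows `s` when `ev` occurs.
 * Events that do not apply to the current state leave it unchanged. */
PipelineState next_state(PipelineState s, PipelineEvent ev) {
    if (ev == EV_STOP) {
        return ST_IDLE;  /* stopping ends the loop from any state */
    }
    switch (s) {
    case ST_IDLE:       return ev == EV_START           ? ST_LISTENING  : s;
    case ST_LISTENING:  return ev == EV_SPEECH_ENDPOINT ? ST_PROCESSING : s;
    case ST_PROCESSING: return ev == EV_RESPONSE_READY  ? ST_SPEAKING   : s;
    case ST_SPEAKING:   return ev == EV_PLAYBACK_DONE   ? ST_LISTENING  : s;
    }
    return s;
}
```

In this model the SPEAKING state returns to LISTENING once playback ends, which mirrors step 6: the loop keeps running until it is stopped.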

Push-to-Talk Mode

Capture audio on demand, then transcribe with high-accuracy offline Whisper.

rcli_start_capture

Start microphone capture without STT streaming.
int rcli_start_capture(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle (must be initialized)
Returns (int)
  • 0: Capture started
  • -1: Failed
Call this before the user starts speaking to avoid clipping the start of their audio.

rcli_stop_capture_and_transcribe

Stop capture and transcribe the recorded audio.
const char* rcli_stop_capture_and_transcribe(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle
Returns (const char*)
  • Transcript text. Empty string if transcription failed. Do not free; the string is owned by the engine.

Example: Push-to-Talk

// User presses button
rcli_start_capture(handle);
printf("Recording... Release to transcribe.\n");

// User speaks while holding button
// ...

// User releases button
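const char* transcript = rcli_stop_capture_and_transcribe(handle);
if (transcript[0] == '\0') {
    // An empty string means transcription failed
    fprintf(stderr, "Transcription failed\n");
} else {
    printf("You said: %s\n", transcript);

    // Now process the command
    const char* response = rcli_process_command(handle, transcript);
    rcli_speak(handle, response);
}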
Transcription uses offline Whisper and can take 100-500ms depending on audio length.
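
Because returned strings are owned by the engine, an application that wants to keep a transcript around may prefer to take its own copy. This page does not say how long a returned pointer stays valid, so the sketch below assumes, conservatively, that a later engine call could overwrite it. copy_engine_string is a hypothetical helper, not part of the RCLI API.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of an engine-owned string.
 * The caller owns the copy and must free() it.
 * A NULL input is treated as an empty string; returns NULL only if
 * allocation fails. */
char* copy_engine_string(const char* engine_text) {
    if (engine_text == NULL) {
        engine_text = "";
    }
    size_t len = strlen(engine_text);
    char* copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, engine_text, len + 1);  /* includes the terminating NUL */
    return copy;
}
```

A caller would copy the result of rcli_stop_capture_and_transcribe() right away, use the copy for any later calls, and free() it when done. The engine-owned pointer itself is never freed.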

rcli_get_transcript

Get the last transcript from STT.
const char* rcli_get_transcript(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle
Returns (const char*)
  • Last transcript text. Empty if none available. Do not free.
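
Since an empty string signals "no transcript", callers usually want a quick check before acting on the result. A minimal sketch follows; has_text is a hypothetical helper, and treating NULL and whitespace-only strings as empty is an extra precaution that this page does not require.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if `s` contains at least one non-whitespace character,
 * 0 if it is NULL, empty, or whitespace only. */
int has_text(const char* s) {
    if (s == NULL) {
        return 0;
    }
    for (; *s != '\0'; ++s) {
        /* Cast avoids undefined behavior for negative char values. */
        if (!isspace((unsigned char)*s)) {
            return 1;
        }
    }
    return 0;
}
```

A typical use would be to call has_text(rcli_get_transcript(handle)) and skip command processing when it returns 0.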

Text Command Processing

rcli_process_command

Process a text command: skip STT, go directly to LLM → tool execution → response.
const char* rcli_process_command(RCLIHandle handle, const char* text);
Parameters
  • handle (RCLIHandle, required): Engine handle (must be initialized)
  • text (const char*, required): User command text
Returns (const char*)
  • LLM response text. Empty string on error. Do not free; the string is owned by the engine.

Example: Text-Only Mode

const char* response = rcli_process_command(handle, "What's the weather?");
printf("Assistant: %s\n", response);

Example: Tool Execution

// LLM detects tool call and executes it
const char* result = rcli_process_command(handle, "open Safari");
printf("%s\n", result);  // "Opened Safari"
This function blocks until the LLM completes generation. Use rcli_stop_processing() from another thread to cancel.

Conversation History

The engine automatically maintains conversation context:
rcli_process_command(handle, "My name is Alice");
// "Nice to meet you, Alice!"

rcli_process_command(handle, "What's my name?");
// "Your name is Alice."

// Clear history to start fresh
rcli_clear_history(handle);

rcli_process_command(handle, "What's my name?");
// "I don't know your name."

rcli_clear_history

Clear conversation history (start a fresh conversation).
void rcli_clear_history(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle

Text-to-Speech

rcli_speak

Synthesize text to speech and play through speaker.
int rcli_speak(RCLIHandle handle, const char* text);
Parameters
  • handle (RCLIHandle, required): Engine handle (must be initialized)
  • text (const char*, required): Text to speak. Markdown and special characters are automatically sanitized.
Returns (int)
  • 0: Speech started successfully
  • -1: Failed (TTS error or invalid handle)

Example

rcli_speak(handle, "Hello, world!");

// Speak LLM response
const char* response = rcli_process_command(handle, "Tell me a joke");
rcli_speak(handle, response);
rcli_speak() is non-blocking - it starts playback and returns immediately. Use rcli_is_speaking() to check status.

rcli_stop_speaking

Stop TTS playback immediately (interrupt current speech).
void rcli_stop_speaking(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle

Example

rcli_speak(handle, "This is a very long sentence that...");

// User interrupts
rcli_stop_speaking(handle);

rcli_is_speaking

Check if TTS audio is currently playing.
int rcli_is_speaking(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle
Returns (int)
  • 1: TTS is playing
  • 0: No active TTS playback

Example: Wait for Speech to Complete

rcli_speak(handle, "Processing your request...");

while (rcli_is_speaking(handle)) {
    usleep(100000);  // Poll every 100ms
}

printf("Speech finished\n");
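
The polling loop above waits indefinitely. Where a bounded wait is preferable, the same idea can be wrapped with a timeout. The sketch below is generic on purpose: it polls any "busy" predicate through a function pointer so it can be exercised without the engine. wait_until_idle, BusyFn, and countdown_busy are hypothetical names, not part of the RCLI API.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict C modes */
#include <time.h>

/* Predicate that returns nonzero while the operation is still running. */
typedef int (*BusyFn)(void* ctx);

/* Polls `busy` every `poll_ms` milliseconds.
 * Returns 0 once `busy` reports idle, or -1 if roughly `timeout_ms`
 * milliseconds of sleeping have elapsed while it is still busy. */
int wait_until_idle(BusyFn busy, void* ctx, long poll_ms, long timeout_ms) {
    long waited = 0;
    while (busy(ctx)) {
        if (waited >= timeout_ms) {
            return -1;
        }
        struct timespec ts = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        waited += poll_ms;
    }
    return 0;
}

/* Stand-in predicate for trying the helper without the engine:
 * reports busy for the next *ctx calls, then idle. */
int countdown_busy(void* ctx) {
    int* remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

With the engine, the predicate would be a thin wrapper that calls rcli_is_speaking() on a handle passed through ctx. On timeout, a caller might fall back to rcli_stop_speaking().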

Emergency Stop

rcli_stop_processing

Stop all ongoing processing: cancel LLM generation, stop TTS, stop STT.
void rcli_stop_processing(RCLIHandle handle);
Parameters
  • handle (RCLIHandle, required): Engine handle
Safe to call from any thread. Non-blocking. Use this for “panic button” functionality.

Example: Interrupt Long Generation

// Thread 1: Start long operation
const char* result = rcli_process_command(handle, "Write a 1000 word essay");

// Thread 2: User cancels
rcli_stop_processing(handle);
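
The two-thread pattern above can be made concrete with POSIX threads. Since the engine is not available in a standalone snippet, the sketch below uses stand-ins: fake_process_command blocks like rcli_process_command() and fake_stop_processing only sets a flag like a non-blocking rcli_stop_processing(). FakeEngine, Job, and run_and_cancel are hypothetical, and the engine's real cancellation mechanism is not documented on this page.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict C modes */
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Stand-in for engine state; the real engine's internals are unknown. */
typedef struct { atomic_int cancel_requested; } FakeEngine;

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Stand-in for a blocking call: works in 1 ms steps until it finishes
 * `max_steps` or a cancel is requested. Returns the steps completed. */
static int fake_process_command(FakeEngine* e, int max_steps) {
    int done = 0;
    while (done < max_steps && !atomic_load(&e->cancel_requested)) {
        sleep_ms(1);
        done++;
    }
    return done;
}

/* Stand-in for a non-blocking cancel: only sets a flag. */
static void fake_stop_processing(FakeEngine* e) {
    atomic_store(&e->cancel_requested, 1);
}

typedef struct { FakeEngine* engine; int max_steps; int steps_done; } Job;

static void* worker_main(void* arg) {
    Job* job = arg;
    job->steps_done = fake_process_command(job->engine, job->max_steps);
    return NULL;
}

/* Runs the blocking call on a worker thread, requests cancellation from
 * the calling thread after `cancel_after_ms`, then joins the worker.
 * Returns the steps completed, or -1 if the thread could not start. */
int run_and_cancel(int max_steps, long cancel_after_ms) {
    FakeEngine engine;
    atomic_init(&engine.cancel_requested, 0);
    Job job = { &engine, max_steps, 0 };
    pthread_t worker;
    if (pthread_create(&worker, NULL, worker_main, &job) != 0) {
        return -1;
    }
    sleep_ms(cancel_after_ms);
    fake_stop_processing(&engine);
    pthread_join(worker, NULL);  /* steps_done is safe to read after join */
    return job.steps_done;
}
```

With the real API, the worker thread would call rcli_process_command() and the cancelling thread would call rcli_stop_processing(). Joining the worker before reading its result avoids racing on the returned data.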

Complete Example: Voice Assistant

#include "api/rcli_api.h"
#include <stdio.h>
#include <unistd.h>

void on_transcript(const char* text, int is_final, void* user_data) {
    if (is_final) {
        printf("User: %s\n", text);
    } else {
        printf("Partial: %s\r", text);  // Real-time updates
        fflush(stdout);
    }
}

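void on_state(int old_state, int new_state, void* user_data) {
    static const char* const states[] = {"IDLE", "LISTENING", "PROCESSING", "SPEAKING", "INTERRUPTED"};
    const int count = (int)(sizeof states / sizeof states[0]);
    // Guard against state values outside the known range
    const char* from = (old_state >= 0 && old_state < count) ? states[old_state] : "UNKNOWN";
    const char* to = (new_state >= 0 && new_state < count) ? states[new_state] : "UNKNOWN";
    printf("State: %s -> %s\n", from, to);
}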

int main() {
    RCLIHandle handle = rcli_create(NULL);
    rcli_init(handle, "/path/to/models", 99);

    // Register callbacks
    rcli_set_transcript_callback(handle, on_transcript, NULL);
    rcli_set_state_callback(handle, on_state, NULL);

    // Start voice pipeline
    printf("Starting voice assistant...\n");
    if (rcli_start_listening(handle) != 0) {
        fprintf(stderr, "Failed to start listening\n");
        rcli_destroy(handle);
        return 1;
    }

    // Run for 60 seconds
    sleep(60);

    // Stop
    rcli_stop_listening(handle);
    rcli_destroy(handle);
    return 0;
}
