Overview

The Voice Agent SDK provides robust interruption handling to support natural conversational flow. Users can interrupt the agent at any time (“barge-in”), and the agent immediately cancels in-flight work to save tokens and reduce latency.

Interruption Types

The SDK supports two levels of interruption:

1. Speech-Only Interruption

Cancels ongoing speech generation and playback, but allows the LLM stream to continue.
agent.interruptSpeech('user_speaking');
When to use:
  • User starts speaking while agent is playing audio
  • You want to stop audio but let the LLM finish generating text (e.g., for display purposes)
What happens:
  • Aborts current TTS generation
  • Clears pending TTS chunks from the queue
  • Emits speech_interrupted event
  • LLM stream continues running

2. Full Interruption (Barge-In)

Cancels both the LLM stream and speech generation.
agent.interruptCurrentResponse('user_speaking');
When to use:
  • User starts speaking (true barge-in scenario)
  • You want to completely cancel the current response to process new input
  • You want to save tokens and API costs by stopping unnecessary generation
What happens:
  • Aborts the LLM stream using AbortController
  • Cancels all speech generation
  • Clears pending TTS chunks
  • Emits speech_interrupted event
  • Agent immediately ready for new input

Implementation Details

AbortController for LLM Streams

The agent uses AbortController to cancel in-flight LLM streams:
// From VoiceAgent.new.ts:80-81, 193-199
private currentStreamAbortController?: AbortController;

public interruptCurrentResponse(reason: string = 'interrupted'): void {
  if (this.currentStreamAbortController) {
    this.currentStreamAbortController.abort();
    this.currentStreamAbortController = undefined;
  }
  this.speech.interruptSpeech(reason);
}
The abort signal is passed to streamText():
// From VoiceAgent.new.ts:386-400
this.currentStreamAbortController = new AbortController();
const streamAbortSignal = this.currentStreamAbortController.signal;

const result = streamText({
  model: this.model,
  system: this.instructions,
  messages: this.conversation.getHistoryRef(),
  tools: this.tools,
  stopWhen: this.stopWhen,
  abortSignal: streamAbortSignal,  // ← Enables cancellation
  // ...
});
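
To show the cancellation pattern in isolation, the following is a minimal sketch that replaces streamText() with a stand-in async generator. The names fakeTokenStream and consume are hypothetical and not part of the SDK; the sketch only assumes that the producer checks the abort signal between chunks, which is roughly how an abortable stream behaves.

```typescript
// Hypothetical stand-in for an LLM token stream; not part of the SDK.
async function* fakeTokenStream(
  tokens: string[],
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const token of tokens) {
    // Stop producing as soon as the consumer has aborted.
    if (signal.aborted) return;
    // Yield to the event loop, as a network stream would between chunks.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
    yield token;
  }
}

// Collects tokens until the stream ends or `abortAfter` tokens have arrived.
async function consume(tokens: string[], abortAfter: number): Promise<string[]> {
  const controller = new AbortController();
  const received: string[] = [];
  for await (const token of fakeTokenStream(tokens, controller.signal)) {
    received.push(token);
    // Simulated barge-in: abort once enough tokens have been received.
    if (received.length === abortAfter) controller.abort();
  }
  return received;
}
```

Tokens that arrived before the abort are kept; nothing further is generated after it, which is where the token savings come from.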

Speech Queue Clearing

The SpeechManager clears all pending chunks:
public interruptSpeech(reason: string): void {
  this.reset();  // Clears queue and pending state
  this.emit('speech_interrupted', { reason });
}
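
The internals of reset() are not shown above. As a rough illustration of what "abort the current TTS generation, clear the queue, emit an event" can look like, here is a simplified sketch. SpeechQueueSketch and all of its members are hypothetical; the real SpeechManager may be structured differently.

```typescript
type InterruptListener = (payload: { reason: string }) => void;

// Hypothetical, simplified stand-in for the SDK's SpeechManager.
class SpeechQueueSketch {
  private queue: string[] = [];
  private listeners: InterruptListener[] = [];
  private ttsAbort: AbortController | undefined;

  // Queue a text chunk for synthesis.
  enqueue(chunk: string): void {
    this.queue.push(chunk);
  }

  // Take the next chunk and return a signal the TTS request can observe.
  startNext(): { chunk: string; signal: AbortSignal } | undefined {
    const chunk = this.queue.shift();
    if (chunk === undefined) return undefined;
    this.ttsAbort = new AbortController();
    return { chunk, signal: this.ttsAbort.signal };
  }

  get pendingChunks(): number {
    return this.queue.length;
  }

  onInterrupted(listener: InterruptListener): void {
    this.listeners.push(listener);
  }

  interruptSpeech(reason: string): void {
    this.ttsAbort?.abort(); // abort the in-flight TTS request, if any
    this.ttsAbort = undefined;
    this.queue.length = 0; // drop chunks that were never synthesized
    for (const listener of this.listeners) listener({ reason });
  }
}
```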

Automatic Barge-In

Server-Side (VoiceAgent)

The agent automatically interrupts when receiving user input:
// From VoiceAgent.new.ts:326-332
if (message.type === 'transcript') {
  // User sent a transcript
  this.interruptCurrentResponse('user_speaking');
  await this.enqueueInput(message.text);
} else if (message.type === 'audio') {
  // User sent audio
  this.interruptCurrentResponse('user_speaking');
  await this.handleAudioInput(message.data, message.format);
}

Client-Side (Browser)

The browser client detects speech and interrupts automatically:
// From voice-client.html:419-426
function autoBargeIn() {
  if (!isAssistantSpeaking()) return;
  
  stopAudioPlayback();
  
  if (ws && connected) {
    ws.send(JSON.stringify({ type: 'interrupt', reason: 'user_speaking' }));
    console.log('Auto-interrupt: user started speaking');
  }
}
This is triggered in both STT modes.
Browser STT:
recognition.onresult = (event) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const text = event.results[i][0].transcript.trim();
    if (!text) continue;
    
    if (event.results[i].isFinal) {
      autoBargeIn();  // ← On final transcript
      // Send to server...
    } else {
      autoBargeIn();  // ← Even on interim results
      // Show interim text...
    }
  }
};
Server Whisper with VAD:
function vadCheck() {
  // ... VAD logic ...
  
  if (isSpeech && !whisperSegmentActive) {
    vadSpeechFrames += 1;
    if (vadSpeechFrames >= VAD_SPEECH_START_FRAMES) {
      autoBargeIn();  // ← When speech detected
      beginWhisperSegment();
    }
  }
}
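
The VAD logic itself is elided in the snippet above. Purely as an illustration of the frame-counting idea, the sketch below uses a simple RMS energy threshold to decide whether a frame contains speech and fires a callback once per speech segment. The functions rms and createOnsetDetector, the threshold value, and the rule that a single quiet frame ends a segment are all assumptions; the actual client may use a different detector.

```typescript
// Root-mean-square energy of one audio frame (samples expected in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Hypothetical onset detector: calls onSpeechStart once after `startFrames`
// consecutive frames at or above `threshold`.
function createOnsetDetector(
  threshold: number,
  startFrames: number,
  onSpeechStart: () => void,
) {
  let speechFrames = 0;
  let segmentActive = false;
  return {
    push(frame: Float32Array): void {
      const isSpeech = rms(frame) >= threshold;
      if (!isSpeech) {
        // Simplification: one quiet frame resets the counter and ends the segment.
        speechFrames = 0;
        segmentActive = false;
        return;
      }
      if (segmentActive) return; // already reported this segment
      speechFrames += 1;
      if (speechFrames >= startFrames) {
        segmentActive = true;
        onSpeechStart(); // e.g. autoBargeIn()
      }
    },
  };
}
```

Requiring several consecutive speech frames trades a little latency for fewer false triggers from clicks or short bursts of noise.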

Manual Interruption

From Application Code

import { VoiceAgent } from 'voice-agent-ai-sdk';

const agent = new VoiceAgent({ /* ... */ });

// User clicks "Stop" button
stopButton.on('click', () => {
  agent.interruptCurrentResponse('user_clicked_stop');
});

// Timeout scenario
setTimeout(() => {
  if (agent.speaking) {
    agent.interruptSpeech('timeout');
  }
}, 30000);

// Priority message
if (urgentMessageReceived) {
  agent.interruptCurrentResponse('priority_message');
  await agent.sendText(urgentMessage);
}

From Browser Client

Send an interrupt message via WebSocket:
interruptBtn.addEventListener('click', () => {
  if (!ws || !connected) return;
  
  stopAudioPlayback();
  
  ws.send(JSON.stringify({
    type: 'interrupt',
    reason: 'user_clicked_interrupt'
  }));
  
  console.log('Sent interrupt request');
});
The server handles this message:
// From VoiceAgent.new.ts:343-348
else if (message.type === 'interrupt') {
  console.log(
    `Received interrupt request: ${message.reason || 'client_request'}`
  );
  this.interruptCurrentResponse(message.reason || 'client_request');
}
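
The handlers above imply three client message types: transcript, audio, and interrupt. One way to make that protocol explicit is a discriminated union plus a defensive parser, sketched below. The ClientMessage type and parseClientMessage function are hypothetical, and treating data as a base64 string is an assumption; the field names are taken from the handler code shown in this document.

```typescript
// Hypothetical message shapes, inferred from the fields the handlers read.
type ClientMessage =
  | { type: 'transcript'; text: string }
  | { type: 'audio'; data: string; format?: string } // data assumed to be base64
  | { type: 'interrupt'; reason?: string };

// Returns a validated message, or null for anything malformed or unknown.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== 'object' || value === null) return null;
  const msg = value as Record<string, unknown>;
  const { text, data, format, reason } = msg;

  switch (msg.type) {
    case 'transcript':
      return typeof text === 'string' ? { type: 'transcript', text } : null;
    case 'audio':
      if (typeof data !== 'string') return null;
      return { type: 'audio', data, format: typeof format === 'string' ? format : undefined };
    case 'interrupt':
      // The reason is optional; the server falls back to 'client_request'.
      return { type: 'interrupt', reason: typeof reason === 'string' ? reason : undefined };
    default:
      return null;
  }
}
```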

Interruption Events

Listen for interruption events to update UI or log analytics:
agent.on('speech_interrupted', ({ reason }) => {
  console.log(`Speech interrupted: ${reason}`);
  
  // Update UI
  updateStatus('Interrupted');
  clearSpeakingIndicator();
  
  // Analytics
  trackEvent('speech_interrupted', { reason });
});
Common interruption reasons:
  • user_speaking — User started speaking (barge-in)
  • user_clicked_stop — User clicked stop/interrupt button
  • client_request — Generic client-side interruption
  • timeout — Response took too long
  • priority_message — Higher priority input received
  • interrupted — Default reason if none provided

Cleanup on Disconnect

The agent automatically cleans up when disconnecting:
// From VoiceAgent.new.ts:466-474
private cleanupOnDisconnect(): void {
  if (this.currentStreamAbortController) {
    this.currentStreamAbortController.abort();
    this.currentStreamAbortController = undefined;
  }
  this.speech.reset();
  this._isProcessing = false;
  this.inputQueue.rejectAll(new Error('Connection closed'));
}
This ensures:
  • LLM stream is cancelled
  • Speech queue is cleared
  • Pending inputs are rejected
  • No lingering resources
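
The inputQueue.rejectAll() call is what keeps callers from waiting forever on inputs that will never be processed. A minimal sketch of that idea, assuming each queued input is backed by a promise, could look like the following. InputQueueSketch and its members are hypothetical and are not the SDK's actual queue.

```typescript
interface PendingInput {
  text: string;
  resolve: () => void;
  reject: (error: Error) => void;
}

// Hypothetical promise-backed input queue.
class InputQueueSketch {
  private pending: PendingInput[] = [];

  // The returned promise settles when the input is processed or rejected.
  enqueue(text: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      this.pending.push({ text, resolve, reject });
    });
  }

  get size(): number {
    return this.pending.length;
  }

  // Reject every waiting input, for example when the connection closes.
  rejectAll(error: Error): void {
    const items = this.pending;
    this.pending = []; // empty the queue before notifying callers
    for (const item of items) item.reject(error);
  }
}
```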

Interruption in Multi-Step Tool Execution

When tools are executing, interruption cancels the rest of the multi-step operation, although a tool that is already running may still finish:
1. LLM calls first tool. Agent begins multi-step workflow with tool call.
2. User interrupts. interruptCurrentResponse() is called while tool is executing.
3. Stream aborts. AbortController cancels the stream, preventing subsequent tool calls.
4. Current tool completes. The tool already executing may complete, but results are discarded.
5. Agent ready. Agent immediately ready to process new input.
Interrupting mid-tool-execution may leave tools in an incomplete state. Design tools to be idempotent or handle partial execution gracefully.
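
One common way to make a tool idempotent is to key each operation on a caller-supplied idempotency key and return the stored result when the same key is seen again. The sketch below illustrates this with an in-memory map. createBookingTool, book, and the key format are hypothetical, and a real tool would persist the keys in a durable store rather than in memory.

```typescript
// Hypothetical booking tool; the in-memory store is for illustration only.
function createBookingTool() {
  const completed = new Map<string, string>(); // idempotency key -> stored result
  let sideEffects = 0; // counts how often the "real" operation ran

  return {
    book(idempotencyKey: string, slot: string): string {
      const previous = completed.get(idempotencyKey);
      // Replay of an earlier call: return the stored result, do nothing else.
      if (previous !== undefined) return previous;

      sideEffects += 1; // stand-in for the external write
      const confirmation = `confirmed:${slot}`;
      completed.set(idempotencyKey, confirmation);
      return confirmation;
    },
    get sideEffectCount(): number {
      return sideEffects;
    },
  };
}
```

If an interrupted response is retried and the LLM issues the same tool call again, the second call returns the earlier confirmation instead of booking twice.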

Best Practices

When a user starts speaking, call interruptCurrentResponse() to cancel both LLM and speech. This saves tokens and provides the fastest response to new input.
agent.on('audio_received', () => {
  agent.interruptCurrentResponse('user_speaking');
});
Show users when interruption occurs:
agent.on('speech_interrupted', ({ reason }) => {
  showToast(`Interrupted: ${reason}`);
  updateStatusIndicator('ready');
});
Use descriptive reasons for interruption to help with debugging and analytics:
agent.interruptCurrentResponse('user_started_typing');
agent.interruptCurrentResponse('urgent_notification');
agent.interruptCurrentResponse('call_incoming');
If your tools make external API calls or run long operations, pass the abort signal through so they can be cancelled:
const searchTool = tool({
  description: 'Search for information',
  inputSchema: z.object({ query: z.string() }),
  execute: async ({ query }, { abortSignal }) => {
    const response = await fetch(url, { signal: abortSignal });
    return await response.json();
  }
});
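
Passing the signal to fetch covers network calls. For long-running local work, a tool can instead check the signal between units of work. The following sketch shows that cooperative style. processInBatches and its onProgress callback are hypothetical names, and the setTimeout call stands in for real work.

```typescript
// Hypothetical long-running tool body that checks for cancellation between batches.
async function processInBatches(
  items: number[],
  batchSize: number,
  signal?: AbortSignal,
  onProgress?: (processed: number) => void,
): Promise<number> {
  if (batchSize <= 0) throw new RangeError('batchSize must be positive');
  let processed = 0;
  for (let i = 0; i < items.length; i += batchSize) {
    // Check before starting each batch so an abort takes effect promptly.
    if (signal?.aborted) {
      const error = new Error('Operation aborted');
      error.name = 'AbortError';
      throw error;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, 0)); // stand-in for real work
    processed += Math.min(batchSize, items.length - i);
    onProgress?.(processed);
  }
  return processed;
}
```

Smaller batches respond to an abort sooner, at the cost of checking the signal more often.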
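Measure the time from user speech to interruption:
let bargeInStart: number | null = null;

agent.on('audio_received', () => {
  bargeInStart = Date.now();
});

agent.on('speech_interrupted', () => {
  // Skip interruptions that were not preceded by user audio (e.g. a Stop button)
  if (bargeInStart === null) return;
  const latency = Date.now() - bargeInStart;
  bargeInStart = null;
  console.log(`Barge-in latency: ${latency}ms`);
});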
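
Individual latency readings vary, so it can help to summarize them. The sketch below computes a nearest-rank percentile over collected samples; the percentile helper is a hypothetical utility, not part of the SDK, and the sample values are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = samples.slice().sort((a, b) => a - b); // copy, then sort ascending
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Illustrative barge-in latencies in milliseconds (made-up values).
const latencies = [120, 80, 200, 150, 100];
const median = percentile(latencies, 50);
const p95 = percentile(latencies, 95);
```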

Common Patterns

Debounced Interruption

Prevent accidental interruptions from brief noises or very short utterances by delaying the interrupt:
let interruptDebounce: ReturnType<typeof setTimeout> | null = null;

function debouncedInterrupt(reason: string, delay: number = 500) {
  if (interruptDebounce) clearTimeout(interruptDebounce);
  
  interruptDebounce = setTimeout(() => {
    agent.interruptCurrentResponse(reason);
    interruptDebounce = null;
  }, delay);
}

// Cancel debounce if user stops speaking quickly
function cancelInterrupt() {
  if (interruptDebounce) {
    clearTimeout(interruptDebounce);
    interruptDebounce = null;
  }
}

Conditional Interruption

Only interrupt in certain states:
function conditionalInterrupt(reason: string) {
  // Don't interrupt during critical announcements
  if (currentMessagePriority === 'critical') {
    console.log('Skipping interrupt for critical message');
    return;
  }
  
  // Don't interrupt very short responses (let them finish)
  if (agent.speaking && agent.pendingSpeechChunks < 2) {
    console.log('Letting short response finish');
    return;
  }
  
  agent.interruptCurrentResponse(reason);
}

Interrupt and Resume

Save state before interruption for potential resumption:
let interruptedResponse: string | null = null;

agent.on('text', ({ role, text }) => {
  if (role === 'assistant') {
    interruptedResponse = text;
  }
});

agent.on('speech_interrupted', ({ reason }) => {
  if (reason === 'user_speaking' && interruptedResponse) {
    console.log('Response was interrupted:', interruptedResponse);
    // Optionally offer to continue later
  }
});

Troubleshooting

If interruption feels slow, check:
  • Are you calling interruptCurrentResponse() instead of just interruptSpeech()?
  • Is there network latency between client and server?
  • Are you using WebSocket for real-time communication?
  • Is the browser’s audio queue too long?
If the LLM keeps generating after an interruption, ensure you’re using interruptCurrentResponse() (not interruptSpeech()), and verify that the AbortController is being set and aborted.
If a tool keeps running after an interruption, this is expected: the currently executing tool may complete even after the abort signal fires. Design tools to be cancellable if needed.
If speech does not stop when interrupted, check that:
  • speechModel is configured
  • Speech generation is actually active (agent.speaking === true)
  • You’re calling interruptSpeech() or interruptCurrentResponse()

Next Steps

Browser Client

See interruption handling in the complete browser implementation

History Management

Learn how to manage conversation state across interruptions
