Overview

UAC (USB Audio Class) is a standardized protocol for transmitting audio data over USB. The ESP32_USB_STREAM library implements a UAC host driver that supports both microphone input (capture) and speaker output (playback) simultaneously.
The library can handle one microphone stream and one speaker stream concurrently, along with one UVC video stream.

Audio Streaming Architecture

The library manages two independent audio streams: microphone input (capture) and speaker output (playback).

Microphone (Input) Streaming

Microphone streaming captures audio data from a USB microphone or audio input device.

Configuration

USB_STREAM *usb = new USB_STREAM();

// Configure UAC for microphone
usb->uacConfiguration(
    UAC_CH_ANY,           // mic_ch_num: any channel (mono/stereo)
    UAC_BITS_ANY,         // mic_bit_resolution: any bit depth
    UAC_FREQUENCY_ANY,    // mic_samples_frequency: any sample rate
    6400,                 // mic_buf_size: ring buffer size in bytes
    0,                    // spk_ch_num: not using speaker
    0,                    // spk_bit_resolution
    0,                    // spk_samples_frequency
    0                     // spk_buf_size
);

Audio Format Parameters

mic_ch_num (uint8_t)
Number of audio channels:
  • 1 - Mono audio
  • 2 - Stereo audio
  • UAC_CH_ANY - Accept any channel configuration

mic_bit_resolution (uint16_t)
Bit depth per sample:
  • 8 - 8-bit audio
  • 16 - 16-bit audio (CD quality)
  • 24 - 24-bit audio (high quality)
  • 32 - 32-bit audio
  • UAC_BITS_ANY - Accept any bit resolution

mic_samples_frequency (uint32_t)
Sample rate in Hz:
  • 8000 - Phone quality
  • 16000 - Wideband speech
  • 44100 - CD quality
  • 48000 - Professional audio
  • 96000 - High-resolution audio
  • UAC_FREQUENCY_ANY - Accept any sample rate

Reading Microphone Data

There are two ways to receive microphone data: a frame callback, or a manual read from the ring buffer.

Method 1: Frame Callback

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    Serial.printf("Mic data received:\n");
    Serial.printf("  Bit resolution: %u-bit\n", frame->bit_resolution);
    Serial.printf("  Sample rate: %u Hz\n", frame->samples_frequence);
    Serial.printf("  Data size: %u bytes\n", frame->data_bytes);
    
    // Access audio data
    uint8_t *audioData = (uint8_t *)frame->data;
    
    // Process audio (save to SD, stream over WiFi, etc.)
    // WARNING: Keep this callback fast!
}

// Register the callback
usb->uacMicRegisterCb(&onMicFrameCallback, NULL);

Method 2: Manual Read from Buffer

uint8_t audioBuffer[1024];
size_t bytesRead = 0;

// Read from mic buffer with 100ms timeout
usb->uacReadMic(audioBuffer, sizeof(audioBuffer), &bytesRead, 100);

if (bytesRead > 0) {
    Serial.printf("Read %u bytes from microphone\n", (unsigned)bytesRead);
    // Process the audio data
}
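
Once bytes have been read, they still have to be interpreted as samples. The sketch below shows one way to do that, assuming the stream was negotiated as 16-bit little-endian PCM. The helper name `peakLevel16` is hypothetical and not part of the library; it only scans a byte buffer and reports the loudest sample, which can be handy for a quick "is the microphone alive" check.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of ESP32_USB_STREAM): returns the peak
// absolute sample value in a buffer of 16-bit little-endian PCM.
// A trailing odd byte, if any, is ignored.
uint16_t peakLevel16(const uint8_t *data, size_t bytes)
{
    uint16_t peak = 0;
    for (size_t i = 0; i + 1 < bytes; i += 2) {
        // Assemble one little-endian signed 16-bit sample.
        int16_t sample = (int16_t)((uint16_t)data[i] | ((uint16_t)data[i + 1] << 8));
        // Widen before negating so that -32768 does not overflow.
        int32_t magnitude = sample < 0 ? -(int32_t)sample : (int32_t)sample;
        if (magnitude > peak) {
            peak = (uint16_t)magnitude;
        }
    }
    return peak;
}
```

With the read example above, a call such as `peakLevel16(audioBuffer, bytesRead)` would return a value between 0 (silence) and 32768 (full scale).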

Microphone Frame Structure

The mic_frame_t structure contains:
Field              Type      Description
data               void*     Pointer to audio sample data
data_bytes         uint32_t  Size of audio data in bytes
bit_resolution     uint16_t  Bits per sample (8, 16, 24, 32)
samples_frequence  uint32_t  Sample rate in Hz

Speaker (Output) Streaming

Speaker streaming sends audio data to a USB speaker or audio output device.

Configuration

// Configure UAC for speaker
usb->uacConfiguration(
    0,                    // mic_ch_num: not using mic
    0,                    // mic_bit_resolution
    0,                    // mic_samples_frequency
    0,                    // mic_buf_size
    UAC_CH_ANY,           // spk_ch_num: any channel
    UAC_BITS_ANY,         // spk_bit_resolution: any bit depth
    UAC_FREQUENCY_ANY,    // spk_samples_frequency: any sample rate
    6400                  // spk_buf_size: ring buffer size in bytes
);

Writing Speaker Data

// Prepare audio data (16-bit signed PCM samples)
int16_t audioSamples[512];

// Fill with audio data (sine wave example).
// Samples are signed: casting a negative value to uint16_t would be undefined.
for (int i = 0; i < 512; i++) {
    audioSamples[i] = (int16_t)(sin(i * 0.1) * 32767);
}

// Write to speaker buffer with 100ms timeout
usb->uacWriteSpk((uint16_t *)audioSamples, sizeof(audioSamples), 100);

The speaker buffer must receive data continuously to avoid audio underruns. Feed data at the rate the configured sample rate consumes it.
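
For a stereo stream, samples are normally interleaved as left, right, left, right. The following sketch generates a tone in that layout at a chosen pitch. It assumes 16-bit signed PCM, and the helper name `fillSineStereo16` and its parameters are hypothetical, not library API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fills `out` with `frames` interleaved stereo frames
// (L, R, L, R, ...) of a sine tone at `toneHz`. `out` must hold 2 * frames
// samples. `startFrame` lets successive calls continue the same waveform.
void fillSineStereo16(int16_t *out, size_t frames, uint32_t sampleRate,
                      double toneHz, uint32_t startFrame = 0)
{
    const double twoPi = 6.283185307179586;
    for (size_t n = 0; n < frames; n++) {
        // Time of this frame in seconds since the start of the tone.
        double t = (double)(startFrame + n) / sampleRate;
        int16_t s = (int16_t)std::lround(std::sin(twoPi * toneHz * t) * 32767.0);
        out[2 * n] = s;      // left channel
        out[2 * n + 1] = s;  // right channel
    }
}
```

A buffer filled this way could then be passed to `uacWriteSpk` with a `(uint16_t *)` cast and its size in bytes. Advancing `startFrame` by the number of frames written on each call avoids a click at buffer boundaries.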

Concurrent Microphone and Speaker

You can use both microphone and speaker simultaneously:
// Configure both mic and speaker
usb->uacConfiguration(
    2,                    // mic: stereo
    16,                   // mic: 16-bit
    48000,                // mic: 48kHz
    6400,                 // mic buffer size
    2,                    // speaker: stereo
    16,                   // speaker: 16-bit
    48000,                // speaker: 48kHz
    6400                  // speaker buffer size
);
Using the same sample rate and bit resolution for both mic and speaker simplifies audio processing, especially for echo cancellation or audio passthrough applications.

Buffer Management

The library uses ring buffers to handle audio data flow between the USB driver and your application.
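
As a rough illustration of the ring-buffer idea, the sketch below implements a tiny byte ring in standard C++. It is not the library's implementation (which this page does not show); the class name `ByteRing` is hypothetical, and the code is not thread-safe. It only demonstrates why a full buffer drops incoming data (overrun) and an empty one returns nothing (underrun).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative fixed-capacity byte ring buffer (assumption: single-threaded use).
class ByteRing {
public:
    explicit ByteRing(size_t capacity) : buf_(capacity), head_(0), size_(0) {}

    // Copies up to `len` bytes in. Returns how many fit; the remainder is
    // not stored, which corresponds to an overrun.
    size_t write(const uint8_t *src, size_t len)
    {
        size_t n = 0;
        while (n < len && size_ < buf_.size()) {
            buf_[(head_ + size_) % buf_.size()] = src[n++];
            size_++;
        }
        return n;
    }

    // Copies up to `len` bytes out, oldest first. Returns how many were
    // available; 0 corresponds to an underrun.
    size_t read(uint8_t *dst, size_t len)
    {
        size_t n = 0;
        while (n < len && size_ > 0) {
            dst[n++] = buf_[head_];
            head_ = (head_ + 1) % buf_.size();
            size_--;
        }
        return n;
    }

    size_t available() const { return size_; }

private:
    std::vector<uint8_t> buf_;
    size_t head_;  // index of the oldest stored byte
    size_t size_;  // number of bytes currently stored
};
```

The buffer sizes passed to `uacConfiguration` play the role of `capacity` here: a larger value absorbs longer gaps between writes and reads at the cost of memory and latency.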

Buffer Size Guidelines

// Calculate minimum buffer size
// Buffer Size = Sample Rate × Channels × (Bits/8) × Duration

// Example: 48kHz, stereo, 16-bit, 100ms
// Buffer = 48000 × 2 × 2 × 0.1 = 19,200 bytes

// Typical buffer sizes (16-bit audio, from 16kHz stereo up to 48kHz mono):
// - Low latency (50ms): 3200-4800 bytes
// - Balanced (100ms): 6400-9600 bytes
// - High latency (200ms): 12800-19200 bytes

usb->uacConfiguration(
    2, 16, 48000, 19200,   // mic: 100ms buffer at 48kHz stereo 16-bit
    2, 16, 48000, 19200    // speaker: 100ms buffer at 48kHz stereo 16-bit
);
Small buffers (low latency)
Pros:
  • Minimal audio delay
  • Lower memory usage
  • Better for real-time applications
Cons:
  • Higher risk of buffer underrun/overrun
  • More CPU interrupts
  • Less tolerance for processing delays
Large buffers (high latency)
Pros:
  • Stable audio streaming
  • Tolerates processing variations
  • Fewer buffer errors
Cons:
  • Increased audio latency
  • Higher memory usage
  • Not suitable for real-time interaction
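
The sizing formula from the guidelines above can be wrapped in a small helper so that buffer sizes follow the chosen format instead of being hard-coded. This is a sketch: the name `pcmBufferBytes` is hypothetical, and it assumes uncompressed PCM with a bit depth that is a multiple of 8.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed to hold `durationMs` of PCM audio.
// Buffer Size = Sample Rate × Channels × (Bits/8) × Duration
uint32_t pcmBufferBytes(uint32_t sampleRateHz, uint8_t channels,
                        uint16_t bitsPerSample, uint32_t durationMs)
{
    // One frame = one sample for every channel.
    uint32_t bytesPerFrame = (uint32_t)channels * (bitsPerSample / 8u);
    if (bytesPerFrame == 0) {
        return 0;
    }
    // Use a 64-bit intermediate so high rates and long durations cannot overflow.
    uint64_t total = (uint64_t)sampleRateHz * bytesPerFrame * durationMs / 1000u;
    // Round down to a whole number of frames.
    total -= total % bytesPerFrame;
    return (uint32_t)total;
}
```

For example, `pcmBufferBytes(48000, 2, 16, 100)` yields 19200 and `pcmBufferBytes(16000, 1, 16, 100)` yields 3200, matching the figures used elsewhere on this page.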

Volume and Mute Controls

Microphone Controls

// Mute microphone (0 = unmute, 1 = mute)
usb->uacMicMute((void *)1);

// Set microphone volume (0-100)
usb->uacMicVolume((void *)75);

Speaker Controls

// Mute speaker (0 = unmute, 1 = mute)
usb->uacSpkMute((void *)1);

// Set speaker volume (0-100)
usb->uacSpkVolume((void *)50);
Volume and mute control availability depends on the USB audio device. Not all devices support these features.

Suspend and Resume

You can independently control each audio stream:

Microphone Suspend/Resume

// Pause microphone input
usb->uacMicSuspend(NULL);

// Resume microphone input
usb->uacMicResume(NULL);

Speaker Suspend/Resume

// Pause speaker output
usb->uacSpkSuspend(NULL);

// Resume speaker output  
usb->uacSpkResume(NULL);

Dynamic Audio Configuration

You can change audio parameters while suspended:
// Suspend the stream
usb->uacMicSuspend(NULL);

// Change to different format
usb->uacMicFrameReset(
    1,      // mono
    16,     // 16-bit
    16000   // 16kHz (wideband speech)
);

// Resume with new settings
usb->uacMicResume(NULL);

Querying Audio Capabilities

Get Available Audio Formats

// Get microphone format list
size_t mic_list_size = 0;
size_t mic_current = 0;
usb->uacMicGetFrameListSize(&mic_list_size, &mic_current);

uac_frame_size_t *micFormats = new uac_frame_size_t[mic_list_size];
usb->uacMicGetFrameSize(micFormats);

for (size_t i = 0; i < mic_list_size; i++) {
    Serial.printf("Format %u:\n", (unsigned)i);
    Serial.printf("  Channels: %u\n", micFormats[i].ch_num);
    Serial.printf("  Bit resolution: %u\n", micFormats[i].bit_resolution);
    Serial.printf("  Sample rate: %u Hz\n", micFormats[i].samples_frequence);
    Serial.printf("  Rate range: %u - %u Hz\n",
                  micFormats[i].samples_frequence_min,
                  micFormats[i].samples_frequence_max);
}

delete[] micFormats;
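
After listing the formats, an application usually wants to pick one. The sketch below searches a list for a requested channel count, bit depth, and sample rate. To stay self-contained it uses a hypothetical `AudioFormat` struct that mirrors the fields printed above rather than the library's `uac_frame_size_t`. It also assumes that a non-zero `samples_frequence_max` denotes a continuous supported range, which should be checked against the library headers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in mirroring the fields printed in the listing above.
struct AudioFormat {
    uint8_t  ch_num;
    uint16_t bit_resolution;
    uint32_t samples_frequence;
    uint32_t samples_frequence_min;
    uint32_t samples_frequence_max;
};

// Returns the index of the first format that matches, or -1 if none does.
int findFormat(const AudioFormat *list, size_t count,
               uint8_t channels, uint16_t bits, uint32_t rateHz)
{
    for (size_t i = 0; i < count; i++) {
        const AudioFormat &f = list[i];
        if (f.ch_num != channels || f.bit_resolution != bits) {
            continue;
        }
        bool exact = (f.samples_frequence == rateHz);
        // Assumption: min/max describe a continuous range when max is non-zero.
        bool inRange = f.samples_frequence_max != 0 &&
                       rateHz >= f.samples_frequence_min &&
                       rateHz <= f.samples_frequence_max;
        if (exact || inRange) {
            return (int)i;
        }
    }
    return -1;
}
```

If a match is found, its parameters could be passed to `uacMicFrameReset` while the stream is suspended; if not, falling back to the `*_ANY` constants lets the library choose.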

Common Audio Formats

Phone Quality

8 kHz, Mono, 8-bit
  • Bandwidth: 8 KB/s
  • Use: Voice calls

Wideband Speech

16 kHz, Mono, 16-bit
  • Bandwidth: 32 KB/s
  • Use: VoIP, voice commands

CD Quality

44.1 kHz, Stereo, 16-bit
  • Bandwidth: 176 KB/s
  • Use: Music playback

Example: Complete UAC Setup

#include <Arduino.h>
#include "USB_STREAM.h"

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    Serial.printf("Mic: %u-bit, %u Hz, %u bytes\n",
                  frame->bit_resolution,
                  frame->samples_frequence,
                  frame->data_bytes);
    
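    // Echo audio to speaker (simple passthrough for demonstration only;
    // production code should hand the data to another task instead of
    // writing from inside the callback)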
    USB_STREAM *usb = (USB_STREAM *)ptr;
    usb->uacWriteSpk((uint16_t *)frame->data, frame->data_bytes, 100);
}

void setup()
{
    Serial.begin(115200);
    
    USB_STREAM *usb = new USB_STREAM();
    
    // Configure for 16kHz, 16-bit, mono (low-latency voice)
    usb->uacConfiguration(
        1,      // mic: mono
        16,     // mic: 16-bit
        16000,  // mic: 16kHz
        3200,   // mic: 100ms buffer
        1,      // speaker: mono
        16,     // speaker: 16-bit
        16000,  // speaker: 16kHz
        3200    // speaker: 100ms buffer
    );
    
    // Register callback (pass usb pointer for echo)
    usb->uacMicRegisterCb(&onMicFrameCallback, usb);
    
    // Start streaming
    usb->start();
    usb->connectWait(1000);
    
    // Set initial volume
    usb->uacMicVolume((void *)80);
    usb->uacSpkVolume((void *)60);
    
    Serial.println("UAC streaming started!");
}

void loop()
{
    vTaskDelay(pdMS_TO_TICKS(100));
}

Best Practices

Buffers:
  • Size buffers appropriately for your latency requirements
  • Monitor buffer overflow/underflow conditions
  • Use larger buffers if experiencing audio glitches
  • Free buffers when streaming stops
Callbacks:
  • Keep mic callbacks extremely fast
  • Avoid Serial.print() in production code
  • Use queues to pass data to other tasks
  • Never call UAC functions from within callbacks
Timing and synchronization:
  • Ensure continuous data flow to speaker at exact sample rate
  • Use FreeRTOS timers for precise timing if needed
  • Match mic and speaker rates for echo/passthrough
  • Consider sample rate conversion for format mismatches
Performance and compatibility:
  • Start with lower sample rates and increase if stable
  • Use 16-bit resolution as a good quality/performance balance
  • Monitor CPU usage during streaming
  • Test with different USB audio devices for compatibility
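
When the microphone and speaker end up at different sample rates, the audio has to be resampled before passthrough. The sketch below shows the simplest approach, linear interpolation on mono 16-bit PCM. The function name `resampleLinear` is hypothetical, and the method applies no anti-aliasing filter, so treat it as a starting point for voice-grade audio rather than a high-quality converter.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: resamples mono 16-bit PCM from inRate to outRate
// using linear interpolation between neighbouring input samples.
std::vector<int16_t> resampleLinear(const std::vector<int16_t> &in,
                                    uint32_t inRate, uint32_t outRate)
{
    std::vector<int16_t> out;
    if (in.empty() || inRate == 0 || outRate == 0) {
        return out;
    }
    size_t outLen = (size_t)((uint64_t)in.size() * outRate / inRate);
    out.reserve(outLen);
    for (size_t i = 0; i < outLen; i++) {
        // Position of this output sample on the input timeline.
        double pos = (double)i * inRate / outRate;
        size_t i0 = (size_t)pos;
        // Clamp the right-hand neighbour at the end of the input.
        size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        double frac = pos - (double)i0;
        double v = in[i0] + (in[i1] - in[i0]) * frac;
        out.push_back((int16_t)std::lround(v));
    }
    return out;
}
```

Upsampling `{0, 100, 200, 300}` from 16000 to 32000 Hz, for instance, inserts the midpoints 50, 150, and 250 between the original samples and repeats the final value once.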

Troubleshooting

No audio data:
  • Check device is properly connected
  • Verify UAC configuration matches device capabilities
  • Ensure buffer size is sufficient
  • Check callback is registered before start()
Audio glitches or dropouts:
  • Increase buffer size for more tolerance
  • Reduce callback processing time
  • Lower sample rate or reduce bit depth
  • Check for CPU or memory constraints
Volume or mute controls not working:
  • Not all devices support volume and mute controls
  • Check device USB descriptor for feature unit
  • Try different control values
  • Some devices may have hardware-only controls

