
UAC Microphone Capture Example

This example demonstrates how to capture audio from a USB microphone using the USB Audio Class (UAC) protocol. It shows configuration of audio format, receiving audio data via callbacks, and controlling microphone settings.

What This Example Demonstrates

  • Initializing USB audio capture (microphone)
  • Configuring audio format (sample rate, bit depth, channels)
  • Receiving audio data through callbacks
  • Controlling microphone volume and mute
  • Suspending and resuming audio capture

Hardware Setup

Required Components:
  • ESP32-S2 or ESP32-S3 development board
  • USB microphone (UAC-compatible)
  • USB OTG cable or adapter
Connections:
  • Connect USB microphone to ESP32’s USB port
  • Connect ESP32 to computer via serial for monitoring
  • Ensure adequate power supply

Complete Code

#include <Arduino.h>
#include "USB_STREAM.h"

/* Define the Mic frame callback function implementation */
static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    // Use a higher baud rate here to reduce the time spent blocking in the callback
    Serial.printf("mic callback! bit_resolution = %u, samples_frequence = %"PRIu32", data_bytes = %"PRIu32"\n", frame->bit_resolution, frame->samples_frequence, frame->data_bytes);
}

void setup()
{
    Serial.begin(115200);
    // Instantiate a USB_STREAM object
    USB_STREAM *usb = new USB_STREAM();

    // Configure the microphone (first four arguments) and speaker (last four arguments)
    usb->uacConfiguration(UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400, UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400);

    // Register the microphone frame callback function
    usb->uacMicRegisterCb(&onMicFrameCallback, NULL);

    usb->start();

    usb->connectWait(1000);
    delay(5000);

    usb->uacMicMute((void *)0);
    delay(5000);

    usb->uacMicVolume((void *)60);

    usb->uacMicSuspend(NULL);
    delay(5000);

    usb->uacMicResume(NULL);

}

// The loop function runs repeatedly
void loop()
{
    // Nothing to do here: sleep for 5000 ticks (5 s at the default 1 kHz tick rate)
    vTaskDelay(5000);
}

Code Explanation

1. Microphone Callback Function

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    Serial.printf("mic callback! bit_resolution = %u, samples_frequence = %"PRIu32", data_bytes = %"PRIu32"\n", 
                  frame->bit_resolution, frame->samples_frequence, frame->data_bytes);
}
Line-by-line breakdown:
  • static void onMicFrameCallback(mic_frame_t *frame, void *ptr) - Called when audio data is available
  • frame->bit_resolution - Audio bit depth (8, 16, 24, or 32 bits)
  • frame->samples_frequence - Sample rate in Hz (e.g., 16000, 44100, 48000)
  • frame->data_bytes - Number of bytes in this audio buffer
  • frame->data - Pointer to actual audio samples (not shown in print, but available for processing)
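Important Note: The comment warns about baud rate because Serial.printf runs inside the callback and can delay audio processing. The roughly 80-character line printed here can take about 7 ms to transmit at 115200 baud, so use a higher baud rate or minimize serial output in production.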

2. UAC Configuration

usb->uacConfiguration(UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400, 
                      UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400);
Parameters explained (8 total):
Microphone input (first 4 parameters):
  1. UAC_CH_ANY - Accept any channel count (mono/stereo)
  2. UAC_BITS_ANY - Accept any bit depth (8/16/24/32-bit)
  3. UAC_FREQUENCY_ANY - Accept any sample rate
  4. 6400 - Input buffer size in bytes (6.4 KB)
Speaker output (last 4 parameters):
  5. UAC_CH_ANY - Speaker channel configuration
  6. UAC_BITS_ANY - Speaker bit depth
  7. UAC_FREQUENCY_ANY - Speaker sample rate
  8. 6400 - Output buffer size in bytes
Why configure both? The library supports simultaneous microphone and speaker operation. Even if you only use the microphone, both sets of parameters must be provided.

3. Callback Registration

usb->uacMicRegisterCb(&onMicFrameCallback, NULL);
  • &onMicFrameCallback - Function pointer to your callback
  • NULL - Optional user data (accessible as ptr in callback)
Custom user data example:
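static int frameCounter = 0;  // Must outlive setup(), so make it static or global
usb->uacMicRegisterCb(&onMicFrameCallback, &frameCounter);

// In callback:
static void onMicFrameCallback(mic_frame_t *frame, void *ptr) {
    int *counter = (int *)ptr;  // Recover the pointer passed at registration
    (*counter)++;
}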

4. Stream Initialization

usb->start();                // Initialize USB host
usb->connectWait(1000);      // Wait up to 1000ms for device
delay(5000);                 // Capture for 5 seconds
  • start() begins USB host operation
  • connectWait() blocks until microphone connects or timeout
  • After connection, callbacks begin firing automatically

5. Volume and Mute Control

usb->uacMicMute((void *)0);      // 0 = unmute, 1 = mute (this call unmutes)
delay(5000);                     // Capture for 5 seconds, unmuted
usb->uacMicVolume((void *)60);   // Set volume to 60%
Volume control:
  • Range: 0-100 (percentage)
  • The integer value itself is cast to (void *); it is not a pointer to a variable
  • Some microphones may not support volume control
Mute control:
  • 0 = unmuted (audio flows)
  • 1 = muted (the stream keeps running, but the microphone delivers silence)

6. Suspend and Resume

usb->uacMicSuspend(NULL);    // Stop audio capture
delay(5000);                 // Paused for 5 seconds
usb->uacMicResume(NULL);     // Resume audio capture
Use cases for suspend/resume:
  • Power saving when audio not needed
  • Temporarily pause recording
  • Switch between different audio sources
  • No need to reconfigure when resuming

Expected Serial Output

mic callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
mic callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
mic callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
mic callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
...
Output analysis:
  • bit_resolution = 16 - 16-bit audio (the same bit depth as CD audio)
  • samples_frequence = 16000 - 16 kHz sample rate (common for voice)
  • data_bytes = 640 - 640 bytes per callback
Calculating audio duration per callback:
  • 16-bit = 2 bytes per sample
  • Mono (1 channel), assumed here because the callback output does not include a channel count
  • 640 bytes ÷ 2 bytes/sample = 320 samples
  • 320 samples ÷ 16000 Hz = 20ms of audio per callback
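The arithmetic above can be wrapped in a small helper. This is a minimal sketch and not part of the USB_STREAM library: the name frameDurationMs is hypothetical, and the channel count is passed in by the caller as an assumption, since the fields printed in this example do not report it.

```cpp
#include <stdint.h>

// Hypothetical helper (not part of USB_STREAM): milliseconds of audio in one frame.
// Returns 0 when the format values cannot describe valid PCM audio.
double frameDurationMs(uint32_t dataBytes, uint16_t bitResolution,
                       uint32_t sampleRateHz, uint16_t channels)
{
    const uint32_t bytesPerSample = bitResolution / 8;
    if (bytesPerSample == 0 || sampleRateHz == 0 || channels == 0) {
        return 0.0;
    }
    // Samples per channel = bytes / (bytes per sample * channels)
    const uint32_t samplesPerChannel = dataBytes / (bytesPerSample * channels);
    return 1000.0 * samplesPerChannel / sampleRateHz;
}
```

Called from the callback as frameDurationMs(frame->data_bytes, frame->bit_resolution, frame->samples_frequence, 1), it reproduces the 20 ms figure for the output shown above. If the microphone is actually stereo, passing 2 halves the result.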

Audio Format Negotiation

The UAC_*_ANY constants auto-negotiate with the microphone. You can also specify exact formats:

Request Specific Format

// Request 48kHz, 16-bit, stereo
usb->uacConfiguration(
    2,               // 2 channels (stereo)
    16,              // 16-bit
    48000,           // 48 kHz
    9600,            // Larger buffer for higher sample rate
    UAC_CH_ANY,      // Speaker: any
    UAC_BITS_ANY,
    UAC_FREQUENCY_ANY,
    6400
);

Common Configurations

Use Case          Channels     Bits   Sample Rate   Buffer Size
Voice recording   1 (mono)     16     16000 Hz      6400 bytes
Music recording   2 (stereo)   16     44100 Hz      12800 bytes
High quality      2 (stereo)   24     48000 Hz      19200 bytes
Low bandwidth     1 (mono)     8      8000 Hz       3200 bytes

Processing Audio Data

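Save Audio to SD Card (Raw PCM)

#include <SD.h>

File audioFile;
volatile bool recording = false;  // Shared between the callback and the main code

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    // Writes raw PCM samples with no header; SD writes can be slow
    if (recording && audioFile) {
        audioFile.write((const uint8_t *)frame->data, frame->data_bytes);
    }
}

// Call SD.begin() in setup() before starting a recording
void startRecording() {
    audioFile = SD.open("/recording.raw", FILE_WRITE);
    recording = (bool)audioFile;  // Only record if the file opened
}

void stopRecording() {
    recording = false;
    audioFile.close();
}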
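The recording code above stores bare PCM samples, which most players cannot open without being told the format. One common way to make the file playable is to put a 44-byte canonical PCM WAV header in front of the samples. The sketch below shows how such a header could be built; buildWavHeader, putLE16, and putLE32 are hypothetical helper names, and the format values have to come from the negotiated stream (the channel count is an assumption the caller supplies).

```cpp
#include <stdint.h>
#include <string.h>

// Store 16-bit and 32-bit values in little-endian order, as WAV requires
static void putLE16(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xFF;
    p[1] = (v >> 8) & 0xFF;
}

static void putLE32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (v >> (8 * i)) & 0xFF;
    }
}

// Hypothetical helper: fill a 44-byte canonical PCM WAV header.
// pcmBytes is the number of sample bytes that follow the header.
void buildWavHeader(uint8_t header[44], uint32_t pcmBytes, uint32_t sampleRateHz,
                    uint16_t channels, uint16_t bitsPerSample)
{
    const uint16_t blockAlign = channels * (bitsPerSample / 8);  // bytes per sample frame
    const uint32_t byteRate = sampleRateHz * blockAlign;         // bytes per second

    memcpy(header + 0, "RIFF", 4);
    putLE32(header + 4, 36 + pcmBytes);   // size of everything after this field
    memcpy(header + 8, "WAVE", 4);
    memcpy(header + 12, "fmt ", 4);
    putLE32(header + 16, 16);             // fmt chunk size for plain PCM
    putLE16(header + 20, 1);              // audio format 1 = uncompressed PCM
    putLE16(header + 22, channels);
    putLE32(header + 24, sampleRateHz);
    putLE32(header + 28, byteRate);
    putLE16(header + 32, blockAlign);
    putLE16(header + 34, bitsPerSample);
    memcpy(header + 36, "data", 4);
    putLE32(header + 40, pcmBytes);
}
```

Because the total length is unknown until recording stops, one possible approach is to write a placeholder header when the file is opened, count the bytes written in the callback, and then seek back to the start of the file and overwrite the header with the final size before closing.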

Calculate Audio Level (VU Meter)

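#include <math.h>

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    // Assumes 16-bit samples; check frame->bit_resolution if the format is negotiated
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;  // 16-bit = 2 bytes
    if (numSamples == 0) {
        return;  // Nothing to measure; also avoids dividing by zero
    }

    // Calculate RMS (root mean square) for audio level.
    // Use a 64-bit accumulator: the sum of squared 16-bit samples can overflow 32 bits.
    int64_t sum = 0;
    for (int i = 0; i < numSamples; i++) {
        sum += (int32_t)samples[i] * samples[i];
    }
    int rms = (int)sqrt((double)sum / numSamples);

    Serial.printf("Audio level: %d\n", rms);
}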
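A raw RMS number is hard to read on a meter, so VU displays usually show the level in decibels relative to full scale (dBFS). The following sketch is one way to do that conversion for 16-bit samples. The names rms16 and rmsToDbfs are hypothetical, and the -96 dB floor is an arbitrary choice that roughly matches the dynamic range of 16-bit audio.

```cpp
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// RMS of 16-bit samples, accumulated in 64 bits so the sum of squares cannot overflow
double rms16(const int16_t *samples, size_t count)
{
    if (count == 0) {
        return 0.0;
    }
    int64_t sumSquares = 0;
    for (size_t i = 0; i < count; i++) {
        const int32_t s = samples[i];
        sumSquares += (int64_t)s * s;
    }
    return sqrt((double)sumSquares / (double)count);
}

// Convert an RMS value to dBFS, where 32768 (16-bit full scale) maps to 0 dB.
// Silence is clamped to a floor instead of returning negative infinity.
double rmsToDbfs(double rms)
{
    const double floorDb = -96.0;
    if (rms <= 0.0) {
        return floorDb;
    }
    const double db = 20.0 * log10(rms / 32768.0);
    return db < floorDb ? floorDb : db;
}
```

In the callback this could be used as rmsToDbfs(rms16((const int16_t *)frame->data, frame->data_bytes / 2)). For multi-channel audio the samples are interleaved, so the result is a combined level across channels.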

Stream Audio Over WiFi

#include <WiFi.h>
WiFiClient client;

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    if (client.connected()) {
        client.write((const uint8_t *)frame->data, frame->data_bytes);
    }
}

Voice Activity Detection (VAD)

const int SILENCE_THRESHOLD = 500;  // Adjust based on testing

static void onMicFrameCallback(mic_frame_t *frame, void *ptr)
{
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;
    
    // Find maximum amplitude
    int maxAmplitude = 0;
    for (int i = 0; i < numSamples; i++) {
        int amplitude = abs(samples[i]);
        if (amplitude > maxAmplitude) {
            maxAmplitude = amplitude;
        }
    }
    
    if (maxAmplitude > SILENCE_THRESHOLD) {
        Serial.println("Voice detected!");
    } else {
        Serial.println("Silence");
    }
}
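A pure threshold test like the one above flips between "Voice detected!" and "Silence" on every short pause between words. A common refinement is a hold (or hangover) period that keeps reporting voice for a few frames after the last loud one. The sketch below illustrates the idea; the class name PeakVad and its parameters are hypothetical, and the threshold and hold length would need tuning for a real microphone.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical peak-threshold voice detector with a hold period
class PeakVad {
public:
    PeakVad(int threshold, int holdFrames)
        : threshold_(threshold), holdFrames_(holdFrames), remaining_(0) {}

    // Feed one frame of 16-bit samples; returns true while voice is considered active
    bool update(const int16_t *samples, size_t count)
    {
        int peak = 0;
        for (size_t i = 0; i < count; i++) {
            const int amplitude = abs((int)samples[i]);
            if (amplitude > peak) {
                peak = amplitude;
            }
        }
        if (peak > threshold_) {
            remaining_ = holdFrames_;  // Loud frame: restart the hold period
            return true;
        }
        if (remaining_ > 0) {
            remaining_--;              // Quiet frame, but still inside the hold period
            return true;
        }
        return false;
    }

private:
    int threshold_;   // Peak amplitude above which a frame counts as voice
    int holdFrames_;  // Quiet frames to tolerate before reporting silence
    int remaining_;   // Quiet frames left in the current hold period
};
```

With 20 ms frames, a hold of 10 to 25 frames bridges pauses of roughly 0.2 to 0.5 seconds. Printing only when the returned state changes also cuts the serial output from one line per callback to one line per transition.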

Performance Considerations

Buffer Size Selection:
  • Larger buffers = less frequent callbacks, less CPU overhead
  • Smaller buffers = lower latency, more real-time responsiveness
  • 6400 bytes works well for 16kHz, 16-bit audio
  • Increase for higher sample rates
Callback Execution Time:
  • Keep callback code fast and minimal
  • Avoid heavy processing in callback
  • Use ring buffers to pass data to main loop for processing
  • Minimize serial output (increases latency)
Memory Management:
  • Audio data in the callback is temporary; treat frame->data as valid only until the callback returns
  • Copy data if you need to keep it
  • Use DMA or ring buffers for efficiency
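The advice to copy data out of the callback and process it elsewhere is typically implemented with a ring buffer. The sketch below is a minimal single-producer, single-consumer byte ring buffer, assuming exactly one writer (the microphone callback) and one reader (for example loop()). The class name AudioRing is hypothetical and is not part of the USB_STREAM library.

```cpp
#include <atomic>
#include <stddef.h>
#include <stdint.h>

// Hypothetical lock-free ring buffer for one producer and one consumer.
// Capacity must be a power of two so indices can wrap with a bit mask.
template <size_t Capacity>
class AudioRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side: copies as many bytes as fit and returns the number stored
    size_t write(const uint8_t *src, size_t len)
    {
        const size_t head = head_.load(std::memory_order_relaxed);
        const size_t tail = tail_.load(std::memory_order_acquire);
        const size_t freeBytes = Capacity - (head - tail);
        if (len > freeBytes) {
            len = freeBytes;  // Buffer nearly full: drop what does not fit
        }
        for (size_t i = 0; i < len; i++) {
            buf_[(head + i) & (Capacity - 1)] = src[i];
        }
        head_.store(head + len, std::memory_order_release);
        return len;
    }

    // Consumer side: copies up to maxLen bytes out and returns the number read
    size_t read(uint8_t *dst, size_t maxLen)
    {
        const size_t tail = tail_.load(std::memory_order_relaxed);
        const size_t head = head_.load(std::memory_order_acquire);
        const size_t avail = head - tail;
        if (maxLen > avail) {
            maxLen = avail;
        }
        for (size_t i = 0; i < maxLen; i++) {
            dst[i] = buf_[(tail + i) & (Capacity - 1)];
        }
        tail_.store(tail + maxLen, std::memory_order_release);
        return maxLen;
    }

    // Number of bytes currently waiting to be read
    size_t available() const
    {
        return head_.load(std::memory_order_acquire) -
               tail_.load(std::memory_order_acquire);
    }

private:
    uint8_t buf_[Capacity];
    std::atomic<size_t> head_{0};  // Total bytes written so far
    std::atomic<size_t> tail_{0};  // Total bytes read so far
};
```

The callback would then reduce to a single ring.write((const uint8_t *)frame->data, frame->data_bytes) call, with the slow work (SD writes, network sends, analysis) moved to the reader. A return value smaller than data_bytes signals an overrun, which is worth counting while tuning the capacity. ESP-IDF also ships its own ring buffer component (freertos/ringbuf.h), which may be a better fit when variable-sized items or blocking reads are needed.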

Troubleshooting

No callbacks received:
  • Check USB connection
  • Verify microphone is UAC-compatible
  • Increase connectWait() timeout
  • Check device power requirements
Audio is choppy or distorted:
  • Increase buffer size
  • Reduce callback processing time
  • Check for buffer overruns
  • Verify sample rate matches microphone capability
Volume/mute not working:
  • Not all USB microphones support these controls
  • Check microphone’s UAC feature support
  • Some microphones have hardware volume controls only
Data bytes seem wrong:
  • Remember: bytes = samples × channels × (bits ÷ 8)
  • Stereo doubles the data size
  • Higher sample rates increase data rate
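The formula above can be turned into a quick sanity check for the values a callback reports. This is a sketch under stated assumptions: expectedBytes and isWholeFrames are hypothetical helper names, and the channel count is supplied by the caller.

```cpp
#include <stdint.h>

// Hypothetical helper: bytes expected for durationMs of PCM audio in the given format
uint32_t expectedBytes(uint32_t sampleRateHz, uint16_t channels,
                       uint16_t bitsPerSample, uint32_t durationMs)
{
    const uint64_t bytesPerSecond =
        (uint64_t)sampleRateHz * channels * (bitsPerSample / 8);
    return (uint32_t)(bytesPerSecond * durationMs / 1000);
}

// Hypothetical helper: true when dataBytes holds a whole number of sample frames,
// where one sample frame is one sample for every channel
bool isWholeFrames(uint32_t dataBytes, uint16_t channels, uint16_t bitsPerSample)
{
    const uint32_t frameBytes = (uint32_t)channels * (bitsPerSample / 8);
    return frameBytes != 0 && dataBytes % frameBytes == 0;
}
```

For the output shown earlier, expectedBytes(16000, 1, 16, 20) gives 640, which matches data_bytes. If isWholeFrames returns false for the assumed channel count and bit depth, the assumed format is probably not the one the microphone negotiated.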
