
UAC Speaker Output Example

This example demonstrates how to output audio to a USB speaker or USB audio device using the USB Audio Class (UAC) protocol. It shows audio format configuration, writing audio data, and controlling speaker settings.

What This Example Demonstrates

  • Initializing USB audio output (speaker)
  • Configuring audio format (sample rate, bit depth, channels)
  • Writing audio data to the speaker
  • Controlling speaker volume and mute
  • Managing audio buffers for continuous playback

Hardware Setup

Required Components:
  • ESP32-S2 or ESP32-S3 development board
  • USB speaker or USB audio interface (UAC-compatible)
  • USB OTG cable or adapter
Connections:
  • Connect the USB speaker to the board’s USB OTG port (the ESP32 acts as the USB host)
  • Connect the board to a computer over serial for monitoring
  • Ensure an adequate power supply (some speakers need external power)

Complete Code

#include <Arduino.h>
#include "USB_STREAM.h"

// Audio generation variables
const int SAMPLE_RATE = 16000;  // Assumed rate; the negotiated rate is reported in frame->samples_frequence
const int FREQUENCY = 440;  // A4 note
float phase = 0.0;

/* Define the speaker frame callback function implementation */
static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    // This callback is called when the speaker needs more data.
    // Logging every frame helps while debugging; remove it once playback works.
    Serial.printf("spk callback! bit_resolution = %u, samples_frequence = %" PRIu32 ", data_bytes = %" PRIu32 "\n",
                  frame->bit_resolution, frame->samples_frequence, frame->data_bytes);
    
    // Generate audio data (example: sine wave)
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;  // 16-bit = 2 bytes per sample
    
    for (int i = 0; i < numSamples; i++) {
        // Generate sine wave at 440 Hz
        samples[i] = (int16_t)(sin(phase) * 10000);  // Amplitude: 10000
        phase += 2.0 * PI * FREQUENCY / SAMPLE_RATE;
        if (phase >= 2.0 * PI) phase -= 2.0 * PI;
    }
}

void setup()
{
    Serial.begin(115200);
    
    // Instantiate a USB_STREAM object
    USB_STREAM *usb = new USB_STREAM();

    // Configure the parameters
    // Mic input: any format, 6400 bytes buffer
    // Speaker output: any format, 6400 bytes buffer
    usb->uacConfiguration(
        UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400,  // Mic (not used)
        UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400   // Speaker
    );

    // Register the speaker callback function
    usb->uacSpkRegisterCb(&onSpeakerFrameCallback, NULL);

    // Start USB streaming
    usb->start();

    // Wait for device connection (up to 1000ms)
    usb->connectWait(1000);
    delay(5000);  // Let the tone play for 5 s before changing settings

    // Demonstrate volume control
    usb->uacSpkVolume((void *)80);  // Set volume to 80%
    delay(3000);

    // Demonstrate mute
    usb->uacSpkMute((void *)1);     // Mute speaker
    delay(2000);
    
    usb->uacSpkMute((void *)0);     // Unmute speaker
    delay(3000);

    // Demonstrate suspend/resume
    usb->uacSpkSuspend(NULL);       // Pause playback
    delay(2000);
    
    usb->uacSpkResume(NULL);        // Resume playback
}

void loop()
{
    vTaskDelay(100);
}

Code Explanation

1. Speaker Callback Function

static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    Serial.printf("spk callback! bit_resolution = %u, samples_frequence = %" PRIu32 ", data_bytes = %" PRIu32 "\n",
                  frame->bit_resolution, frame->samples_frequence, frame->data_bytes);
}
Line-by-line breakdown:
  • static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr) - Called when speaker needs audio data
  • frame->bit_resolution - Audio bit depth (8, 16, 24, or 32 bits)
  • frame->samples_frequence - Sample rate in Hz (e.g., 16000, 44100, 48000)
  • frame->data_bytes - Number of bytes the buffer can hold
  • frame->data - Pointer where you write audio samples
Key Difference from Microphone: For speakers, you WRITE to frame->data. For microphones, you READ from it.
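Because the configuration accepts any format, it may be worth checking the reported bit depth before casting frame->data to int16_t. The following is a minimal sketch under that assumption. FrameView is a hypothetical stand-in that mirrors the fields listed above so the snippet compiles on its own; it is not the library's type, and fillToneOrSilence is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in mirroring the frame fields described above.
struct FrameView {
    uint16_t bit_resolution;
    uint32_t samples_frequence;
    void *data;
    uint32_t data_bytes;
};

// Writes a sine tone when the frame is 16-bit PCM, otherwise writes silence.
// Returns the number of 16-bit samples written (0 when silence was written).
inline uint32_t fillToneOrSilence(FrameView &f, float freqHz, float &phase) {
    assert(f.data != nullptr);
    if (f.bit_resolution != 16 || f.samples_frequence == 0) {
        std::memset(f.data, 0, f.data_bytes);  // Unknown layout: stay silent
        return 0;
    }
    const float twoPi = 6.28318530718f;
    const float step = twoPi * freqHz / static_cast<float>(f.samples_frequence);
    int16_t *out = static_cast<int16_t *>(f.data);
    const uint32_t n = f.data_bytes / sizeof(int16_t);
    for (uint32_t i = 0; i < n; ++i) {
        out[i] = static_cast<int16_t>(std::sin(phase) * 10000.0f);
        phase += step;
        if (phase >= twoPi) phase -= twoPi;
    }
    return n;
}
```

Using the rate reported in the frame, rather than a compile-time constant, keeps the pitch correct when the device negotiates something other than 16 kHz.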

2. Generating Audio Data

int16_t *samples = (int16_t *)frame->data;
int numSamples = frame->data_bytes / 2;  // 16-bit = 2 bytes per sample

for (int i = 0; i < numSamples; i++) {
    samples[i] = (int16_t)(sin(phase) * 10000);  // Amplitude: 10000
    phase += 2.0 * PI * FREQUENCY / SAMPLE_RATE;
    if (phase >= 2.0 * PI) phase -= 2.0 * PI;
}
Sine wave generation explained:
  • (int16_t *)frame->data - Cast buffer to 16-bit signed integers
  • numSamples = data_bytes / 2 - Calculate how many samples to generate
  • sin(phase) * 10000 - Generate sine wave with amplitude 10000 (out of 32767 max)
  • phase += ... - Advance the phase by one sample period at the target frequency
  • if (phase >= 2π) - Wrap the phase back into [0, 2π) so it stays bounded and keeps float precision
Volume control via amplitude:
  • Max 16-bit value: 32767
  • Amplitude 10000 ≈ 30% of full scale
  • Increase for louder sound, but keep the peak at or below 32767 to avoid clipping
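One way to make the clipping limit explicit is to convert through a saturating helper. This is a sketch only; toPcm16 is a hypothetical name, and it assumes samples are produced as floats in the range -1.0 to 1.0 before scaling.

```cpp
#include <cassert>
#include <cstdint>

// Scales a normalized sample by gain and converts it to 16-bit PCM.
// Values outside the int16_t range are clamped instead of wrapping around.
inline int16_t toPcm16(float sample, float gain) {
    assert(gain >= 0.0f);
    float v = sample * gain * 32767.0f;
    if (v > 32767.0f) v = 32767.0f;
    if (v < -32768.0f) v = -32768.0f;
    return static_cast<int16_t>(v);
}
```

Clamping still distorts the waveform, but it avoids the much harsher artifact of integer wrap-around when the gain is set too high.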

3. UAC Configuration

usb->uacConfiguration(
    UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400,  // Mic input
    UAC_CH_ANY, UAC_BITS_ANY, UAC_FREQUENCY_ANY, 6400   // Speaker output
);
Parameters for speaker (last 4):
  1. UAC_CH_ANY - Accept any channel count (mono/stereo)
  2. UAC_BITS_ANY - Accept any bit depth (typically 16-bit)
  3. UAC_FREQUENCY_ANY - Accept any sample rate (negotiated with device)
  4. 6400 - Output buffer size in bytes
Buffer size considerations:
  • 6400 bytes = 3200 samples (16-bit mono)
  • At 16 kHz: 3200 samples = 200 ms of audio
  • At 48 kHz: 3200 samples ≈ 67 ms of audio
  • Stereo halves these durations, since each frame carries two samples
  • Larger buffers = more latency but smoother playback
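The arithmetic behind these figures can be sketched as a small helper. bufferDurationMs is a hypothetical function for illustration and is not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Returns how many milliseconds of PCM audio fit in bufferBytes.
inline double bufferDurationMs(uint32_t bufferBytes, uint32_t sampleRateHz,
                               uint16_t bitsPerSample, uint16_t channels) {
    assert(sampleRateHz > 0 && bitsPerSample >= 8 && channels > 0);
    const uint32_t bytesPerFrame = (bitsPerSample / 8u) * channels;
    const uint32_t frames = bufferBytes / bytesPerFrame;  // One frame = one sample per channel
    return 1000.0 * frames / sampleRateHz;
}
```

For example, bufferDurationMs(6400, 16000, 16, 1) gives 200 ms, matching the first figure in the list.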

4. Callback Registration

usb->uacSpkRegisterCb(&onSpeakerFrameCallback, NULL);
  • &onSpeakerFrameCallback - Function pointer to your audio generator
  • NULL - Optional user data (can pass state variables)
Passing user data example:
struct AudioState {
    float frequency;
    float volume;
};

// Must outlive the stream: declare it global or static, not as a local in setup()
AudioState state = {440.0, 0.5};
usb->uacSpkRegisterCb(&onSpeakerFrameCallback, &state);

// In callback:
static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr) {
    AudioState *state = (AudioState *)ptr;
    // Use state->frequency and state->volume
}

5. Volume and Mute Control

usb->uacSpkVolume((void *)80);   // Set volume to 80%
usb->uacSpkMute((void *)1);      // Mute: 1 = muted
usb->uacSpkMute((void *)0);      // Unmute: 0 = unmuted
Volume control:
  • Range: 0-100 (percentage)
  • Applied by USB audio device hardware
  • Independent of amplitude in your generated audio
  • Not all USB speakers support this command
Mute control:
  • 1 = muted (callback still runs, but output silenced)
  • 0 = unmuted (normal audio output)
  • Useful for temporary silence without stopping generation

6. Suspend and Resume

usb->uacSpkSuspend(NULL);    // Stop audio playback
delay(2000);                 // Paused
usb->uacSpkResume(NULL);     // Resume audio playback
Use cases:
  • Power saving when audio not needed
  • Switching between audio sources
  • Temporary pause without reconfiguration
While the stream is suspended the callback is not invoked; it starts being called again after uacSpkResume().

Expected Serial Output

spk callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
spk callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
spk callback! bit_resolution = 16, samples_frequence = 16000, data_bytes = 640
...
You should hear: A continuous 440 Hz tone (musical note A4) from your USB speaker.

Audio Generation Examples

1. Play Tone at Specific Frequency

void generateTone(spk_frame_t *frame, float frequency, float amplitude) {
    static float phase = 0.0;
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;
    
    for (int i = 0; i < numSamples; i++) {
        samples[i] = (int16_t)(sin(phase) * amplitude * 32767);
        phase += 2.0 * PI * frequency / frame->samples_frequence;
        if (phase >= 2.0 * PI) phase -= 2.0 * PI;
    }
}

// Usage in callback:
static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr) {
    generateTone(frame, 440.0, 0.3);  // 440 Hz at 30% amplitude
}

2. Play Audio from SD Card

#include <SD.h>

File audioFile;

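static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    uint8_t *out = (uint8_t *)frame->data;
    size_t bytesRead = 0;

    if (audioFile && audioFile.available()) {
        // Read audio data from file (may return fewer bytes near the end)
        bytesRead = audioFile.read(out, frame->data_bytes);
    }
    // Fill whatever was not read (end of file or no file) with silence
    memset(out + bytesRead, 0, frame->data_bytes - bytesRead);
}

void playAudioFile(const char *filename) {
    audioFile = SD.open(filename, FILE_READ);
    if (!audioFile) {
        Serial.printf("Failed to open %s\n", filename);
        return;
    }
    // Skip the canonical 44-byte WAV header; the file's format must match the speaker's
    audioFile.seek(44);
}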
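The fixed 44-byte skip only holds for the canonical PCM WAV layout; files with extra chunks place the audio data later. A hedged sketch of walking the RIFF chunks to find the format fields and the data offset follows. WavInfo and parseWavHeader are hypothetical names, and the function works on header bytes already read into memory rather than on the SD file directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct WavInfo {
    uint16_t channels = 0;
    uint32_t sampleRate = 0;
    uint16_t bitsPerSample = 0;
    size_t dataOffset = 0;   // Byte offset of the first PCM sample
    uint32_t dataBytes = 0;  // Size declared by the data chunk
};

static uint16_t readLe16(const uint8_t *p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
static uint32_t readLe32(const uint8_t *p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Walks the RIFF chunks in buf. Returns true once both "fmt " and "data"
// were found; false when the header is malformed or incomplete.
inline bool parseWavHeader(const uint8_t *buf, size_t len, WavInfo &info) {
    assert(buf != nullptr);
    if (len < 12 || std::memcmp(buf, "RIFF", 4) != 0 ||
        std::memcmp(buf + 8, "WAVE", 4) != 0) {
        return false;
    }
    bool haveFmt = false;
    size_t pos = 12;
    while (pos + 8 <= len) {
        const uint32_t size = readLe32(buf + pos + 4);
        const size_t body = pos + 8;
        if (std::memcmp(buf + pos, "data", 4) == 0) {
            info.dataOffset = body;
            info.dataBytes = size;
            return haveFmt;
        }
        if (size > len - body) return false;  // Chunk runs past the bytes provided
        if (std::memcmp(buf + pos, "fmt ", 4) == 0) {
            if (size < 16) return false;
            info.channels = readLe16(buf + body + 2);
            info.sampleRate = readLe32(buf + body + 4);
            info.bitsPerSample = readLe16(buf + body + 14);
            haveFmt = true;
        }
        pos = body + size + (size & 1u);  // Chunks are padded to an even size
    }
    return false;
}
```

With the parsed values, the sketch could seek to dataOffset instead of 44 and compare sampleRate and bitsPerSample against the values reported in the speaker frame before playing.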

3. Play Audio from Memory (Flash)

// Store audio data in flash memory
const int16_t audioData[] PROGMEM = {
    0, 1000, 2000, 3000, 4000, 5000, /* ... */
};
const int audioDataSize = sizeof(audioData) / sizeof(int16_t);
int audioPosition = 0;

static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;
    
    for (int i = 0; i < numSamples; i++) {
        samples[i] = pgm_read_word(&audioData[audioPosition]);
        audioPosition++;
        if (audioPosition >= audioDataSize) {
            audioPosition = 0;  // Loop
        }
    }
}

4. Generate White Noise

static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;
    
    for (int i = 0; i < numSamples; i++) {
        // Random value between -32767 and 32767 (random()'s upper bound is exclusive)
        samples[i] = (int16_t)(random(-32767, 32768));
    }
}

5. Play Multiple Tones (Chord)

static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    static float phase1 = 0.0, phase2 = 0.0, phase3 = 0.0;
    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;
    float sampleRate = frame->samples_frequence;
    
    for (int i = 0; i < numSamples; i++) {
        // A major chord: A (440 Hz) + C# (554 Hz) + E (659 Hz)
        float sample = sin(phase1) + sin(phase2) + sin(phase3);
        samples[i] = (int16_t)(sample * 10000);  // Peak is 3 x 10000 = 30000, below the 32767 limit
        
        phase1 += 2.0 * PI * 440 / sampleRate;  // A
        phase2 += 2.0 * PI * 554 / sampleRate;  // C#
        phase3 += 2.0 * PI * 659 / sampleRate;  // E
        
        if (phase1 >= 2.0 * PI) phase1 -= 2.0 * PI;
        if (phase2 >= 2.0 * PI) phase2 -= 2.0 * PI;
        if (phase3 >= 2.0 * PI) phase3 -= 2.0 * PI;
    }
}
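The chord frequencies above look like rounded equal-temperament pitches, where each semitone multiplies the frequency by 2^(1/12). A small sketch of that relationship, with noteFrequencyHz as a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Frequency of the note that lies semitonesFromA4 semitones above (or below,
// when negative) A4 = 440 Hz in twelve-tone equal temperament.
inline double noteFrequencyHz(int semitonesFromA4) {
    return 440.0 * std::pow(2.0, semitonesFromA4 / 12.0);
}
```

C# is 4 semitones above A and E is 7, so noteFrequencyHz(4) and noteFrequencyHz(7) round to the 554 Hz and 659 Hz used in the chord.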

6. Receive Audio Over WiFi and Play

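#include <WiFi.h>
#include <AsyncUDP.h>

// One received packet: heap buffer plus its length
struct AudioPacket {
    uint8_t *data;
    size_t length;
};

AsyncUDP udp;
QueueHandle_t audioQueue;

void setupWiFiAudio() {
    audioQueue = xQueueCreate(10, sizeof(AudioPacket));
    
    udp.listen(5000);
    udp.onPacket([](AsyncUDPPacket packet) {
        AudioPacket pkt = { (uint8_t *)malloc(packet.length()), packet.length() };
        if (pkt.data == NULL) return;  // Out of memory - drop packet
        memcpy(pkt.data, packet.data(), pkt.length);
        if (xQueueSend(audioQueue, &pkt, 0) != pdTRUE) {
            free(pkt.data);  // Queue full - drop packet instead of leaking it
        }
    });
}

static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
    AudioPacket pkt;
    if (xQueueReceive(audioQueue, &pkt, 0) == pdTRUE) {
        // Copy no more than was received; pad the rest with silence
        size_t n = pkt.length < frame->data_bytes ? pkt.length : frame->data_bytes;
        memcpy(frame->data, pkt.data, n);
        memset((uint8_t *)frame->data + n, 0, frame->data_bytes - n);
        free(pkt.data);
    } else {
        // No data available - fill with silence
        memset(frame->data, 0, frame->data_bytes);
    }
}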

Stereo Audio Output

For stereo speakers, interleave left and right channels:
static void onSpeakerFrameCallback(spk_frame_t *frame, void *ptr)
{
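    int16_t *samples = (int16_t *)frame->data;
    int numSamples = frame->data_bytes / 2;  // Total 16-bit values (left + right)
    
    static float phaseL = 0.0, phaseR = 0.0;
    
    // Step one stereo frame (L, R) at a time; never write past the buffer
    for (int i = 0; i + 1 < numSamples; i += 2) {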
        // Left channel - 440 Hz
        samples[i] = (int16_t)(sin(phaseL) * 10000);
        phaseL += 2.0 * PI * 440 / frame->samples_frequence;
        
        // Right channel - 880 Hz
        samples[i + 1] = (int16_t)(sin(phaseR) * 10000);
        phaseR += 2.0 * PI * 880 / frame->samples_frequence;
        
        if (phaseL >= 2.0 * PI) phaseL -= 2.0 * PI;
        if (phaseR >= 2.0 * PI) phaseR -= 2.0 * PI;
    }
}
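When the source material is mono but the speaker expects stereo, the same interleaving applies with the sample duplicated into both channels. A minimal sketch, with monoToStereo as a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies each mono sample into the left and right slot of an interleaved
// stereo buffer. stereoCapacity is counted in int16_t values, not frames.
// Returns the number of stereo frames written.
inline size_t monoToStereo(const int16_t *mono, size_t monoCount,
                           int16_t *stereo, size_t stereoCapacity) {
    assert(mono != nullptr && stereo != nullptr);
    const size_t frames = std::min(monoCount, stereoCapacity / 2);
    for (size_t i = 0; i < frames; ++i) {
        stereo[2 * i] = mono[i];      // Left
        stereo[2 * i + 1] = mono[i];  // Right
    }
    return frames;
}
```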

Performance Considerations

Callback Timing:
  • Callback must complete before buffer deadline
  • Keep processing lightweight
  • Pre-compute data when possible
  • Avoid memory allocation in callback
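As one illustration of pre-computing data, a sine lookup table with a fixed-point phase accumulator replaces the per-sample sin() call with an array read. This is a sketch of a common technique, not something the library provides; makeSineTable and TableOscillator are hypothetical names, and the 256-entry table trades some tonal purity for speed.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr size_t kTableSize = 256;
constexpr double kTwoPi = 6.283185307179586;

// Built once (for example in setup()): one sine cycle scaled to +/- amplitude.
inline std::array<int16_t, kTableSize> makeSineTable(int16_t amplitude) {
    std::array<int16_t, kTableSize> table{};
    for (size_t i = 0; i < kTableSize; ++i) {
        table[i] = static_cast<int16_t>(
            std::lround(std::sin(kTwoPi * i / kTableSize) * amplitude));
    }
    return table;
}

// 32-bit phase accumulator: the full uint32_t range represents one cycle.
struct TableOscillator {
    uint32_t phase = 0;
    uint32_t step = 0;

    void setFrequency(double freqHz, double sampleRateHz) {
        assert(sampleRateHz > 0.0 && freqHz >= 0.0 && freqHz < sampleRateHz);
        step = static_cast<uint32_t>(freqHz / sampleRateHz * 4294967296.0);
    }

    int16_t next(const std::array<int16_t, kTableSize> &table) {
        const int16_t sample = table[phase >> 24];  // Top 8 bits index the table
        phase += step;                              // Unsigned overflow wraps the cycle
        return sample;
    }
};
```

Inside the callback only next() would run, which is an integer add and a table read per sample.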
Buffer Underruns:
  • Occur when the callback doesn’t fill the buffer in time
  • Cause audible glitches or clicks
  • Fix: increase the buffer size or simplify the callback
Memory Usage:
  • Pre-allocate all buffers in setup()
  • Use ring buffers for streaming data
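A fixed-size ring buffer is one way to follow these points: storage is allocated once, a producer (network or file task) writes into it, and the speaker callback reads from it without allocating. The sketch below assumes exactly one producer and one consumer; SampleRing is a hypothetical class, not part of the library.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer, single-consumer ring of 16-bit samples with static storage.
template <size_t N>
class SampleRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side: copies in as many samples as fit; returns that count.
    size_t write(const int16_t *src, size_t count) {
        const size_t head = head_.load(std::memory_order_relaxed);
        const size_t tail = tail_.load(std::memory_order_acquire);
        const size_t space = N - (head - tail);
        if (count > space) count = space;
        for (size_t i = 0; i < count; ++i) buf_[(head + i) & (N - 1)] = src[i];
        head_.store(head + count, std::memory_order_release);
        return count;
    }

    // Consumer side: always fills dst with count samples, padding with
    // silence on underrun. Returns how many real samples were copied.
    size_t read(int16_t *dst, size_t count) {
        const size_t tail = tail_.load(std::memory_order_relaxed);
        const size_t head = head_.load(std::memory_order_acquire);
        size_t avail = head - tail;
        if (avail > count) avail = count;
        for (size_t i = 0; i < avail; ++i) dst[i] = buf_[(tail + i) & (N - 1)];
        for (size_t i = avail; i < count; ++i) dst[i] = 0;
        tail_.store(tail + avail, std::memory_order_release);
        return avail;
    }

private:
    int16_t buf_[N] = {};
    std::atomic<size_t> head_{0};  // Total samples ever written
    std::atomic<size_t> tail_{0};  // Total samples ever read
};
```

In the speaker callback, a call such as ring.read((int16_t *)frame->data, frame->data_bytes / 2) would then replace per-packet malloc and free.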

Troubleshooting

No sound output:
  • Check USB speaker connection
  • Verify speaker is UAC-compatible
  • Check speaker power and volume knob
  • Ensure callback is writing data to frame->data
  • Try different USB cable/port
Distorted or glitchy audio:
  • Reduce amplitude (avoid clipping above 32767)
  • Increase buffer size
  • Simplify callback processing
  • Check for buffer underruns
Callback not being called:
  • Verify uacSpkRegisterCb() was called
  • Check connectWait() succeeded
  • Ensure stream not suspended
  • Try restarting ESP32
Volume control not working:
  • Not all USB speakers support UAC volume control
  • Some speakers have hardware volume only
  • Try controlling amplitude in code instead
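Controlling amplitude in code can be as simple as scaling each sample before it leaves the callback. A minimal sketch, assuming 16-bit samples and a 0-100 percentage; applyVolume is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scales 16-bit samples in place by percent (0-100). Values above 100 are
// treated as 100, so the result can never exceed the int16_t range.
inline void applyVolume(int16_t *samples, size_t count, uint8_t percent) {
    assert(samples != nullptr || count == 0);
    if (percent > 100) percent = 100;
    for (size_t i = 0; i < count; ++i) {
        samples[i] = static_cast<int16_t>(
            static_cast<int32_t>(samples[i]) * percent / 100);
    }
}
```

Called at the end of the callback, for example applyVolume((int16_t *)frame->data, frame->data_bytes / 2, 80), this gives a software volume that works regardless of what the speaker supports.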
