
Overview

FFmpeg provides two multithreading methods for codecs, slice threading and frame threading, which trade latency against throughput in different ways. Knowing which one is active matters both for performance and for how your application handles output timing.
Either method can significantly improve performance on multi-core systems, but client callbacks and codec code must be thread-safe.

Threading Methods

FFmpeg supports two primary threading approaches:

Slice Threading

Decodes multiple parts of a single frame simultaneously.

How It Works

Divides a frame into horizontal slices, with each slice decoded by a separate thread

Latency

Zero additional latency - frame completes when all slices finish

Parallelism

Limited by number of slices codec can divide frame into

Use Case

Best for low-latency applications and single-frame operations
Implementation: Uses the AVCodecContext execute() and execute2() callbacks to distribute slice processing across threads.
Diagram:
Frame N:
┌─────────────────────────────┐
│   Slice 0  → Thread 0       │
├─────────────────────────────┤
│   Slice 1  → Thread 1       │
├─────────────────────────────┤
│   Slice 2  → Thread 2       │
├─────────────────────────────┤
│   Slice 3  → Thread 3       │
└─────────────────────────────┘

   Complete Frame N
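The slice model in the diagram can be sketched in plain C with pthreads. This is an illustration of the idea only, not libavcodec's scheduler: SliceJob, process_slice() and process_frame_sliced() are hypothetical names, and the per-slice "decode" simply inverts pixels.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLICES 16

/* One job per slice: a band of rows [row_start, row_end) of a shared frame. */
typedef struct SliceJob {
    uint8_t *pixels;   /* frame buffer, width * height bytes */
    int width;
    int row_start;
    int row_end;
} SliceJob;

/* Hypothetical per-slice work: inverts every pixel in the band. */
static void *process_slice(void *arg)
{
    SliceJob *job = arg;
    for (int y = job->row_start; y < job->row_end; y++)
        for (int x = 0; x < job->width; x++) {
            size_t i = (size_t)y * (size_t)job->width + (size_t)x;
            job->pixels[i] = (uint8_t)(255 - job->pixels[i]);
        }
    return NULL;
}

/* Splits the frame into nb_slices horizontal bands, runs one thread per band,
 * and returns only after every started thread has been joined.
 * Returns 0 on success, -1 on bad arguments or thread creation failure. */
static int process_frame_sliced(uint8_t *pixels, int width, int height,
                                int nb_slices)
{
    pthread_t threads[MAX_SLICES];
    SliceJob jobs[MAX_SLICES];
    int started = 0, ret = 0;

    if (nb_slices < 1 || nb_slices > MAX_SLICES || nb_slices > height)
        return -1;

    for (int i = 0; i < nb_slices; i++) {
        jobs[i].pixels    = pixels;
        jobs[i].width     = width;
        jobs[i].row_start = (int)((int64_t)height * i / nb_slices);
        jobs[i].row_end   = (int)((int64_t)height * (i + 1) / nb_slices);
        if (pthread_create(&threads[i], NULL, process_slice, &jobs[i]) != 0) {
            ret = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return ret;
}
```

Because the bands do not overlap, the threads need no locking, and the frame is complete as soon as the last join returns, which is why this method adds no frame delay.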

Frame Threading

Decodes multiple frames simultaneously in parallel threads.

How It Works

Processes N frames concurrently, each in its own thread

Latency

Adds N-1 frames of delay, where N is the number of threads

Parallelism

Excellent scaling with number of CPU cores

Use Case

Best for high-throughput batch processing
Implementation: Accepts N future frames and delays decoded pictures by N-1 frames. While the user displays the current frame, the later frames are decoded in separate threads.
Diagram:
Time →

Thread 0: [Frame 0] [Frame 4] [Frame 8]  ...
Thread 1:    [Frame 1] [Frame 5] [Frame 9]  ...
Thread 2:       [Frame 2] [Frame 6] [Frame 10] ...
Thread 3:          [Frame 3] [Frame 7] [Frame 11] ...

Output:   [0]   [1]   [2]   [3]   [4]  ...
          ↑_____delay of 3 frames_____↑
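The delay in the diagram can be expressed as a small counting model. It assumes one packet carries exactly one frame and ignores any reordering delay the codec adds on its own; both function names are hypothetical.

```c
/* Frames the caller has received after sending `packets_sent` packets to a
 * decoder that holds back threads - 1 frames (no flush yet). */
static int frames_out_before_flush(int packets_sent, int threads)
{
    int delay = threads > 1 ? threads - 1 : 0;
    return packets_sent > delay ? packets_sent - delay : 0;
}

/* Frames still held inside the decoder; they come out only when flushed. */
static int frames_pending(int packets_sent, int threads)
{
    return packets_sent - frames_out_before_flush(packets_sent, threads);
}
```

With 4 threads, frames_out_before_flush(4, 4) is 1: the first picture appears only once the fourth packet has been sent, matching the 3-frame delay shown above.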

Enabling Multithreading

Setting Thread Count

AVCodecContext *ctx = avcodec_alloc_context3(codec);

// Automatic thread count (recommended)
ctx->thread_count = 0;  // FFmpeg chooses optimal count

// Manual thread count
ctx->thread_count = 4;  // Use 4 threads

// Get actual thread count used
// (meaningful only after avcodec_open2(); before that it still holds the requested value)
int threads = ctx->thread_count;

Selecting Threading Type

// Enable frame threading (preferred)
ctx->thread_type = FF_THREAD_FRAME;

// Enable slice threading
ctx->thread_type = FF_THREAD_SLICE;

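// Allow both and let FFmpeg decide: frame threading is used when the
// codec supports it, otherwise slice threading
ctx->thread_type = FF_THREAD_FRAME | FF_THREAD_SLICE;

// Disable threading
ctx->thread_count = 1;

Putting both settings together before opening the codec:

AVCodecContext *ctx = avcodec_alloc_context3(codec);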
ctx->thread_count = 0;  // Auto
ctx->thread_type = FF_THREAD_FRAME;

if (avcodec_open2(ctx, codec, NULL) < 0) {
    fprintf(stderr, "Could not open codec\n");
    return -1;
}
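A decoder only honors a threading method it advertises in its capabilities, so it can help to derive thread_type from them. The sketch below is one possible policy, not FFmpeg's own selection logic; pick_thread_type() is a hypothetical helper, and the guarded constants mirror the values in the libavcodec headers so the snippet compiles with or without them.

```c
/* Guarded so this compiles standalone or after <libavcodec/avcodec.h>. */
#ifndef FF_THREAD_FRAME
#define FF_THREAD_FRAME 1
#define FF_THREAD_SLICE 2
#endif
#ifndef AV_CODEC_CAP_FRAME_THREADS
#define AV_CODEC_CAP_FRAME_THREADS (1 << 12)
#define AV_CODEC_CAP_SLICE_THREADS (1 << 13)
#endif

/* Returns a thread_type mask for a codec with the given capabilities.
 * When low_latency is nonzero, frame threading is left out because it
 * delays output by thread_count - 1 frames. */
static int pick_thread_type(int codec_caps, int low_latency)
{
    int type = 0;

    if (codec_caps & AV_CODEC_CAP_SLICE_THREADS)
        type |= FF_THREAD_SLICE;
    if (!low_latency && (codec_caps & AV_CODEC_CAP_FRAME_THREADS))
        type |= FF_THREAD_FRAME;
    return type;
}
```

A caller would typically write ctx->thread_type = pick_thread_type(codec->capabilities, 1) before avcodec_open2(). After opening, ctx->active_thread_type reports which method is actually in use.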

Client Restrictions

Slice Threading Requirements

The client’s draw_horiz_band() callback must be thread-safe.
// Thread-safe draw_horiz_band implementation
// display_band() stands in for the application's own drawing routine
static pthread_mutex_t display_mutex = PTHREAD_MUTEX_INITIALIZER;

static void draw_horiz_band_threadsafe(
    AVCodecContext *ctx,
    const AVFrame *frame,
    int offset[AV_NUM_DATA_POINTERS],
    int y, int type, int height
) {
    // Serialize access to the shared display state
    pthread_mutex_lock(&display_mutex);

    // Process the band
    display_band(frame, y, height);

    pthread_mutex_unlock(&display_mutex);
}

ctx->draw_horiz_band = draw_horiz_band_threadsafe;

Frame Threading Requirements

All restrictions from slice threading apply, plus additional requirements:
1. Thread-Safe Callbacks

Custom get_buffer2() and get_format() must be thread-safe
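// allocate_buffer() stands in for the application's own allocator
static pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;

static int get_buffer2_threadsafe(
    AVCodecContext *ctx,
    AVFrame *frame,
    int flags
) {
    // Several decoding threads may request buffers at the same time
    pthread_mutex_lock(&buffer_mutex);
    int ret = allocate_buffer(ctx, frame, flags);
    pthread_mutex_unlock(&buffer_mutex);
    return ret;
}

ctx->get_buffer2 = get_buffer2_threadsafe;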
2. Handle Frame Delay

Account for N-1 frames of delay in timing
// Timestamps in AVFrame (pts, pkt_dts) work normally,
// but the frame itself arrives N-1 frames later

while (av_read_frame(fmt_ctx, pkt) >= 0) {
    int ret = avcodec_send_packet(ctx, pkt);
    av_packet_unref(pkt);  // Release the packet data once it has been sent
    if (ret < 0)
        break;

    while (avcodec_receive_frame(ctx, frame) >= 0) {
        // Frame timing is correct despite delay
        int64_t pts = frame->pts;
        display_frame(frame, pts);
    }
}
3. Flush at End

Ensure all delayed frames are retrieved
// Send NULL packet to flush
avcodec_send_packet(ctx, NULL);

// Receive remaining frames
while (avcodec_receive_frame(ctx, frame) >= 0) {
    display_frame(frame, frame->pts);
}

Codec Implementation Restrictions

Slice Threading

No special restrictions - just needs parallelizable work.
Most codecs can use slice threading if they can divide frames into independent slices.

Frame Threading

Frame threading has stricter requirements:
Frame Threading Restrictions:
  1. Complete Pictures Only
    // Codec must accept entire pictures per packet
    // Cannot handle partial frames
    
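  2. No Bitstream State Carried Across Frames
    // Bad: Codecs like FFV1 whose stream state does not reset between frames
    // Their bitstreams cannot be decoded in parallel
    // (Inter-frame prediction is fine; it is handled through the progress API)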
    
  3. Buffer Access Rules
    • Must call ff_progress_frame_await() before reading reference frames
    • Must call ff_progress_frame_report() after writing frame data
    • No buffer reuse optimizations (reget_buffer() doesn’t work)
  4. No Post-Processing After Report
    // Bad: Modifying buffer after reporting progress
    ff_progress_frame_report(frame, INT_MAX);
    draw_edges(frame);  // ❌ Too late!
    
    // Good: Complete all processing first
    draw_edges(frame);
    ff_progress_frame_report(frame, INT_MAX);  // ✓ Correct
    

Porting Codecs to Frame Threading

1. Identify Shared State

Find all context variables needed by next frame:
typedef struct MyCodecContext {
    // Per-frame state (safe)
    AVFrame *current_frame;
    
    // Shared state (needs attention)
    int sequence_counter;     // Move to before decode
    uint8_t *reference_data;  // Use progress API
} MyCodecContext;
2. Move Initialization Code

Move state changes before decode process:
static int decode_frame(AVCodecContext *avctx, AVFrame *frame) {
    MyCodecContext *s = avctx->priv_data;
    int ret;
    
    // Move shared state updates here
    s->sequence_counter++;
    
    // Allocate frame
    if ((ret = ff_thread_get_buffer(avctx, frame, 0)) < 0)
        return ret;
    
    // Signal setup complete
    ff_thread_finish_setup(avctx);
    
    // Now safe to decode in parallel
    decode_actual_data(avctx, frame);
    
    return 0;
}
3. Add Thread Capability

Enable frame threading in codec definition:
// Recent FFmpeg releases declare codecs as FFCodec: public AVCodec fields
// take the .p prefix and the decode callback is set with FF_CODEC_DECODE_CB()
const FFCodec ff_mycodec_decoder = {
    .p.name         = "mycodec",
    .p.type         = AVMEDIA_TYPE_VIDEO,
    .p.id           = AV_CODEC_ID_MYCODEC,
    .priv_data_size = sizeof(MyCodecContext),
    .init           = mycodec_init,
    FF_CODEC_DECODE_CB(mycodec_decode),
    .close          = mycodec_close,
    .p.capabilities = AV_CODEC_CAP_DR1 |
                      AV_CODEC_CAP_FRAME_THREADS,  // Add this
    .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP,
};
4. Use Progress API for References

For inter-frame dependencies:
// Allocate with progress tracking
ret = ff_progress_frame_get_buffer(avctx, &s->current, 0);
if (ret < 0)
    return ret;

// Wait for reference frame region
ff_progress_frame_await(&s->ref_frame, mb_y);

// Use reference data (motion_compensation() is a placeholder for codec code)
motion_compensation(&s->current, &s->ref_frame, mb_y);

// Report current frame progress
ff_progress_frame_report(&s->current, mb_y);
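The await/report ordering can be modelled outside FFmpeg with a mutex and a condition variable. The sketch below is a simplified stand-in, not FFmpeg's implementation: RowProgress, Demo and every function name are hypothetical, and "decoding" a row just stores a number.

```c
#include <pthread.h>

typedef struct RowProgress {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             rows_done;   /* highest row published so far, -1 = none */
} RowProgress;

static void row_progress_init(RowProgress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->rows_done = -1;
}

/* Producer: declare rows 0..row final. Progress never moves backwards. */
static void row_progress_report(RowProgress *p, int row)
{
    pthread_mutex_lock(&p->lock);
    if (row > p->rows_done) {
        p->rows_done = row;
        pthread_cond_broadcast(&p->cond);
    }
    pthread_mutex_unlock(&p->lock);
}

/* Consumer: block until rows 0..row have been reported. */
static void row_progress_await(RowProgress *p, int row)
{
    pthread_mutex_lock(&p->lock);
    while (p->rows_done < row)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

#define DEMO_ROWS 8

typedef struct Demo {
    RowProgress progress;        /* progress of the reference frame */
    int ref[DEMO_ROWS];          /* "reference frame" rows */
    int cur[DEMO_ROWS];          /* "current frame" rows, predicted from ref */
} Demo;

/* Thread decoding the reference frame: write each row, then publish it. */
static void *decode_reference(void *arg)
{
    Demo *d = arg;
    for (int y = 0; y < DEMO_ROWS; y++) {
        d->ref[y] = y * 10;
        row_progress_report(&d->progress, y);
    }
    return NULL;
}

/* Thread decoding the next frame: wait for a row before reading it. */
static void *decode_dependent(void *arg)
{
    Demo *d = arg;
    for (int y = 0; y < DEMO_ROWS; y++) {
        row_progress_await(&d->progress, y);
        d->cur[y] = d->ref[y] + 1;
    }
    return NULL;
}
```

The two rules from the restrictions list show up directly: the producer never touches a row after reporting it, and the consumer never reads a row before awaiting it. If the producer returned early without reporting, the consumer would block forever, which is the usual source of frame-threading hangs.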
5. Handle State Propagation

Implement update_thread_context() for shared state:
static int update_thread_context(
    AVCodecContext *dst,
    const AVCodecContext *src
) {
    MyCodecContext *s1 = dst->priv_data;
    const MyCodecContext *s = src->priv_data;
    
    // Copy state from previous frame's thread
    s1->sequence_counter = s->sequence_counter;
    
    return 0;
}

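// In recent FFmpeg releases the callback is registered with the
// UPDATE_THREAD_CONTEXT() macro inside the FFCodec definition
const FFCodec ff_mycodec_decoder = {
    // ...
    UPDATE_THREAD_CONTEXT(update_thread_context),
};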

Performance Considerations

Choosing Thread Count

Thread Efficiency

Frame Threading Scaling:
Threads  │  Speedup  │  Efficiency
─────────┼───────────┼────────────
   1     │   1.0x    │    100%
   2     │   1.9x    │     95%
   4     │   3.6x    │     90%
   6     │   5.0x    │     83%
   8     │   6.2x    │     78%
  16     │   9.5x    │     59%
Diminishing returns after 4-6 threads due to synchronization overhead.
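The efficiency column is speedup divided by thread count, and the diminishing returns become visible when the gain per added thread is computed between two rows. The helpers below are a small sketch of that arithmetic; both names are hypothetical.

```c
/* Fraction of ideal linear scaling achieved: speedup / threads. */
static double parallel_efficiency(double speedup, int threads)
{
    return threads > 0 ? speedup / threads : 0.0;
}

/* Average speedup gained per extra thread when going from n_lo to n_hi
 * threads. Returns 0 if n_hi is not larger than n_lo. */
static double marginal_speedup(double s_lo, int n_lo, double s_hi, int n_hi)
{
    return n_hi > n_lo ? (s_hi - s_lo) / (n_hi - n_lo) : 0.0;
}
```

Using the table's figures, going from 4 to 6 threads gains about 0.7x per added thread, while going from 8 to 16 gains only about 0.41x per added thread.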
Slice Threading Scaling:
Slices   │  Speedup  │  Notes
─────────┼───────────┼─────────────────────
   1     │   1.0x    │  No parallelism
   2     │   1.8x    │  Good efficiency
   4     │   3.4x    │  Optimal for most
   8     │   5.5x    │  Codec-dependent
Slice count limited by codec - can’t always use all CPUs.

Memory Usage

Frame Threading:
  • Requires N * frame_size additional memory
  • Each thread needs its own frame buffer
  • Can be significant for high-resolution video
Slice Threading:
  • Minimal memory overhead
  • Shares single frame buffer
  • More memory-efficient
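To get a feel for the N * frame_size figure, the estimate below assumes 8-bit YUV 4:2:0 pictures and ignores row padding, alignment and any reference frames the codec keeps, so real usage will be higher. Both function names are hypothetical.

```c
#include <stddef.h>

/* Bytes in one 8-bit YUV 4:2:0 picture: a full-size luma plane plus two
 * chroma planes at half resolution in each dimension (rounded up). */
static size_t yuv420p_frame_bytes(int width, int height)
{
    size_t luma     = (size_t)width * (size_t)height;
    size_t chroma_w = ((size_t)width + 1) / 2;
    size_t chroma_h = ((size_t)height + 1) / 2;
    return luma + 2 * chroma_w * chroma_h;
}

/* Rough extra memory for frame threading: one picture per thread. */
static size_t frame_threading_extra_bytes(int width, int height, int threads)
{
    return (size_t)threads * yuv420p_frame_bytes(width, height);
}
```

Under these assumptions, 4 threads at 1920x1080 need about 12.4 MB of extra picture memory and 8 threads at 3840x2160 about 99.5 MB, which is why the overhead matters most on memory-constrained devices.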

Best Practices

Use Automatic Threads

Let FFmpeg choose thread count with thread_count = 0

Ensure Thread Safety

All callbacks must be thread-safe with proper synchronization

Profile Your Application

Measure actual performance gains in your use case

Consider Memory

Frame threading uses more RAM - important for embedded systems

Handle Latency

Account for frame delay in real-time applications

Test Thoroughly

Verify thread-safe operation under load

Debugging Threading Issues

Common Issues

Symptom: Occasional corruption or crashes
Solution:
// Add mutex protection
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void shared_resource_access() {
    pthread_mutex_lock(&mutex);
    // Critical section
    pthread_mutex_unlock(&mutex);
}
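Symptom: Application hangs during decode
Solution:
  • Check for circular dependencies in progress API
  • Ensure ff_progress_frame_report() is always called, including on error paths
  • ff_progress_frame_await() takes no timeout, so a missing report blocks the waiting thread indefinitely; attach a debugger to find the thread that never reported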
Symptom: Growing memory usage
Solution:
  • Ensure all allocated frames are freed
  • Check reference counting in multi-threaded context
  • Use valgrind with --leak-check=full; adding --fair-sched=yes keeps threads from being starved under valgrind's serialized scheduling

Debug Tools

# Run with thread sanitizer (requires an FFmpeg build instrumented with
# ThreadSanitizer, for example one configured with --toolchain=clang-tsan)
export TSAN_OPTIONS="suppressions=tsan.supp"
ffmpeg -i input.mp4 -f null -

# Check with helgrind
valgrind --tool=helgrind ffmpeg -i input.mp4 -f null -

# Verify with single thread
ffmpeg -threads 1 -i input.mp4 output.mp4

Additional Resources

Architecture

Understanding FFmpeg’s structure

Optimization

Performance optimization techniques

Threading Docs

Official threading documentation

Frame API

Frame handling API reference
