Overview
FFmpeg provides two multithreading methods for codecs: slice threading and frame threading. They differ in latency, scaling, and memory cost, so the right choice depends on the workload.
Both can improve performance significantly on multi-core systems, but both require the callbacks you supply to be thread-safe.
Threading Methods
FFmpeg supports two primary threading approaches:
Slice Threading
Decodes multiple parts of a single frame simultaneously.
How It Works: Divides a frame into horizontal slices, with each slice decoded by a separate thread.
Latency: No additional latency. The frame is complete when all slices finish.
Parallelism: Limited by the number of slices the codec can divide a frame into.
Use Case: Best for low-latency applications and single-frame operations.
Implementation:
Uses AVCodecContext.execute() and execute2() callbacks to distribute slice processing across threads.
Diagram:
Frame N:
┌─────────────────────────────┐
│ Slice 0 → Thread 0          │
├─────────────────────────────┤
│ Slice 1 → Thread 1          │
├─────────────────────────────┤
│ Slice 2 → Thread 2          │
├─────────────────────────────┤
│ Slice 3 → Thread 3          │
└─────────────────────────────┘
               ↓
        Complete Frame N
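To make the slice split concrete, here is a minimal standalone sketch of how the rows of a frame could be partitioned across slice jobs. It does not call FFmpeg: `slice_range` and `rows_tiled_once` are hypothetical names, and the even `height * jobnr / nb_jobs` split is an assumption chosen for illustration. A real decoder splits along the slice boundaries defined by the bitstream.

```c
#include <assert.h>

/* Compute the half-open row range [*start, *end) handled by job `jobnr`
 * when `height` rows are split across `nb_jobs` slice jobs. */
static void slice_range(int height, int nb_jobs, int jobnr, int *start, int *end)
{
    assert(nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);
    *start = (int)((long long)height * jobnr / nb_jobs);
    *end   = (int)((long long)height * (jobnr + 1) / nb_jobs);
}

/* Walk the jobs in order (a thread pool would run them concurrently) and
 * check that consecutive ranges tile the frame with no gap or overlap.
 * Returns 1 if every row is covered exactly once. */
static int rows_tiled_once(int height, int nb_jobs)
{
    int total = 0, prev_end = 0;
    for (int job = 0; job < nb_jobs; job++) {
        int start, end;
        slice_range(height, nb_jobs, job, &start, &end);
        if (start != prev_end)
            return 0;
        total   += end - start;
        prev_end = end;
    }
    return total == height;
}
```

With a 1080-row frame and 4 jobs, each job gets 270 rows. When the height does not divide evenly, the ranges differ by at most one row.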
Frame Threading
Decodes multiple frames simultaneously in parallel threads.
How It Works: Processes N frames concurrently, each in its own thread.
Latency: Adds N-1 frames of delay, where N is the number of threads.
Parallelism: Scales well with the number of CPU cores.
Use Case: Best for high-throughput batch processing.
Implementation:
Accepts N future frames and delays decoded pictures by N-1 frames. While the user displays the current frame, the later frames are decoded in separate threads.
Diagram:
Time →
Thread 0: [Frame 0] [Frame 4] [Frame 8] ...
Thread 1: [Frame 1] [Frame 5] [Frame 9] ...
Thread 2: [Frame 2] [Frame 6] [Frame 10] ...
Thread 3: [Frame 3] [Frame 7] [Frame 11] ...
Output: [0] [1] [2] [3] [4] ...
↑_____delay of 3 frames_____↑
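The delay in the diagram can be expressed as simple arithmetic. The sketch below is an idealized model, not FFmpeg code: it assumes one frame per packet and no additional reordering delay from the codec, and the function names are hypothetical.

```c
#include <assert.h>

/* Frames of output delay introduced by frame threading with nb_threads. */
static int frame_delay(int nb_threads)
{
    assert(nb_threads >= 1);
    return nb_threads - 1;
}

/* Number of decoded frames that can have been returned after
 * `packets_sent` packets, before the decoder is flushed. */
static int frames_available(int packets_sent, int nb_threads)
{
    int n = packets_sent - frame_delay(nb_threads);
    return n > 0 ? n : 0;
}

/* Added latency in milliseconds at a constant frame rate. */
static double added_latency_ms(int nb_threads, double fps)
{
    return frame_delay(nb_threads) * 1000.0 / fps;
}
```

With 4 threads at 25 fps, the model gives a delay of 3 frames, or 120 ms, which is why frame threading is a poor fit for interactive playback.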
Enabling Multithreading
Setting Thread Count
AVCodecContext *ctx = avcodec_alloc_context3(codec);
// Automatic thread count (recommended)
ctx->thread_count = 0;  // FFmpeg chooses the count
// Manual thread count
ctx->thread_count = 4;  // Use 4 threads
// Read back the thread count in use (after avcodec_open2() when set to 0)
int threads = ctx->thread_count;
Selecting Threading Type
// Enable frame threading (preferred)
ctx->thread_type = FF_THREAD_FRAME;
// Enable slice threading
ctx->thread_type = FF_THREAD_SLICE;
// Let FFmpeg decide (automatic)
ctx->thread_type = FF_THREAD_FRAME | FF_THREAD_SLICE;
// Disable threading
ctx->thread_count = 1;
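Because thread_type is a bitmask, it helps to decode it when logging a configuration. The sketch below is standalone and uses local copies of the two flag values (assumed here to be 1 for frame and 2 for slice threading) so that it builds without FFmpeg headers. `thread_type_name` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the flag values so the sketch builds without FFmpeg
 * headers. Real code should use FF_THREAD_FRAME and FF_THREAD_SLICE from
 * <libavcodec/avcodec.h> instead. */
#define SKETCH_THREAD_FRAME 1
#define SKETCH_THREAD_SLICE 2

/* Describe a thread_type-style bitmask as a short string. */
static const char *thread_type_name(int thread_type)
{
    int frame = thread_type & SKETCH_THREAD_FRAME;
    int slice = thread_type & SKETCH_THREAD_SLICE;
    if (frame && slice) return "frame+slice";
    if (frame)          return "frame";
    if (slice)          return "slice";
    return "none";
}
```

In an application, the same helper could be applied to the context's active_thread_type field after avcodec_open2() to see which method libavcodec actually selected.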
Frame Threading:
AVCodecContext *ctx = avcodec_alloc_context3(codec);
ctx->thread_count = 0;  // Auto
ctx->thread_type = FF_THREAD_FRAME;
if (avcodec_open2(ctx, codec, NULL) < 0) {
    fprintf(stderr, "Could not open codec\n");
    return -1;
}
Slice Threading:
AVCodecContext *ctx = avcodec_alloc_context3(codec);
ctx->thread_count = 0;  // Auto
ctx->thread_type = FF_THREAD_SLICE;
if (avcodec_open2(ctx, codec, NULL) < 0) {
    fprintf(stderr, "Could not open codec\n");
    return -1;
}
Automatic:
AVCodecContext *ctx = avcodec_alloc_context3(codec);
ctx->thread_count = 0;  // Auto
// Don't set thread_type - let FFmpeg choose
if (avcodec_open2(ctx, codec, NULL) < 0) {
    fprintf(stderr, "Could not open codec\n");
    return -1;
}
Client Restrictions
Slice Threading Requirements
The client’s draw_horiz_band() callback must be thread-safe.
// Thread-safe draw_horiz_band implementation
// (display_mutex and display_band() are application-defined)
static void draw_horiz_band_threadsafe(
    AVCodecContext *ctx,
    const AVFrame *frame,
    int offset[AV_NUM_DATA_POINTERS],
    int y, int type, int height
) {
    // Use a mutex if accessing shared resources
    pthread_mutex_lock(&display_mutex);
    // Process the band
    display_band(frame, y, height);
    pthread_mutex_unlock(&display_mutex);
}
ctx->draw_horiz_band = draw_horiz_band_threadsafe;
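The following standalone sketch shows why the lock matters. It does not use FFmpeg: `on_band` is a hypothetical stand-in for a draw_horiz_band-style callback, and plain pthreads play the role of the decoder's slice threads. Several threads deliver bands at once, and the mutex keeps the shared row counter consistent.

```c
#include <assert.h>
#include <pthread.h>

#define FRAME_HEIGHT 64
#define BAND_HEIGHT  16
#define NB_BANDS     (FRAME_HEIGHT / BAND_HEIGHT)

static pthread_mutex_t display_mutex = PTHREAD_MUTEX_INITIALIZER;
static int rows_displayed;   /* shared state, guarded by display_mutex */

/* Stand-in for the band callback: may run on several threads at once. */
static void on_band(int y, int height)
{
    (void)y;
    pthread_mutex_lock(&display_mutex);
    rows_displayed += height;   /* read-modify-write must not interleave */
    pthread_mutex_unlock(&display_mutex);
}

static void *band_worker(void *arg)
{
    int band = *(int *)arg;
    on_band(band * BAND_HEIGHT, BAND_HEIGHT);
    return NULL;
}

/* Deliver every band from its own thread. Returns the total number of
 * rows displayed, or -1 if a thread could not be created. */
static int deliver_all_bands(void)
{
    pthread_t threads[NB_BANDS];
    int ids[NB_BANDS];
    int created = 0;

    rows_displayed = 0;
    for (int i = 0; i < NB_BANDS; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, band_worker, &ids[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created == NB_BANDS ? rows_displayed : -1;
}
```

Without the lock, two threads could read the same counter value and one update would be lost. That kind of bug shows up only occasionally, which makes it hard to reproduce.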
Frame Threading Requirements
All restrictions from slice threading apply, plus additional requirements:
Thread-Safe Callbacks
Custom get_buffer2() and get_format() callbacks must be thread-safe.
static int get_buffer2_threadsafe(
    AVCodecContext *ctx,
    AVFrame *frame,
    int flags
) {
    pthread_mutex_lock(&buffer_mutex);
    int ret = allocate_buffer(ctx, frame, flags);
    pthread_mutex_unlock(&buffer_mutex);
    return ret;
}
ctx->get_buffer2 = get_buffer2_threadsafe;
Handle Frame Delay
Account for N-1 frames of delay in timing.
// The pts and pkt_dts fields in AVFrame work as usual,
// but each frame is returned N-1 frames later
while (av_read_frame(fmt_ctx, pkt) >= 0) {
    avcodec_send_packet(ctx, pkt);
    while (avcodec_receive_frame(ctx, frame) >= 0) {
        // Frame timing is correct despite the delay
        int64_t pts = frame->pts;
        display_frame(frame, pts);
    }
    av_packet_unref(pkt);  // Release the packet before reading the next one
}
Flush at End
Ensure all delayed frames are retrieved.
// Send a NULL packet to flush
avcodec_send_packet(ctx, NULL);
// Receive the remaining frames
while (avcodec_receive_frame(ctx, frame) >= 0) {
    display_frame(frame, frame->pts);
}
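To see why the flush step is needed, here is a toy model of a decoder that holds frames back. It is a standalone sketch, not FFmpeg code: `ToyDecoder` and its functions are hypothetical, only the pts of each frame is modeled, and one frame per packet is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DELAY 16

/* Toy decoder that holds back `delay` frames, like frame threading with
 * delay + 1 threads. */
typedef struct ToyDecoder {
    int64_t queue[MAX_DELAY + 1];
    int     count;     /* frames currently held */
    int     delay;     /* frames held back before output starts */
    int     draining;  /* set once the flush request was made */
} ToyDecoder;

static void toy_init(ToyDecoder *d, int nb_threads)
{
    assert(nb_threads >= 1 && nb_threads - 1 <= MAX_DELAY);
    d->count = 0;
    d->delay = nb_threads - 1;
    d->draining = 0;
}

/* Queue one packet; returns 0 on success, -1 if output must be read first. */
static int toy_send(ToyDecoder *d, int64_t pts)
{
    if (d->draining || d->count > d->delay)
        return -1;
    d->queue[d->count++] = pts;
    return 0;
}

/* Enter draining mode (the analogue of sending a NULL packet). */
static void toy_flush(ToyDecoder *d) { d->draining = 1; }

/* Fetch the next frame in order; returns 0 and sets *pts, or -1 if no
 * frame is available yet or the decoder is fully drained. */
static int toy_receive(ToyDecoder *d, int64_t *pts)
{
    int ready = d->draining ? d->count > 0 : d->count > d->delay;
    if (!ready)
        return -1;
    *pts = d->queue[0];
    for (int i = 1; i < d->count; i++)
        d->queue[i - 1] = d->queue[i];
    d->count--;
    return 0;
}

/* Feed nb_packets packets, reading output after each one, then flush.
 * Counts the frames received before and after the flush. */
static void toy_run(int nb_threads, int nb_packets, int *before, int *after)
{
    ToyDecoder d;
    int64_t pts;
    toy_init(&d, nb_threads);
    *before = *after = 0;
    for (int i = 0; i < nb_packets; i++) {
        toy_send(&d, i);   /* cannot fail: output is read after every send */
        while (toy_receive(&d, &pts) == 0)
            (*before)++;
    }
    toy_flush(&d);
    while (toy_receive(&d, &pts) == 0)
        (*after)++;
}
```

With 4 threads and 10 packets, the model returns 7 frames during normal decoding and the last 3 only after the flush. Skipping the flush would silently drop the tail of the stream.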
Codec Implementation Restrictions
Slice Threading
No special restrictions - just needs parallelizable work.
Most codecs can use slice threading if they can divide frames into independent slices.
Frame Threading
Frame threading has stricter requirements:
Complete Pictures Only
The codec must accept an entire picture per packet. It cannot handle partial frames.
No Stream State Carried Across Frames
Codecs similar to FFV1, whose stream state is not reset between frames, will not work because their bitstreams cannot be decoded in parallel.
Buffer Access Rules
Call ff_progress_frame_await() on a reference frame before reading its buffer.
Call ff_progress_frame_report() only after the reported region is fully written, and do not write to it afterwards.
Buffer reuse optimizations no longer work (reget_buffer() cannot be used).
No Post-Processing After Report
// Bad: Modifying the buffer after reporting progress
ff_progress_frame_report(frame, INT_MAX);
draw_edges(frame);  // ❌ Too late!
// Good: Complete all processing first
draw_edges(frame);
ff_progress_frame_report(frame, INT_MAX);  // ✓ Correct
Porting Codecs to Frame Threading
Identify Shared State
Find all context variables needed by the next frame:
typedef struct MyCodecContext {
    // Per-frame state (safe)
    AVFrame *current_frame;
    // Shared state (needs attention)
    int sequence_counter;     // Update before the decode stage starts
    uint8_t *reference_data;  // Guard reads with the progress API
} MyCodecContext;
Move Initialization Code
Move state changes before the decode process:
// Simplified signature; the real decode callback also receives the
// got-frame flag and the input packet
static int decode_frame(AVCodecContext *avctx, AVFrame *frame) {
    MyCodecContext *s = avctx->priv_data;
    int ret;
    // Move shared state updates here
    s->sequence_counter++;
    // Allocate frame
    if ((ret = ff_thread_get_buffer(avctx, frame, 0)) < 0)
        return ret;
    // Signal setup complete
    ff_thread_finish_setup(avctx);
    // Now safe to decode in parallel
    decode_actual_data(avctx, frame);
    return 0;
}
Add Thread Capability
Enable frame threading in the codec definition:
// Simplified definition. Recent FFmpeg trees declare decoders as FFCodec,
// with the public fields under .p and the callback set via FF_CODEC_DECODE_CB()
const AVCodec ff_mycodec_decoder = {
    .name           = "mycodec",
    .type           = AVMEDIA_TYPE_VIDEO,
    .id             = AV_CODEC_ID_MYCODEC,
    .priv_data_size = sizeof(MyCodecContext),
    .init           = mycodec_init,
    .decode         = mycodec_decode,
    .close          = mycodec_close,
    .capabilities   = AV_CODEC_CAP_DR1 |
                      AV_CODEC_CAP_FRAME_THREADS,  // Add this
    .caps_internal  = FF_CODEC_CAP_INIT_CLEANUP,
};
Use Progress API for References
For inter-frame dependencies:
// Allocate with progress tracking
// (s->current and s->ref_frame are ProgressFrame members)
ret = ff_progress_frame_get_buffer(avctx, &s->current, 0);
if (ret < 0)
    return ret;
// Wait until the reference frame is decoded up to row mb_y
ff_progress_frame_await(&s->ref_frame, mb_y);
// Use reference data
motion_compensation(&s->current, &s->ref_frame, mb_y);
// Report current frame progress
ff_progress_frame_report(&s->current, mb_y);
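The await/report handshake can be modeled with a mutex and a condition variable. The sketch below is a standalone illustration of the idea, not FFmpeg's implementation: `Progress`, `progress_report`, and `progress_await` are hypothetical names, and one integer per row stands in for pixel data.

```c
#include <assert.h>
#include <pthread.h>

#define NB_ROWS 32

/* Minimal stand-in for a progress-tracked frame. */
typedef struct Progress {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             rows_done;      /* highest completed row, -1 = none */
    int             data[NB_ROWS];  /* one value per row */
} Progress;

static void progress_init(Progress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->rows_done = -1;
}

/* Producer side: publish that rows 0..row are fully written. */
static void progress_report(Progress *p, int row)
{
    pthread_mutex_lock(&p->lock);
    if (row > p->rows_done) {
        p->rows_done = row;
        pthread_cond_broadcast(&p->cond);
    }
    pthread_mutex_unlock(&p->lock);
}

/* Consumer side: block until rows 0..row are available. */
static void progress_await(Progress *p, int row)
{
    pthread_mutex_lock(&p->lock);
    while (p->rows_done < row)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Thread decoding the reference frame: finish a row, then report it. */
static void *decode_reference(void *arg)
{
    Progress *ref = arg;
    for (int y = 0; y < NB_ROWS; y++) {
        ref->data[y] = y * 10;      /* all writes happen before the report */
        progress_report(ref, y);
    }
    return NULL;
}

/* Decode a dependent frame concurrently. Returns the number of rows whose
 * reference data was correct when read, or -1 if the thread failed to start. */
static int decode_dependent(void)
{
    Progress ref;
    pthread_t producer;
    int ok = 0;

    progress_init(&ref);
    if (pthread_create(&producer, NULL, decode_reference, &ref) != 0)
        return -1;
    for (int y = 0; y < NB_ROWS; y++) {
        progress_await(&ref, y);    /* never read a row before this returns */
        if (ref.data[y] == y * 10)
            ok++;
    }
    pthread_join(producer, NULL);
    pthread_cond_destroy(&ref.cond);
    pthread_mutex_destroy(&ref.lock);
    return ok;
}
```

The ordering is the important part. The producer writes a row and only then reports it, and the consumer awaits a row before reading it. That ordering makes the read safe without locking the data itself.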
Handle State Propagation
Implement update_thread_context() for shared state:
static int update_thread_context(
    AVCodecContext *dst,
    const AVCodecContext *src
) {
    MyCodecContext *s1 = dst->priv_data;
    const MyCodecContext *s = src->priv_data;
    // Copy state from the previous frame's thread
    s1->sequence_counter = s->sequence_counter;
    return 0;
}
const AVCodec ff_mycodec_decoder = {
    // ...
    .update_thread_context = update_thread_context,
};
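The effect of state propagation can be shown with a small standalone model. It is a sketch under simplifying assumptions, not FFmpeg code: frames are handed to per-thread contexts round-robin, everything runs sequentially, and `SketchContext` and the function names are hypothetical.

```c
#include <assert.h>

#define NB_THREADS 4

typedef struct SketchContext {
    int sequence_counter;  /* state each frame inherits from its predecessor */
} SketchContext;

/* Analogue of update_thread_context(): copy inherited state from the
 * context that handled the previous frame. */
static void sketch_update_context(SketchContext *dst, const SketchContext *src)
{
    if (dst != src)
        dst->sequence_counter = src->sequence_counter;
}

/* Hand frames to per-thread contexts round-robin, copying state first.
 * Returns the counter value seen by the last frame. */
static int sketch_decode_frames(int nb_frames)
{
    SketchContext ctx[NB_THREADS] = {{0}};
    int prev = 0, last = 0;
    for (int frame = 0; frame < nb_frames; frame++) {
        int cur = frame % NB_THREADS;
        sketch_update_context(&ctx[cur], &ctx[prev]);  /* before decoding */
        last = ++ctx[cur].sequence_counter;            /* setup-phase update */
        prev = cur;
    }
    return last;
}

/* Same loop without the copy step, to show what goes wrong. */
static int sketch_decode_frames_no_copy(int nb_frames)
{
    SketchContext ctx[NB_THREADS] = {{0}};
    int last = 0;
    for (int frame = 0; frame < nb_frames; frame++)
        last = ++ctx[frame % NB_THREADS].sequence_counter;
    return last;
}
```

After 10 frames, the version with the copy step reports a counter of 10. The version without it reports 3, because each context only counts the frames it decoded itself.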
Choosing Thread Count
Automatic (Recommended):
With thread_count = 0, FFmpeg chooses the count based on:
Number of CPU cores detected
Codec requirements
The automatic choice does not measure system load, so lower the count manually on a busy machine.
Manual Tuning:
// Get CPU count
int cpu_count = av_cpu_count();
// For frame threading: use fewer threads
// (diminishing returns after 4-6 threads)
ctx->thread_count = FFMIN(cpu_count, 6);
// For slice threading: can use all CPUs
ctx->thread_count = cpu_count;
Low Latency:
// Prefer slice threading for low latency
ctx->thread_type = FF_THREAD_SLICE;
ctx->thread_count = 2;  // Minimal threads
High Throughput:
// Prefer frame threading for throughput
ctx->thread_type = FF_THREAD_FRAME;
ctx->thread_count = 0;  // Auto (more threads)
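The manual tuning rule above can be wrapped in a small helper. This is a standalone sketch with hypothetical names. The cap of 6 is the rule of thumb from this section, not a value taken from FFmpeg, and in a real program the CPU count would come from av_cpu_count().

```c
#include <assert.h>

enum SketchMode { SKETCH_FRAME, SKETCH_SLICE };

/* Cap for frame threading, taken from the rule of thumb in this section. */
#define SKETCH_FRAME_THREAD_CAP 6

/* Pick a thread count for the given mode from the detected CPU count. */
static int pick_thread_count(int cpu_count, enum SketchMode mode)
{
    if (cpu_count < 1)
        cpu_count = 1;   /* detection failed: stay single-threaded */
    if (mode == SKETCH_FRAME && cpu_count > SKETCH_FRAME_THREAD_CAP)
        return SKETCH_FRAME_THREAD_CAP;
    return cpu_count;
}
```

On a 16-core machine this yields 6 threads for frame threading and 16 for slice threading. Treat both as starting points to confirm by profiling.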
Thread Efficiency
Frame Threading Scaling:
Threads │ Speedup │ Efficiency
─────────┼───────────┼────────────
1 │ 1.0x │ 100%
2 │ 1.9x │ 95%
4 │ 3.6x │ 90%
6 │ 5.0x │ 83%
8 │ 6.2x │ 78%
16 │ 9.5x │ 59%
Diminishing returns after 4-6 threads due to synchronization overhead.
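The efficiency column in the table above is the speedup divided by the thread count. Two rows are worked through here as a check on the table's own figures:

```latex
E(N) = \frac{S(N)}{N}, \qquad
E(8) = \frac{6.2}{8} = 0.775 \approx 78\%, \qquad
E(16) = \frac{9.5}{16} \approx 0.594 \approx 59\%
```

Going from 8 to 16 threads doubles the resources but raises the speedup only from 6.2x to 9.5x, which is the diminishing return noted above.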
Slice Threading Scaling:
Slices │ Speedup │ Notes
─────────┼───────────┼─────────────────────
1 │ 1.0x │ No parallelism
2 │ 1.8x │ Good efficiency
4 │ 3.4x │ Optimal for most
8 │ 5.5x │ Codec-dependent
Slice count limited by codec - can’t always use all CPUs.
Memory Usage
Frame Threading:
Requires N * frame_size additional memory
Each thread needs its own frame buffer
Can be significant for high-resolution video
Slice Threading:
Minimal memory overhead
Shares single frame buffer
More memory-efficient
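To put a number on the frame threading estimate, here is a rough standalone calculation. It assumes 8-bit YUV 4:2:0 pictures and ignores row alignment, padding, and the reference frames a codec keeps anyway, so real usage will be higher. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one 8-bit YUV 4:2:0 picture: a full-resolution luma plane plus
 * two chroma planes subsampled by two in each direction. */
static size_t yuv420p_frame_bytes(size_t width, size_t height)
{
    size_t chroma_w = (width + 1) / 2;
    size_t chroma_h = (height + 1) / 2;
    return width * height + 2 * chroma_w * chroma_h;
}

/* Rough extra memory for frame threading: one picture per thread. */
static size_t frame_threading_extra_bytes(size_t width, size_t height,
                                          int nb_threads)
{
    assert(nb_threads >= 1);
    return (size_t)nb_threads * yuv420p_frame_bytes(width, height);
}
```

For 3840x2160 video with 8 threads, this gives 99,532,800 bytes, or about 95 MiB, on top of what single-threaded decoding needs.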
Best Practices
Use Automatic Threads: Let FFmpeg choose the thread count with thread_count = 0.
Ensure Thread Safety: All callbacks must be thread-safe, with proper synchronization.
Profile Your Application: Measure the actual performance gain in your use case.
Consider Memory: Frame threading uses more RAM, which matters on embedded systems.
Handle Latency: Account for the frame delay in real-time applications.
Test Thoroughly: Verify thread-safe operation under load.
Debugging Threading Issues
Common Issues
Symptom: Occasional corruption or crashes
Solution:
// Add mutex protection
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void shared_resource_access(void) {
    pthread_mutex_lock(&mutex);
    // Critical section
    pthread_mutex_unlock(&mutex);
}
Symptom: Application hangs during decode
Solution:
Check for circular dependencies in progress API
Ensure ff_progress_frame_report() is always called
ff_progress_frame_await() takes no timeout, so attach a debugger and inspect where each thread is blocked
Symptom: Growing memory usage
Solution:
Ensure all allocated frames are freed
Check reference counting in multi-threaded context
Use valgrind with --fair-sched=yes
# Run with ThreadSanitizer (requires an ffmpeg binary built with TSan instrumentation)
export TSAN_OPTIONS="suppressions=tsan.supp"
ffmpeg -thread_queue_size 512 -i input.mp4 output.mp4
# Check with helgrind
valgrind --tool=helgrind ffmpeg -i input.mp4 -f null -
# Verify with single thread
ffmpeg -threads 1 -i input.mp4 output.mp4
Additional Resources
Architecture: Understanding FFmpeg’s structure
Optimization: Performance optimization techniques
Threading Docs: Official threading documentation
Frame API: Frame handling API reference