Overview
Optimizing FFmpeg code requires understanding which functions matter most, how to write efficient implementations, and how to leverage architecture-specific features like SIMD instructions.
This guide is based on FFmpeg’s official optimization documentation and best practices accumulated over years of development.
What to Optimize
Identify Hot Paths First
Before optimizing, identify the functions that consume the most CPU time:
Profile Your Code
Use profiling tools to identify bottlenecks (see Profiling section)
Check Existing Optimizations
Look in the x86/ directory: many important functions are already optimized there.
Architecture-Specific Considerations
- x86/x64
- ARM/NEON
- RISC-V
- Other
On x86/x64, most critical functions already have optimizations. Focus on:
- Fine-tuning existing SIMD code
- Adding AVX2/AVX-512 versions
- Optimizing newer codecs
Function Importance Guide
Critical Functions (Highest Impact)
These functions are used extensively in motion compensation and encoding:
Motion Compensation Functions
put_pixels, put_no_rnd_pixels variants:
- Usage: Motion compensation in encoding/decoding
- Priority: Critical
- Impact: High - used in every motion-compensated frame
avg_pixels variants:
- Usage: Motion compensation of B-frames
- Priority: High
- Impact: Medium - only B-frames, but common
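As a rough illustration of what these two families compute, here is a scalar C sketch. The function names (`put_pixels_x2_sketch`, `avg_pixels_sketch`), the `rnd` flag, and the signatures are hypothetical and chosen for this example. The sketch assumes the usual convention that rounding variants compute (a + b + 1) >> 1 while no_rnd variants drop the + 1; it is not FFmpeg's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horizontal half-pel copy: each output pixel is the average of two
 * neighbouring source pixels. rnd = 1 models a rounding variant,
 * rnd = 0 a no_rnd variant (assumed convention, see the text above).
 * Reads w + 1 pixels from every source row. */
static void put_pixels_x2_sketch(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride, int w, int h, int rnd)
{
    assert(rnd == 0 || rnd == 1);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + rnd) >> 1);
        dst += stride;
        src += stride;
    }
}

/* Averaging store: blend a second prediction into what is already in
 * dst, with rounding. This is the kind of operation B-frames need. */
static void avg_pixels_sketch(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}
```

The inner loops are the part an optimized version would replace; the per-pixel arithmetic is what it must reproduce exactly.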
Motion Estimation Functions
pix_abs16x16 variants:
- Usage: Motion estimation with SAD (Sum of Absolute Differences)
- Priority: Critical
- Impact: Very high - directly affects encoding speed
pix_abs8x8 variants:
- Usage: MPEG-4 4MV motion estimation
- Priority: Medium
- Impact: Lower than 16x16 variants
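For readers unfamiliar with SAD, the following scalar C sketch shows what a 16x16 sum of absolute differences computes. The name `sad16x16_ref` and its signature are hypothetical, picked for this example only; an optimized version would produce the same integer with far fewer instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two 16x16 blocks that share the
 * same line stride. Lower values mean a better motion-vector match. */
static int sad16x16_ref(const uint8_t *cur, const uint8_t *ref,
                        ptrdiff_t stride)
{
    int sum = 0;

    assert(cur != NULL && ref != NULL);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(cur[x] - ref[x]);
        cur += stride;
        ref += stride;
    }
    return sum;
}
```

A motion search calls a function like this once per candidate vector, which is why its cost feeds so directly into encoding speed.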
MPEG-4 Specific Functions
Quarter-pixel motion compensation:
- Usage: MPEG-4 qpel encoding & decoding
- Priority: High for MPEG-4
- Impact: Significant for qpel-enabled content
- Note: qpel8 used only for 4mv, avg_* only for B-frames
Global motion compensation:
- Usage: MPEG-4 GMC decoding
- Priority: Medium
- Impact: Significant when GMC is used
- Note: gmc1 for single warp point (common in DivX5)
Encoding Functions
Pixel processing:
- Usage: Encoding
- Priority: High
- Complexity: Easy to optimize
- Usage: Encoding
- Priority: High
- Complexity: Easiest to optimize
- Usage: Encoding
- Priority: Medium
Transform Functions
IDCT/FDCT:
- Usage: idct (encoding & decoding), fdct (encoding only)
- Priority: Critical
- Complexity: Difficult to optimize
- Note: Some optimized IDCTs include clamping, making separate clamping functions unused
- Usage: IDCT output processing
- Priority: High
- Complexity: Easy
- Usage: Encoding
- Priority: High / Medium
- Complexity: Difficult
- Note: Trellis quantization is slower, less commonly used
- Usage: Codec-specific decoding/encoding
- Priority: High
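The "IDCT output processing" step listed above is easy to optimize because it is a plain clamp-and-store. A minimal scalar sketch follows; `put_pixels_clamped_sketch` and `clip_uint8_sketch` are hypothetical names for this example, and the 8x8 row-major block layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a signed value to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8_sketch(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Store an 8x8 block of 16-bit IDCT results as clamped 8-bit pixels.
 * block is assumed to hold 64 coefficients in row-major order. */
static void put_pixels_clamped_sketch(const int16_t *block, uint8_t *dst,
                                      ptrdiff_t stride)
{
    assert(block != NULL && dst != NULL && stride >= 8);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = clip_uint8_sketch(block[y * 8 + x]);
        dst += stride;
    }
}
```

On SIMD hardware the whole clamp usually collapses into a saturating pack instruction, which is why some optimized IDCTs fold it in and leave the separate function unused.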
Low Priority Functions
Optimization Justification
When to Optimize
Always Justified
- 0.1%+ speedup for common codecs
- No regression in code size/readability
- At least one factor (speed, code size, or readability) improves
Sometimes Justified
- Smaller gains for less common codecs
- Trade-offs between speed and maintainability
Rarely Justified
- Obscure codec with minimal usage
- Significant complexity increase
- Negligible performance gain
Goal for Obscure Codecs
Prioritize clean, small, readable code over raw performance.
Performance Measurement
Assembly Optimization
Inline vs External Assembly
- Inline Assembly
- External Assembly
When to use inline assembly:
- Code should be inlined in a C function
- Small, frequently called functions
- Need access to C struct members
- Compiler handles register allocation
- Direct access to C variables
- Better inlining opportunities
General Assembly Tips
- Use assembly loops, not C loops wrapped around small asm blocks.
- Mark all clobbered registers in the clobber list.
- Don’t rely on register contents surviving between separate asm blocks.
- Prefer external asm over intrinsics.
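The first three tips can be sketched with a small GCC-style inline asm block. This is an illustration only: it assumes x86-64, a GCC- or Clang-compatible compiler, and the default AT&T syntax, and it falls back to plain C elsewhere. The function name `sum_u32_asm_loop` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n 32-bit values. The whole loop lives inside one asm block, so
 * nothing has to survive in a register between separate blocks. */
static uint32_t sum_u32_asm_loop(const uint32_t *src, size_t n)
{
    uint32_t sum = 0;

    assert(src != NULL || n == 0);
    if (n == 0)
        return 0;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "1:                     \n\t"
        "addl (%[p]), %[s]      \n\t" /* sum += *src          */
        "add  $4, %[p]          \n\t" /* src++                */
        "dec  %[n]              \n\t" /* n--, updates flags   */
        "jnz  1b                \n\t" /* loop while n != 0    */
        : [s] "+r"(sum), [p] "+r"(src), [n] "+r"(n) /* read and written */
        :
        : "cc", "memory"); /* flags are clobbered; memory is read */
#else
    /* Portable fallback so the sketch builds on other targets. */
    for (size_t i = 0; i < n; i++)
        sum += src[i];
#endif
    return sum;
}
```

Every register the block modifies is either listed as a read-write operand or covered by a clobber, so the compiler never assumes a stale value.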
Alignment Requirements
Many SIMD instructions require aligned data.
SIMD Optimization Strategies
Vectorization Patterns
Horizontal Reduction
Summing elements within a vector:
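A minimal sketch of a horizontal sum over four 32-bit lanes is shown below. It uses SSE2 intrinsics only so that the example compiles as ordinary C (production FFmpeg code would more likely use external asm), and it falls back to scalar code when SSE2 is not available. The name `hsum4_i32` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Add the four 32-bit integers at v[0..3] and return the total. */
static int32_t hsum4_i32(const int32_t *v)
{
    assert(v != NULL);
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);           /* v0 v1 v2 v3 */
    __m128i y = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)); /* v2 v3 v0 v1 */
    x = _mm_add_epi32(x, y);              /* lane0 = v0+v2, lane1 = v1+v3 */
    y = _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)); /* swap lane pairs */
    x = _mm_add_epi32(x, y);              /* lane0 = v0+v1+v2+v3          */
    return _mm_cvtsi128_si32(x);          /* extract lane 0               */
#else
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

The reduction takes log2(lanes) shuffle-and-add steps, so it is normally done once after a loop rather than on every iteration.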
Loop Unrolling
Process multiple elements per iteration:
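As a sketch in portable C, the loop below is unrolled four times with independent accumulators and a scalar tail for leftover elements. The function name `sum_u8_unrolled` is hypothetical, and the unroll factor of 4 is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n bytes, handling four elements per iteration. Separate
 * accumulators keep the additions independent of each other. */
static uint32_t sum_u8_unrolled(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    assert(p != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++) /* tail: 0 to 3 leftover elements */
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}
```

Whether unrolling pays off depends on the target and compiler, so it is worth measuring before and after.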
Data Rearrangement
Interleave or deinterleave data for efficient processing:
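One common case is splitting interleaved chroma (U V U V ...) into separate planes. The sketch below assumes a little-endian x86 target for the SSE2 path and falls back to scalar code elsewhere; `deinterleave_uv` is a hypothetical name, and intrinsics are used only to keep the example plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Split `pairs` interleaved (U, V) byte pairs into two planes. */
static void deinterleave_uv(const uint8_t *uv, uint8_t *u, uint8_t *v,
                            size_t pairs)
{
    size_t i = 0;

    assert((uv != NULL && u != NULL && v != NULL) || pairs == 0);
#if defined(__SSE2__)
    const __m128i low_byte = _mm_set1_epi16(0x00FF);
    for (; i + 16 <= pairs; i += 16) {
        /* 32 input bytes = 16 pairs, viewed as sixteen 16-bit lanes. */
        __m128i a = _mm_loadu_si128((const __m128i *)(uv + 2 * i));
        __m128i b = _mm_loadu_si128((const __m128i *)(uv + 2 * i + 16));
        /* U is the low byte of each lane, V is the high byte. */
        __m128i ua = _mm_and_si128(a, low_byte);
        __m128i ub = _mm_and_si128(b, low_byte);
        __m128i va = _mm_srli_epi16(a, 8);
        __m128i vb = _mm_srli_epi16(b, 8);
        /* Pack 16-bit lanes back to bytes; values already fit in 0..255. */
        _mm_storeu_si128((__m128i *)(u + i), _mm_packus_epi16(ua, ub));
        _mm_storeu_si128((__m128i *)(v + i), _mm_packus_epi16(va, vb));
    }
#endif
    for (; i < pairs; i++) { /* scalar tail and non-SSE2 fallback */
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

After the split, each plane can be processed with full-width vector operations instead of strided accesses.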
Predication
Use masks to conditionally process elements:
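On instruction sets without dedicated mask registers, the same idea is expressed with a compare that yields an all-ones or all-zeros mask per lane, followed by a bitwise select. The sketch below shows this with SSE2 intrinsics and a scalar fallback; `add_where_gt` is a hypothetical name and the operation itself (add an offset only to elements above a threshold) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* For every element greater than thr, add off; leave the rest alone. */
static void add_where_gt(int16_t *x, size_t n, int16_t thr, int16_t off)
{
    size_t i = 0;

    assert(x != NULL || n == 0);
#if defined(__SSE2__)
    const __m128i vthr = _mm_set1_epi16(thr);
    const __m128i voff = _mm_set1_epi16(off);
    for (; i + 8 <= n; i += 8) {
        __m128i val  = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i mask = _mm_cmpgt_epi16(val, vthr); /* 0xFFFF where val > thr */
        __m128i sum  = _mm_add_epi16(val, voff);   /* computed for all lanes */
        /* Select: take sum where mask is set, the original value elsewhere. */
        __m128i res  = _mm_or_si128(_mm_and_si128(mask, sum),
                                    _mm_andnot_si128(mask, val));
        _mm_storeu_si128((__m128i *)(x + i), res);
    }
#endif
    for (; i < n; i++) /* scalar tail and non-SSE2 fallback */
        if (x[i] > thr)
            x[i] = (int16_t)(x[i] + off);
}
```

Both outcomes are computed for every lane and the mask picks one, which removes the data-dependent branch from the inner loop.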
Profiling
Tools
- Linux - perf
- macOS - Instruments
- Windows - VTune
- Cross-platform - Valgrind
FFmpeg Built-in Benchmarking
Testing Optimizations
Correctness Testing
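One way to check correctness is to run the optimized routine and a simple reference side by side on edge cases and pseudo-random inputs, and require bit-identical results. The sketch below does this for a rounded average of four packed bytes; all names are hypothetical, and the "optimized" version is a well-known SWAR bit trick used here purely as a stand-in for real optimized code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: per-byte (x + y + 1) >> 1 on four packed 8-bit values. */
static uint32_t rnd_avg4_ref(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFF;
        uint32_t y = (b >> (8 * i)) & 0xFF;
        out |= ((x + y + 1) >> 1) << (8 * i);
    }
    return out;
}

/* "Optimized" stand-in: all four bytes at once, no per-byte overflow. */
static uint32_t rnd_avg4_swar(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Compare both versions on fixed edge cases and `trials` pseudo-random
 * input pairs; returns the number of mismatches (0 means identical). */
static int count_mismatches(uint32_t seed, int trials)
{
    static const uint32_t edge[] = {
        0u, 0xFFFFFFFFu, 0x00FF00FFu, 0x01010101u, 0x80808080u
    };
    const size_t n_edge = sizeof(edge) / sizeof(edge[0]);
    uint32_t s = seed;
    int bad = 0;

    assert(trials >= 0);
    for (size_t i = 0; i < n_edge; i++)
        for (size_t j = 0; j < n_edge; j++)
            bad += rnd_avg4_ref(edge[i], edge[j]) !=
                   rnd_avg4_swar(edge[i], edge[j]);
    for (int t = 0; t < trials; t++) {
        s = s * 1664525u + 1013904223u; /* simple LCG, deterministic */
        uint32_t a = s;
        s = s * 1664525u + 1013904223u;
        uint32_t b = s;
        bad += rnd_avg4_ref(a, b) != rnd_avg4_swar(a, b);
    }
    return bad;
}
```

A fixed seed keeps failures reproducible; a nonzero return value means the optimized path diverges from the reference somewhere.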
Performance Testing
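A very small timing harness can give a first impression of whether a change helps. The sketch below is standalone C using `clock()` from the standard library, not FFmpeg's own benchmarking facilities; `best_seconds`, `kernel_fn`, and `sum_bytes_kernel` are hypothetical names, and taking the best of several runs is just one simple way to reduce noise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*kernel_fn)(const uint8_t *buf, size_t n);

/* Hypothetical kernel under test: a plain byte sum. */
static uint32_t sum_bytes_kernel(const uint8_t *buf, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}

/* Call fn `iters` times per run, repeat for `runs` runs, and return the
 * fastest run in seconds of CPU time. Results are accumulated into
 * *sink so the calls have an observable effect. */
static double best_seconds(kernel_fn fn, const uint8_t *buf, size_t n,
                           int runs, int iters, uint32_t *sink)
{
    double best = -1.0;
    uint32_t acc = 0;

    assert(fn != NULL && sink != NULL && runs > 0 && iters > 0);
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        for (int i = 0; i < iters; i++)
            acc += fn(buf, n);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    *sink = acc;
    return best;
}
```

To compare two implementations, call `best_seconds` once for each with the same buffer and iteration count and look at the ratio. Gains as small as the 0.1% figure mentioned earlier are likely to disappear in the noise of a harness this simple unless the iteration count is large and the machine is otherwise idle.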
Best Practices
Profile First
Always profile before optimizing to find real bottlenecks
Test Correctness
Verify optimized code produces identical output
Measure Impact
Quantify performance improvement objectively
Document Assumptions
Note alignment requirements and constraints
Handle Edge Cases
Ensure correct behavior for unusual inputs
Maintain Readability
Balance optimization with code maintainability
Additional Resources
Architecture Guide
Understanding FFmpeg’s structure
Multithreading
Parallelization strategies
x86 Optimization
Agner Fog’s optimization guides
Intel Intrinsics
SIMD intrinsics reference