Asynchronous Job Handling in OpenSSL
The QAT OpenSSL* Engine leverages OpenSSL’s asynchronous mode infrastructure (ASYNC_JOB) introduced in OpenSSL 1.1.0. This infrastructure enables non-blocking cryptographic operations, allowing applications to initiate multiple crypto requests without waiting for each to complete.
How Async Mode Works
In asynchronous mode:
- Job Submission: Applications submit cryptographic operations to the engine
- Non-blocking Return: The engine returns immediately without waiting for completion
- Parallel Processing: Multiple operations proceed simultaneously on QAT hardware or in SW batches
- Completion Notification: Applications are notified when operations finish
Completion Notification Methods
The QAT OpenSSL* Engine supports two methods for notifying applications of completed operations:
Callback Method
The engine automatically detects and uses callback-based notification when available:
- Lower Overhead: More CPU-efficient than file descriptors
- Direct Notification: Engine directly invokes application callback functions
- Automatic Detection: Build system detects OpenSSL version support
- Preferred Method: Used automatically when a callback function is set
File Descriptor Method
Fallback method for compatibility:
- Poll/Select Compatible: Works with standard I/O multiplexing
- Legacy Support: Available in all OpenSSL 1.1.0+ versions
- Fallback: Used when no callback function is configured
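The callback method is preferred for performance. The engine uses it automatically when your OpenSSL version supports callbacks (the ASYNC_WAIT_CTX callback API was added in OpenSSL 3.0) and your application provides a callback function. Otherwise it falls back to file descriptors.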
The async_jobs Parameter
The async_jobs parameter controls the level of parallelism for asynchronous operations.
What async_jobs Does
- Concurrent Operations: Specifies how many operations can be in-flight simultaneously
- Queue Depth: Determines the depth of request queues for batching (QAT_SW)
- Resource Allocation: Controls memory and context allocation for async operations
Setting async_jobs
In OpenSSL speed tests, pass the value with the -async_jobs option (for example, -async_jobs 64).
Choosing the Right Value
Optimal async_jobs values depend on several factors:
For QAT Hardware
- Low Concurrency: 8-32 jobs for basic workloads
- High Throughput: 64-128 jobs for maximum performance
- Large Packets: Fewer jobs (32-64) for operations on 8KB+ buffers
- Small Packets: More jobs (128+) for sub-1KB operations
For QAT Software
- Multibuffer Batching: At least 8 jobs to fill batches (QAT_SW processes up to 8 simultaneously)
- High Parallelism: 64-128 jobs for optimal multi-buffer utilization
- Connection Count: Match or exceed the number of concurrent client connections
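To see why fewer than 8 jobs leaves QAT_SW batches underfilled, it can help to work out how a given number of in-flight jobs divides into batches of 8. The helpers below are a hypothetical sketch: the batch width of 8 comes from the text above, while rounding a job count up to a multiple of 8 is only an assumed starting point for tuning, since real batch fill also depends on request timing.

```c
#define QAT_SW_BATCH 8u   /* batch width stated in the text above */

/* Number of completely filled batches for `jobs` in-flight jobs. */
unsigned full_batches(unsigned jobs)
{
    return jobs / QAT_SW_BATCH;
}

/* Empty slots in the last, partially filled batch (0 if it is full). */
unsigned unfilled_slots(unsigned jobs)
{
    unsigned rem = jobs % QAT_SW_BATCH;

    return rem == 0 ? 0 : QAT_SW_BATCH - rem;
}

/* Rounds a requested job count up to the next multiple of the batch
 * width; a request of 0 is raised to one full batch. */
unsigned round_up_to_batch(unsigned jobs)
{
    if (jobs == 0)
        return QAT_SW_BATCH;
    return jobs + unfilled_slots(jobs);
}
```

For example, 5 jobs leave 3 slots of a single batch empty, while 64 jobs fill 8 batches exactly. Any value suggested this way should still be validated by measurement on the target workload.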
General Guidelines
More async jobs consume more memory. Balance parallelism needs with available system resources.
Performance Benefits
Asynchronous operations provide significant performance improvements:
Throughput Gains
- Hardware Utilization: Keeps QAT devices busy with continuous work
- Batching Efficiency: Enables multi-buffer processing in QAT_SW
- Pipeline Depth: Reduces idle time between operations
Latency Optimization
- Non-blocking: Application threads don’t wait for crypto completion
- Parallel Processing: Multiple operations complete simultaneously
- Reduced Context Switching: Fewer thread wake/sleep cycles
Typical Performance Impact
| Scenario | Sync Mode | Async Mode | Improvement |
|---|---|---|---|
| RSA 2048 Sign (QAT_HW) | 10K ops/sec | 80K ops/sec | 8x |
| RSA 2048 Sign (QAT_SW) | 5K ops/sec | 45K ops/sec | 9x |
| ECDSA P-256 (QAT_SW) | 8K ops/sec | 95K ops/sec | ~12x |
| AES-GCM (QAT_SW) | 2 GB/sec | 15 GB/sec | 7.5x |
Example: OpenSSL Speed Test
Basic Asynchronous Test
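A basic asynchronous run, openssl speed -engine qatengine -async_jobs 64 -elapsed rsa2048, does the following:
- Loads the qatengine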
- Enables asynchronous mode with 64 parallel jobs
- Tests RSA 2048-bit signature operations
- Reports elapsed (wall clock) time for accurate async measurements
Comparing Sync vs. Async
Multi-buffer Optimization (QAT_SW)
Example: Application Integration
Setting Up Async Context
Performing Async Operation
Cleanup
Best Practices
For Maximum Performance
- Always Use Async Mode: Especially important for QAT_SW multi-buffer efficiency
- Tune async_jobs: Test different values for your specific workload
- Use Callbacks: When supported, callback notification is more efficient than file descriptors
- Monitor Queue Depth: Ensure queues stay full but don’t cause excessive memory usage
For Reliability
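- Handle ASYNC_PAUSE: When ASYNC_start_job returns ASYNC_PAUSE, wait for the completion notification, then call ASYNC_start_job again with the same job to resume it
- Check Return Values: Distinguish the ASYNC_start_job return codes (ASYNC_ERR, ASYNC_NO_JOBS, ASYNC_PAUSE, ASYNC_FINISH)
- Clean Up Resources: Free the ASYNC_WAIT_CTX with ASYNC_WAIT_CTX_free once its jobs have finished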
- Error Handling: Implement proper error handling for async failures
For Development
- Start Simple: Begin with synchronous mode, then add async
- Test Incremental Jobs: Increase async_jobs gradually to find optimal value
- Measure Performance: Use openssl speed with the -elapsed option to validate improvements
- Profile Your App: Monitor CPU, memory, and crypto throughput