What is QAT Software Acceleration?
Intel® QAT Software acceleration (QAT_SW) provides CPU-based cryptographic optimization using Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and related specialized instruction sets. Unlike hardware acceleration, QAT_SW uses vectorized CPU instructions to perform multiple cryptographic operations in parallel without requiring dedicated QAT hardware devices. This approach suits platforms that have modern Intel processors with AVX-512 capabilities but may not have dedicated QAT hardware accelerators installed.
Crypto Multi-buffer Library
QAT Software acceleration utilizes the Intel® Crypto Multi-buffer library, which implements:
Multi-buffer Processing
The Multi-buffer approach batches multiple cryptographic requests together and processes them in parallel using AVX-512 IFMA (Integer Fused Multiply Add) operations:
- Request Batching: Multiple requests are maintained in queues
- Parallel Processing: Up to 8 requests are processed simultaneously using vectorized instructions
- Asynchronous Completion: Uses OpenSSL’s asynchronous infrastructure for non-blocking operations
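
The batching behaviour in the list above can be pictured with a small toy model. This is only a conceptual sketch in Python, not the Crypto Multi-buffer library's API: the class `MultiBufferQueue`, the helper `square_all`, and the explicit `flush` call are hypothetical names chosen for illustration, and only the batch width of 8 comes from this document.

```python
from typing import Callable, List

BATCH_WIDTH = 8  # the document states up to 8 requests are processed together


class MultiBufferQueue:
    """Toy model of request batching; not the Crypto Multi-buffer API."""

    def __init__(self, process_batch: Callable[[List[int]], List[int]]):
        self._process_batch = process_batch
        self._pending: List[int] = []
        self.completed: List[int] = []
        self.batches_run = 0

    def submit(self, request: int) -> None:
        """Queue a request and run a batch once BATCH_WIDTH are waiting."""
        self._pending.append(request)
        if len(self._pending) == BATCH_WIDTH:
            self.flush()

    def flush(self) -> None:
        """Process whatever is queued, even a partially filled batch."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self.completed.extend(self._process_batch(batch))
        self.batches_run += 1


def square_all(batch: List[int]) -> List[int]:
    # Stand-in for one vectorized operation applied across all lanes.
    return [x * x for x in batch]


queue = MultiBufferQueue(square_all)
for value in range(10):
    queue.submit(value)
queue.flush()  # drain the two leftover requests
print(queue.batches_run, queue.completed)
```

Ten submissions produce one full batch of eight and one partial batch of two, which is why a steady supply of concurrent requests matters for keeping batches full.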
Supported Asymmetric Algorithms
The Crypto Multi-buffer library accelerates the following asymmetric PKE algorithms:
- RSA: Key sizes 2048, 3072, 4096 bits
- ECDH: X25519 (Montgomery Curve), P-256, P-384 (NIST Prime Curves), SM2
- ECDSA: P-256, P-384 (NIST Prime Curves), SM2
- SM2: Chinese national public key cryptography standard
Multi-buffer acceleration is most beneficial in asynchronous mode with many parallel connections to fully utilize the batching mechanism.
IPSec_MB Library
For symmetric encryption, QAT Software uses the Intel® Multi-Buffer Crypto for IPsec Library (IPSec_MB):
Synchronous Mechanism
Unlike asymmetric operations, symmetric AES-GCM processing follows a synchronous mechanism:
- Requests are submitted directly to the IPSec_MB library
- Processing occurs in multiple blocks using vectorized instructions
- No batching or queueing of requests
Vectorized AES Processing
The IPSec_MB library leverages:
- VAES: Vector AES instruction set extension
- AVX2/AVX-512: Advanced vector extensions for parallel data processing
- VPCLMULQDQ: Vector carry-less multiplication for GCM mode
Supported Symmetric Algorithms
- AES-GCM: AES128-GCM, AES192-GCM, AES256-GCM
- SM4-CBC: 16 Multibuffer requests (Tongsuo only)
- SM4-GCM: 16 Multibuffer requests (Tongsuo only)
- SM4-CCM: 16 Multibuffer requests (Tongsuo only)
Supported Hash Algorithms
- SM3: Hash using 16 Multibuffer requests (Experimental)
Platform Requirements
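QAT Software acceleration requires an Intel processor with specific instruction set extensions.
Minimum Processor Generation
3rd Generation Intel® Xeon® Scalable Processors or newer, providing the instruction set extensions used by the libraries above: AVX-512 IFMA for multi-buffer public key operations, and VAES and VPCLMULQDQ for AES-GCM.
Checking Processor Support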
You can verify that your processor supports the required instructions:
How to Enable QAT_SW
To build the QAT OpenSSL* Engine with QAT Software support:
Performance Characteristics
When to Use QAT_SW
QAT Software acceleration is beneficial when:
- High Parallelism: Applications with many concurrent connections or asynchronous operations
- No QAT Hardware: Systems without dedicated QAT accelerator cards
- Specific Algorithms: AES-GCM operations often perform better with QAT_SW than QAT_HW
- Modern Processors: 3rd Gen Xeon or newer with full AVX-512 support
Optimal Use Cases
- TLS/SSL Servers: High-throughput web servers with many simultaneous connections
- VPN Gateways: IPsec workloads with AES-GCM encryption
- Cloud Environments: Virtual machines on modern Intel processors
- Microservices: Applications with asynchronous I/O patterns
For best performance, use asynchronous mode (async_jobs parameter) to enable request batching and maximize vectorization efficiency.
Limitations and Considerations
Asynchronous Mode Requirement
Multi-buffer asymmetric acceleration requires asynchronous operation:
- Synchronous mode will not benefit from batching
- Configure adequate async_jobs for your workload
- More parallel connections improve batching efficiency
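
To see why more parallel requests help, a rough estimate of vector lane utilization can be useful. The sketch below is a simplified model, not a measurement: it assumes that a partially filled batch still occupies a full-width vector operation, and it ignores the engine's real queueing and timeout behaviour. The function name `lane_utilization` is hypothetical; only the batch width of 8 comes from this document.

```python
import math

BATCH_WIDTH = 8  # batch width stated in this document


def lane_utilization(concurrent_requests: int, width: int = BATCH_WIDTH) -> float:
    """Fraction of vector lanes doing useful work for one wave of requests.

    Assumption: every batch costs a full-width operation, so a partial
    batch leaves some lanes idle.
    """
    if concurrent_requests <= 0:
        return 0.0
    batches = math.ceil(concurrent_requests / width)
    return concurrent_requests / (batches * width)


for n in (1, 4, 8, 12, 64):
    print(n, lane_utilization(n))
```

Under this model a single synchronous request uses one lane in eight, while any multiple of eight in-flight requests keeps every lane busy, which is the intuition behind sizing async_jobs to the workload.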
CPU Utilization
Unlike hardware acceleration, QAT_SW consumes CPU resources:
- Trade-off between CPU usage and cryptographic performance
- Monitor CPU utilization in production deployments
- Consider hardware acceleration if CPU resources are constrained
Platform Dependency
QAT Software acceleration is limited to Intel processors with the required instruction sets:
- Will not function on older Intel processors (pre-3rd Gen Xeon)
- Not available on non-Intel platforms
- Verify instruction set support before deployment
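
The verification step in the last bullet could be scripted along the following lines on Linux. This is a minimal sketch under stated assumptions: the set of flags checked (`avx512ifma`, `vaes`, `vpclmulqdq`) is taken from the instruction sets named in this document and written as they appear in `/proc/cpuinfo`; it is not an official or exhaustive requirement list, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Assumed requirement list: Linux /proc/cpuinfo names for the extensions
# this document mentions (AVX-512 IFMA, VAES, VPCLMULQDQ).
REQUIRED_FLAGS = ("avx512ifma", "vaes", "vpclmulqdq")


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str,
                  required: Iterable[str] = REQUIRED_FLAGS) -> List[str]:
    """List required flags that the CPU does not report, in sorted order."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_flags(cpuinfo.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("/proc/cpuinfo not found; this check is Linux-specific")
```

Keeping the parsing separate from the file read makes the check easy to run against captured cpuinfo text from another machine, for example when validating a deployment target in advance.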