# QNN Execution Provider
The QNN Execution Provider enables hardware-accelerated inference on Qualcomm platforms through the Qualcomm AI Engine Direct SDK (QNN SDK), covering Snapdragon mobile processors, IoT devices, and edge compute platforms.

## When to Use QNN EP
Use the QNN Execution Provider when:

- You’re deploying on Android devices with Qualcomm Snapdragon processors
- You need to leverage Qualcomm’s AI accelerators (Hexagon DSP, AI Engine)
- You’re building IoT or edge devices with Qualcomm chipsets
- You want optimized inference on Qualcomm compute platforms
- You need low-power, high-performance inference on mobile
## Key Features
- Hexagon DSP: Leverage dedicated signal processing hardware
- AI Engine: Access specialized neural network accelerators
- Multi-Core Optimization: Utilize multiple compute units efficiently
- Low Power: Optimized for battery-powered devices
- Reduced Precision: INT8 quantization and FP16 execution modes
- Android Integration: Seamless deployment on Android devices
## Prerequisites
### Hardware Requirements
Supported Chipsets:

- Snapdragon 8 Gen 2/3 (flagship smartphones)
- Snapdragon 7 Series (upper mid-range)
- Snapdragon 6 Series (mid-range)
- Snapdragon 8cx Gen 3 (Windows on ARM)
- Qualcomm IoT and Edge platforms
- Snapdragon 888 or newer for best performance
- Devices with Hexagon 698 DSP or newer
### Software Requirements
- Qualcomm AI Engine Direct SDK (QNN SDK)
- Android NDK (for Android deployment)
- ONNX Runtime with QNN support
- Android API Level 29+ (Android 10+)
## Installation
### Android (Java/Kotlin)
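A sketch of the Gradle dependency. The artifact coordinates below are an assumption; check Maven Central for the current QNN-enabled ONNX Runtime Android package and pin an explicit version:

```groovy
dependencies {
    // QNN-enabled ONNX Runtime Android package (coordinates assumed; verify)
    implementation 'com.microsoft.onnxruntime:onnxruntime-android-qnn:latest.release'
}
```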
### Android (Native C++)
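For native builds, one common approach is to link against `libonnxruntime.so` (extracted from the Android package or built from source) together with the QNN backend libraries from the QNN SDK. A sketch of the CMake wiring; all paths are placeholders for your local layout:

```cmake
# Placeholder paths; adjust to where you unpacked the ONNX Runtime libraries
set(ORT_ROOT /path/to/onnxruntime-android)

add_library(onnxruntime SHARED IMPORTED)
set_target_properties(onnxruntime PROPERTIES
    IMPORTED_LOCATION ${ORT_ROOT}/lib/arm64-v8a/libonnxruntime.so)

target_include_directories(app PRIVATE ${ORT_ROOT}/include)
target_link_libraries(app onnxruntime)
```

Remember to package the QNN backend libraries (e.g. `libQnnHtp.so` and its dependencies) in your APK's `jniLibs` so they are loadable at runtime.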
### Python (Linux/Development)
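The prebuilt pip package with QNN support currently targets Windows on ARM64 (Snapdragon PCs); on Linux you typically build from source (next section). The package name below is from the ONNX Runtime docs:

```shell
# Windows on ARM64 only; for Linux, build from source instead
pip install onnxruntime-qnn
```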
### Build from Source
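Source builds enable QNN via the `--use_qnn` and `--qnn_home` flags of the ONNX Runtime build script; the SDK path is a placeholder for your install:

```shell
# QNN_SDK_ROOT should point at your Qualcomm AI Engine Direct SDK install
./build.sh --config Release --use_qnn --qnn_home $QNN_SDK_ROOT \
    --build_shared_lib --parallel --skip_tests
```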
## Basic Usage
### Java/Kotlin (Android)
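A minimal Kotlin sketch. It assumes the Android QNN package exposes `SessionOptions.addQnn(...)`; if your ONNX Runtime version differs, check `OrtSession.SessionOptions` for the QNN entry point:

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import java.nio.FloatBuffer

fun runModel(modelPath: String, input: FloatArray, shape: LongArray): FloatArray {
    val env = OrtEnvironment.getEnvironment()
    val options = OrtSession.SessionOptions().apply {
        // addQnn(...) is assumed here; verify against your ORT Java API version
        addQnn(mapOf("backend_path" to "libQnnHtp.so"))
    }
    env.createSession(modelPath, options).use { session ->
        OnnxTensor.createTensor(env, FloatBuffer.wrap(input), shape).use { tensor ->
            val inputName = session.inputNames.first()
            session.run(mapOf(inputName to tensor)).use { results ->
                @Suppress("UNCHECKED_CAST")
                return (results.get(0).value as Array<FloatArray>)[0]
            }
        }
    }
}
```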
### C++ (Android NDK)
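A C++ sketch using the C++ API's generic `AppendExecutionProvider` entry point (available in recent ONNX Runtime releases); the model path is a placeholder:

```cpp
#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn-demo");
    Ort::SessionOptions session_options;

    // Provider options are string key/value pairs; libQnnHtp.so targets the HTP
    std::unordered_map<std::string, std::string> qnn_options{
        {"backend_path", "libQnnHtp.so"},
        {"htp_performance_mode", "burst"},
    };
    session_options.AppendExecutionProvider("QNN", qnn_options);

    // Placeholder path for your packaged model
    Ort::Session session(env, "/data/local/tmp/model.onnx", session_options);
    // ... create Ort::Value inputs and call session.Run(...) as usual
    return 0;
}
```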
### Python (Linux)
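A Python sketch; the provider name `QNNExecutionProvider` and these option keys come from the ONNX Runtime docs, while the backend path depends on your platform:

```python
qnn_options = {
    # "QnnHtp.dll" on Windows on ARM; "libQnnHtp.so" on Linux/Android
    "backend_path": "QnnHtp.dll",
    "htp_performance_mode": "burst",
}

def run(model_path, feed):
    # Imported lazily so the options above can be inspected standalone
    import onnxruntime as ort  # requires a QNN-enabled build

    session = ort.InferenceSession(
        model_path,
        providers=[("QNNExecutionProvider", qnn_options)],
    )
    return session.run(None, feed)
```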
## Configuration Options
### Backend Selection
QNN supports multiple hardware backends (HTP/Hexagon, GPU, and CPU), selected by pointing the `backend_path` provider option at the corresponding QNN backend library; the HTP backend is the usual choice on Snapdragon devices.

### Priority Settings
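Context priority is set through the `qnn_context_priority` provider option (option name and values per the ONNX Runtime QNN EP docs):

```python
# Higher priority lets this session's workloads preempt lower-priority ones
priority_options = {
    "backend_path": "libQnnHtp.so",
    "qnn_context_priority": "high",  # low | normal | normal_high | high
}
```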
### Profiling
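Profiling is enabled with the `profiling_level` provider option (values per the ONNX Runtime QNN EP docs); expect some overhead while it is on:

```python
profiling_options = {
    "backend_path": "libQnnHtp.so",
    "profiling_level": "basic",  # off | basic | detailed
}
```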
### Advanced Options
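A few further provider options exist; the names below are from the ONNX Runtime QNN EP docs, but availability varies by release, so verify against your version:

```python
advanced_options = {
    "backend_path": "libQnnHtp.so",
    # How aggressively graphs are finalized for the HTP (modes 0-3)
    "htp_graph_finalization_optimization_mode": "3",
    # Hint the target SoC so offline-prepared contexts match the device
    "soc_model": "0",
}
```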
## Performance Optimization
### Quantization
QNN performs best with quantized models.

### Performance Modes
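The `htp_performance_mode` provider option trades power for latency (value list per the ONNX Runtime QNN EP docs):

```python
perf_options = {
    "backend_path": "libQnnHtp.so",
    # Other values include: default, balanced, high_performance, power_saver,
    # low_power_saver, sustained_high_performance
    "htp_performance_mode": "burst",
}
```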
### Context Caching
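Compiled QNN contexts can be cached to disk via ONNX Runtime session config entries (the `ep.context_*` keys per the EP-context documentation), so later sessions skip compilation. A sketch:

```python
def make_cached_session(model_path, context_path):
    # Lazy import keeps this sketch loadable without a QNN build installed
    import onnxruntime as ort

    so = ort.SessionOptions()
    # Generate/load a compiled context at the given path
    so.add_session_config_entry("ep.context_enable", "1")
    so.add_session_config_entry("ep.context_file_path", context_path)
    return ort.InferenceSession(
        model_path,
        sess_options=so,
        providers=[("QNNExecutionProvider", {"backend_path": "libQnnHtp.so"})],
    )
```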
Saving compiled contexts speeds up session initialization by skipping QNN graph compilation on subsequent loads.

### FP16 Precision
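FP16 execution on the HTP can be requested for float32 models via the `enable_htp_fp16_precision` provider option (option name from the ONNX Runtime QNN EP docs):

```python
fp16_options = {
    "backend_path": "libQnnHtp.so",
    # Runs float32 graphs in fp16 on the HTP for a large speedup
    "enable_htp_fp16_precision": "1",
}
```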
Enabling FP16 generally yields a significant speedup with minimal accuracy impact.

## Android Integration
### Complete Android Example
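A fuller Kotlin sketch that copies a bundled model out of `assets/` and runs it through QNN. As above, `addQnn(...)` is assumed; verify against your ONNX Runtime version:

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import android.content.Context
import java.io.File
import java.nio.FloatBuffer

class QnnClassifier(context: Context) {
    private val env = OrtEnvironment.getEnvironment()
    private val session: OrtSession

    init {
        // Copy the model out of assets so ORT can read it from a real path
        val modelFile = File(context.filesDir, "model.onnx")
        if (!modelFile.exists()) {
            context.assets.open("model.onnx").use { input ->
                modelFile.outputStream().use { input.copyTo(it) }
            }
        }
        val options = OrtSession.SessionOptions().apply {
            // addQnn(...) is assumed; verify against your ORT version
            addQnn(mapOf(
                "backend_path" to "libQnnHtp.so",
                "htp_performance_mode" to "burst",
            ))
        }
        session = env.createSession(modelFile.absolutePath, options)
    }

    fun classify(input: FloatArray, shape: LongArray): FloatArray {
        OnnxTensor.createTensor(env, FloatBuffer.wrap(input), shape).use { t ->
            session.run(mapOf(session.inputNames.first() to t)).use { r ->
                @Suppress("UNCHECKED_CAST")
                return (r.get(0).value as Array<FloatArray>)[0]
            }
        }
    }
}
```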
### Permissions (AndroidManifest.xml)
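QNN needs no runtime permissions, but on Android 12+ (API 31+) apps that use the Hexagon DSP typically must declare Qualcomm's native runtime libraries via the `uses-native-library` mechanism. The exact library list depends on the device and SDK; `libcdsprpc.so` is a common entry:

```xml
<!-- Inside the <application> element of AndroidManifest.xml -->
<uses-native-library android:name="libcdsprpc.so" android:required="false" />
```

Setting `required="false"` keeps the app installable on devices without the library (falling back to CPU execution).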
### Asset Packaging
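Files under `assets/` are compressed inside the APK by default; keeping `.onnx` models uncompressed lets them be read efficiently at startup. A `build.gradle` sketch (the exact block name varies by Android Gradle Plugin version):

```groovy
android {
    androidResources {
        // Keep .onnx files uncompressed in the APK
        // (older AGP versions use aaptOptions { noCompress 'onnx' })
        noCompress 'onnx'
    }
}
```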
## Platform Support
| Platform | Architecture | Support | Notes |
|---|---|---|---|
| Android | ARM64 | ✅ Full | Primary platform |
| Android | ARMv7 | ⚠️ Limited | Older devices |
| Linux | ARM64 | ⚠️ Limited | Development/testing |
| Windows on ARM | ARM64 | ✅ Full | Snapdragon PCs |
| Linux | x64 | ❌ No | Use CPU/CUDA instead |
## Supported Chipsets
### Flagship (Best Performance)
- Snapdragon 8 Gen 3
- Snapdragon 8 Gen 2
- Snapdragon 888/888+
- Snapdragon 8+ Gen 1
### Upper Mid-Range
- Snapdragon 7 Gen 1/2
- Snapdragon 778G/782G
- Snapdragon 870
### Mid-Range
- Snapdragon 695/690
- Snapdragon 6 Gen 1
### Edge/IoT
- Snapdragon 660/665
- Qualcomm IoT platforms
## Troubleshooting
### Provider Not Available
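First confirm your installed build actually includes the QNN EP (a sketch; running the check requires onnxruntime installed):

```python
def check_qnn_available():
    import onnxruntime as ort  # lazy import so this file loads anywhere

    # Builds without QNN support list CPUExecutionProvider and others,
    # but not QNNExecutionProvider
    return "QNNExecutionProvider" in ort.get_available_providers()
```

If this returns False, install or build a QNN-enabled package (see Installation).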
### Backend Loading Errors
### Performance Issues
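Before blaming the EP, measure steady-state latency with warmup, since the first runs include graph compilation. A generic sketch:

```python
import time

def benchmark(run_once, warmup=5, iters=50):
    """Return mean latency in milliseconds of run_once() after warmup."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) * 1000.0 / iters
```

If latency stays high, check via verbose logging that nodes are not falling back to the CPU EP, and try a quantized model.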
### Context Save/Load Errors
## Performance Comparison
Typical performance on Snapdragon 888:

| Configuration | Latency | Power | Notes |
|---|---|---|---|
| CPU Only | 80ms | High | Baseline |
| QNN (FP32) | 15ms | Medium | Good |
| QNN (FP16) | 8ms | Low | Better |
| QNN (INT8) | 4ms | Very Low | Best |
## Best Practices
- Use Quantization: INT8 models run 2-4x faster
- Cache Contexts: Save compiled contexts to reduce init time
- Enable FP16: Minimal accuracy impact, significant speedup
- Profile First: Use profiling to identify bottlenecks
- Test on Device: Performance varies by chipset generation
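The quantization recommendation above can be sketched with ONNX Runtime's quantization tooling. The QNN-specific helper `get_qnn_qdq_config` is assumed from recent releases; verify the import path against your version:

```python
def quantize_for_qnn(model_in, model_out, calibration_reader):
    # Lazy imports keep the sketch loadable without onnxruntime installed
    from onnxruntime.quantization import QuantType, quantize
    from onnxruntime.quantization.execution_providers.qnn import (
        get_qnn_qdq_config,
    )

    # Build a QDQ quantization config tuned for the QNN HTP backend
    config = get_qnn_qdq_config(
        model_in,
        calibration_reader,
        activation_type=QuantType.QUInt8,
        weight_type=QuantType.QUInt8,
    )
    quantize(model_in, model_out, config)
```

`calibration_reader` is a `CalibrationDataReader` yielding representative input batches for computing activation ranges.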
## Next Steps
- Learn about model quantization for QNN
- See mobile optimization best practices
- Compare with other mobile execution providers for Android
- Explore Qualcomm AI Hub for pre-optimized models