Overview
ONNX Runtime Web supports multiple execution backends:
- WebAssembly (WASM): CPU execution with SIMD support
- WebGPU: GPU acceleration for modern browsers
- WebGL: Legacy GPU support
- WebNN: Web Neural Network API for hardware acceleration
Installation
Using NPM
Using CDN
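The two installation paths above might look like the following. The jsDelivr URL is the commonly documented CDN path for the browser bundle; in production, pin a specific version in the URL.

```shell
npm install onnxruntime-web
```

```html
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
```

The CDN build exposes a global `ort` object; the npm package is imported as `import * as ort from 'onnxruntime-web'`.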
Basic Usage
Creating an Inference Session
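A minimal session setup, assuming `ort` is available globally via the CDN build (with a bundler, use `import * as ort from 'onnxruntime-web'` instead). The model path, input name, and shape are placeholders for your model:

```javascript
// Create a session and run inference (sketch; `ort` is the global from the CDN build).
async function runInference() {
  // Load the model; the URL is a placeholder.
  const session = await ort.InferenceSession.create('./model.onnx');

  // Build a feeds object keyed by the model's input names.
  const data = Float32Array.from({ length: 1 * 3 * 224 * 224 }, () => Math.random());
  const input = new ort.Tensor('float32', data, [1, 3, 224, 224]);

  const results = await session.run({ input });
  return results; // map of output name -> ort.Tensor
}
```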
Loading Models
From URL
From ArrayBuffer
From Uint8Array
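`InferenceSession.create` accepts either a URL string or raw bytes, so the three variants above might be sketched as follows (global `ort` assumed; paths are placeholders):

```javascript
// From URL: the runtime fetches the model itself.
async function fromUrl() {
  return ort.InferenceSession.create('https://example.com/model.onnx');
}

// From ArrayBuffer: useful when you fetch or decrypt the bytes yourself.
async function fromArrayBuffer() {
  const response = await fetch('./model.onnx');
  const buffer = await response.arrayBuffer();
  return ort.InferenceSession.create(buffer);
}

// From Uint8Array: e.g. bytes produced by another library.
async function fromUint8Array(bytes /* Uint8Array */) {
  return ort.InferenceSession.create(bytes);
}
```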
Execution Providers
WebAssembly (CPU)
Default CPU execution uses the WebAssembly backend.
SIMD Support
Enable SIMD for better performance.
WebGPU
Modern GPU acceleration.
WebGPU Options
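The WASM and WebGPU provider selections described above, in one sketch (global `ort` assumed; provider-specific option names such as `preferredLayout` should be verified against your onnxruntime-web version):

```javascript
async function createSessions(modelUrl) {
  // WebAssembly (default CPU backend).
  const cpu = await ort.InferenceSession.create(modelUrl, {
    executionProviders: ['wasm'],
  });

  // SIMD is used automatically where supported; it can be toggled via an env flag.
  ort.env.wasm.simd = true;

  // WebGPU backend.
  const gpu = await ort.InferenceSession.create(modelUrl, {
    executionProviders: ['webgpu'],
  });

  // WebGPU with provider-specific options (option names are version-dependent).
  const gpuTuned = await ort.InferenceSession.create(modelUrl, {
    executionProviders: [{ name: 'webgpu', preferredLayout: 'NHWC' }],
  });

  return { cpu, gpu, gpuTuned };
}
```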
WebGL
WebGL provides legacy GPU support for browsers without WebGPU.
WebGL Context Options
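A WebGL session plus context flags might look like this sketch (global `ort` assumed; the `ort.env.webgl` flag names should be checked against your version):

```javascript
async function createWebglSession(modelUrl) {
  // WebGL context flags live on ort.env.webgl.
  ort.env.webgl.contextId = 'webgl2'; // prefer a WebGL 2 context
  ort.env.webgl.pack = true;          // enable texture packing

  return ort.InferenceSession.create(modelUrl, {
    executionProviders: ['webgl'],
  });
}
```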
WebNN
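Selecting the WebNN execution provider might look like this sketch (global `ort` assumed; the `deviceType` option is an assumption to verify against your version's WebNN EP documentation):

```javascript
async function createWebnnSession(modelUrl) {
  return ort.InferenceSession.create(modelUrl, {
    executionProviders: [{ name: 'webnn', deviceType: 'gpu' }], // or 'cpu' / 'npu'
  });
}
```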
Hardware acceleration via the Web Neural Network API.
Session Options
Graph Optimization
Logging
Multi-Threading
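The three session-level knobs above (graph optimization, logging, threading), sketched together. `numThreads > 1` requires the cross-origin-isolated setup described under Build and Deployment:

```javascript
// Global flags: set these before the first session is created.
function configureEnv() {
  ort.env.wasm.numThreads = 4;  // worker threads; needs a crossOriginIsolated page
  ort.env.logLevel = 'warning'; // global log level
}

async function createTunedSession(modelUrl) {
  return ort.InferenceSession.create(modelUrl, {
    graphOptimizationLevel: 'all', // 'disabled' | 'basic' | 'extended' | 'all'
    logSeverityLevel: 2,           // 0 = verbose ... 4 = fatal (per-session)
  });
}
```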
Working with Tensors
Creating Tensors
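A tensor is constructed from a type string, a typed array, and a dims array; the data length must equal the product of the dims. A sketch (global `ort` assumed):

```javascript
function makeTensors() {
  // 2x2 float tensor.
  const floats = new ort.Tensor('float32', new Float32Array([1, 2, 3, 4]), [2, 2]);

  // 1-D int32 tensor.
  const ints = new ort.Tensor('int32', new Int32Array([1, 2, 3]), [3]);

  // int64 data uses BigInt64Array.
  const bigints = new ort.Tensor('int64', new BigInt64Array([1n, 2n]), [2]);

  return { floats, ints, bigints };
}
```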
Supported Data Types
- float32, float64
- int8, uint8, int16, uint16, int32, uint32
- int64, uint64 (BigInt)
- bool, string
- float16 (Uint16Array)
Tensor from Image
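Recent versions of the library expose a `Tensor.fromImage` helper that converts an image source into a tensor; the exact option names should be checked against your version's API. A sketch with assumed normalization values:

```javascript
async function tensorFromImage(imgElement) {
  // Resize while converting an HTMLImageElement into a float tensor.
  return ort.Tensor.fromImage(imgElement, {
    resizedWidth: 224,             // placeholder target size
    resizedHeight: 224,
    norm: { mean: 127.5, bias: -1 }, // hypothetical normalization parameters
  });
}
```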
Advanced Features
Pre-allocated Outputs
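`session.run` accepts a second "fetches" argument; passing a pre-allocated tensor there lets the runtime write results into your buffer instead of allocating a new one. A sketch (the output name and shape are placeholders; verify this behavior for your version):

```javascript
async function runWithPreallocatedOutput(session, feeds) {
  // 'output' stands in for the model's real output name.
  const out = new ort.Tensor('float32', new Float32Array(1000), [1, 1000]);
  const results = await session.run(feeds, { output: out });
  return results.output; // backed by the same buffer as `out`
}
```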
Model Metadata
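Input and output names are available on the session object after creation:

```javascript
// Summarize a session's I/O names (the example names in comments are placeholders).
function describeSession(session) {
  return {
    inputs: session.inputNames,   // e.g. ['input']
    outputs: session.outputNames, // e.g. ['output']
  };
}
```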
Run Options
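Per-run options can be passed to `session.run`; a sketch using the logging and tag fields (verify the option names for your version):

```javascript
async function runWithOptions(session, feeds) {
  return session.run(feeds, {
    logSeverityLevel: 0, // verbose logging for this run only
    tag: 'debug-run',    // tag attached to this run's log messages
  });
}
```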
Performance Optimization
Model Optimization
- Use ORT format: convert to `.ort` for faster loading
- Quantization: use INT8 quantization
- Graph optimization: enable the 'all' optimization level
WASM Configuration
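The WASM backend is tuned through global `ort.env.wasm` flags, which must be set before the first session is created (global `ort` assumed; the path is a placeholder):

```javascript
function configureWasm() {
  ort.env.wasm.numThreads = 4;         // worker threads (needs crossOriginIsolated)
  ort.env.wasm.simd = true;            // use the SIMD build when available
  ort.env.wasm.proxy = true;           // run the runtime in a worker, off the main thread
  ort.env.wasm.wasmPaths = '/assets/'; // where the .wasm binaries are served from
}
```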
WebGPU Optimizations
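A heavily hedged sketch of common WebGPU tuning: preferring the high-performance adapter and keeping outputs on the GPU to avoid readbacks. Both `ort.env.webgpu.powerPreference` and `preferredOutputLocation` are assumptions to verify against your version's documentation:

```javascript
async function createOptimizedWebgpuSession(modelUrl) {
  // Prefer the discrete GPU when one is present (flag name is version-dependent).
  ort.env.webgpu.powerPreference = 'high-performance';

  return ort.InferenceSession.create(modelUrl, {
    executionProviders: ['webgpu'],
    // Keep outputs on the GPU to avoid a CPU readback when chaining models
    // (verify this option exists in your version).
    preferredOutputLocation: 'gpu-buffer',
  });
}
```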
Caching Sessions
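Session creation is expensive, so reuse sessions instead of recreating them per inference. One sketch caches the creation promise per URL so concurrent callers share a single session, and releases native resources when done (global `ort` assumed):

```javascript
// Cache the creation promise so concurrent callers share one session per URL.
const sessionCache = new Map();

function getSession(modelUrl) {
  if (!sessionCache.has(modelUrl)) {
    sessionCache.set(modelUrl, ort.InferenceSession.create(modelUrl));
  }
  return sessionCache.get(modelUrl);
}

// Free native resources when a session is no longer needed.
async function releaseSession(modelUrl) {
  const pending = sessionCache.get(modelUrl);
  if (pending) {
    sessionCache.delete(modelUrl);
    (await pending).release();
  }
}
```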
Browser Compatibility
Feature Detection
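A sketch that probes which backends the current environment can plausibly support. These are surface-level checks; a definitive answer may require actually attempting session creation:

```javascript
function detectBackends() {
  const backends = [];
  // WebGPU exposes navigator.gpu.
  if (typeof navigator !== 'undefined' && 'gpu' in navigator) backends.push('webgpu');
  // WebGL 2 via a probe canvas.
  if (typeof document !== 'undefined' &&
      document.createElement('canvas').getContext('webgl2') !== null) backends.push('webgl');
  // WebNN exposes navigator.ml.
  if (typeof navigator !== 'undefined' && 'ml' in navigator) backends.push('webnn');
  // WebAssembly is available in all supported browsers.
  if (typeof WebAssembly === 'object') backends.push('wasm');
  return backends;
}
```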
Fallback Strategy
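A common fallback pattern tries execution providers in order of preference and moves on when creation fails (global `ort` assumed):

```javascript
async function createSessionWithFallback(modelUrl, providers = ['webgpu', 'webgl', 'wasm']) {
  let lastError;
  for (const ep of providers) {
    try {
      return await ort.InferenceSession.create(modelUrl, { executionProviders: [ep] });
    } catch (err) {
      lastError = err; // provider unsupported or init failed; try the next one
    }
  }
  throw lastError;
}
```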
Build and Deployment
Webpack Configuration
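The bundle needs the runtime's `.wasm` binaries copied next to the build output; one common sketch uses `copy-webpack-plugin` (the plugin choice and paths are assumptions, adjust to your layout):

```javascript
// webpack.config.js (fragment)
const CopyPlugin = require('copy-webpack-plugin');

module.exports = {
  plugins: [
    new CopyPlugin({
      patterns: [
        // Copy the ONNX Runtime Web WASM binaries into the output directory.
        { from: 'node_modules/onnxruntime-web/dist/*.wasm', to: '[name][ext]' },
      ],
    }),
  ],
};
```

If the binaries end up in a subdirectory instead, point `ort.env.wasm.wasmPaths` at it.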
Serving WASM Files
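Browsers require the `application/wasm` MIME type for WebAssembly streaming compilation. A minimal sketch of the extension-to-MIME mapping a server for this app would need (a hypothetical helper, not part of the library):

```javascript
// Map file extensions to MIME types for the assets this app serves.
const MIME_TYPES = {
  '.wasm': 'application/wasm',        // required for WebAssembly.compileStreaming
  '.onnx': 'application/octet-stream',
  '.js': 'text/javascript',
  '.html': 'text/html',
};

function contentTypeFor(path) {
  const dot = path.lastIndexOf('.');
  const ext = dot === -1 ? '' : path.slice(dot);
  return MIME_TYPES[ext] || 'application/octet-stream';
}
```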
Ensure proper MIME types for served files.
Cross-Origin Isolation
For multi-threading support, the page must be cross-origin isolated: serve it with the `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` headers.
Example: Image Classification
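An end-to-end sketch combining the pieces above. The model URL, the input/output names, the 224x224 shape, and the `labels` array are all placeholders for a real classifier; `ort` is the global from the CDN build:

```javascript
// Softmax over raw logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Indices of the k largest probabilities, in descending order.
function topK(probs, k) {
  return probs
    .map((p, i) => [p, i])
    .sort((a, b) => b[0] - a[0])
    .slice(0, k)
    .map(([, i]) => i);
}

// Classify an image element (model path and I/O names are placeholders).
async function classifyImage(imgElement, labels) {
  const session = await ort.InferenceSession.create('./classifier.onnx');
  const input = await ort.Tensor.fromImage(imgElement, {
    resizedWidth: 224,
    resizedHeight: 224,
  });
  const results = await session.run({ input });
  const probs = softmax(Array.from(results.output.data));
  return topK(probs, 5).map((i) => ({ label: labels[i], prob: probs[i] }));
}
```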
Troubleshooting
Common Issues
WASM files not loading:
- Check file paths in `ort.env.wasm.wasmPaths`
- Verify the server MIME type for `.wasm` files
- Check the browser console for CORS errors
Out of memory errors:
- Reduce model size or use quantization
- Enable memory pattern optimization
- Dispose sessions when they are no longer needed
WebGPU not working:
- Check browser support (Chrome 94+, Edge 94+)
- Ensure a GPU is available
- Fall back to WebGL or WASM