Available Capabilities
Classification
Categorize images into predefined classes with high accuracy
Object Detection
Detect and localize multiple objects within images or video frames
Semantic Segmentation
Classify every pixel in an image for precise scene understanding
OCR
Extract text from images with multilingual support
Style Transfer
Apply artistic styles to images in real time
Image Embeddings
Generate feature vectors for similarity search and clustering
Text-to-Image
Generate images from text descriptions using diffusion models
Key Features
On-Device Processing
All models run locally on the device, ensuring:
- Privacy: No data leaves the device
- Low latency: No network round-trips
- Offline support: Works without internet connectivity
- Cost efficiency: No server-side inference costs
Optimized Performance
- Hardware-accelerated inference using CoreML (iOS) and XNNPACK
- Quantized models for reduced memory footprint
- Real-time frame processing for camera applications
- Automatic resource management and cleanup
Developer-Friendly API
- React hooks for seamless integration
- TypeScript support with full type safety
- Automatic model downloading and caching
- Progress tracking for model downloads
- Comprehensive error handling
Common Patterns
Basic Usage Pattern
All computer vision hooks follow a consistent pattern:
Real-Time Camera Processing
For models that support VisionCamera integration:
State Management
All hooks provide consistent state tracking:
- isReady: Model is loaded and ready for inference
- isGenerating: Currently processing an input
- error: Error object if loading or inference fails
- downloadProgress: Download progress (0-1) for first-time model loading
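The four state fields can be collapsed into a single status value for rendering. The sketch below is illustrative only: `VisionHookState`, `UiStatus`, and `toUiStatus` are hypothetical names, and the field types are assumptions based on the descriptions above rather than the library's actual typings.

```typescript
// Hypothetical shape mirroring the four state fields described above.
// The real hooks may expose additional fields or different types.
interface VisionHookState {
  isReady: boolean;
  isGenerating: boolean;
  error: Error | null;
  downloadProgress: number; // assumed to range from 0 to 1
}

type UiStatus =
  | { kind: "error"; message: string }
  | { kind: "loading"; percent: number }
  | { kind: "busy" }
  | { kind: "ready" };

// Map the state fields to one status. Errors take priority, then loading,
// then in-progress inference.
function toUiStatus(state: VisionHookState): UiStatus {
  if (state.error) {
    return { kind: "error", message: state.error.message };
  }
  if (!state.isReady) {
    return { kind: "loading", percent: Math.round(state.downloadProgress * 100) };
  }
  if (state.isGenerating) {
    return { kind: "busy" };
  }
  return { kind: "ready" };
}
```

Checking `error` first means a failed download is reported as an error instead of an indefinitely stalled progress bar.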
Image Input Formats
Most computer vision models accept images in various formats:
- File paths: file:///path/to/image.jpg
- HTTP URLs: https://example.com/image.png
- Base64 strings: data:image/jpeg;base64,...
- Asset URIs: asset:/image.jpg (Android)
- React Native Image sources: Resolved URIs from Image.resolveAssetSource()
- PixelData objects: Raw RGB pixel buffers for advanced use cases
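When an image can arrive from several sources, it may help to check which string form you have before passing it on. The helper below is a hypothetical sketch, not part of the library: `detectImageInputKind` and `ImageInputKind` are illustrative names, and the prefix checks are inferred from the example values listed above. PixelData objects are not strings, so they are out of scope here.

```typescript
type ImageInputKind = "file" | "http" | "base64" | "asset" | "unknown";

// Classify a string image input by its prefix. The patterns follow the
// example values in the list above and are assumptions, not library rules.
function detectImageInputKind(input: string): ImageInputKind {
  if (/^file:\/\//.test(input)) return "file";
  if (/^https?:\/\//i.test(input)) return "http";
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(input)) return "base64";
  if (/^asset:\//.test(input)) return "asset";
  return "unknown";
}
```

A URI returned by `Image.resolveAssetSource()` is a plain string, so it would normally fall into one of the kinds above depending on platform and build type.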
Performance Optimization
Model Selection
Choose models based on your requirements:
- MobileNet variants: Faster inference, lower accuracy
- ResNet variants: Higher accuracy, more computational cost
- Quantized models: Reduced memory, minimal accuracy loss
Memory Management
Batch Processing
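One way to keep a batch of images from running concurrently is sketched below. `processSequentially` is a hypothetical helper, and `runInference` stands in for whatever inference call a given hook exposes. Neither name comes from the library.

```typescript
// Run inference over a list of image URIs one at a time.
// `runInference` is a hypothetical stand-in for a hook's inference call.
async function processSequentially<T>(
  imageUris: readonly string[],
  runInference: (uri: string) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const uri of imageUris) {
    // Await each call before starting the next, so only one image
    // is being processed at any moment.
    results.push(await runInference(uri));
  }
  return results;
}
```

Compared with `Promise.all`, this trades throughput for a bounded number of images in flight, which matters on memory-constrained devices.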
For multiple images, process sequentially to avoid memory pressure.
Error Handling
All hooks use the RnExecutorchError type.
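The fields of RnExecutorchError are not shown in this section, so the sketch below does not import it. It assumes, purely for illustration, that a library error carries a numeric `code` alongside `message`. `CodedError`, `isCodedError`, and `describeError` are hypothetical names; check the library's own typings before relying on any of this.

```typescript
// Assumed error shape for illustration only. The real RnExecutorchError
// may have different or additional fields.
interface CodedError {
  code: number;
  message: string;
}

// Type guard that narrows an unknown caught value to the assumed shape.
function isCodedError(err: unknown): err is CodedError {
  if (typeof err !== "object" || err === null) return false;
  const candidate = err as { code?: unknown; message?: unknown };
  return typeof candidate.code === "number" && typeof candidate.message === "string";
}

// Turn any caught value into a message suitable for logging or display.
function describeError(err: unknown): string {
  if (isCodedError(err)) return `Model error ${err.code}: ${err.message}`;
  if (err instanceof Error) return `Unexpected error: ${err.message}`;
  return `Unknown failure: ${String(err)}`;
}
```

The same function can be applied to the `error` field a hook exposes or to a value caught around an inference call.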
Next Steps
Explore specific computer vision capabilities:
- Classification - Categorize images
- Object Detection - Detect objects in images
- Semantic Segmentation - Pixel-level classification
- OCR - Text recognition
- Style Transfer - Artistic image transformation
- Image Embeddings - Feature extraction
- Text-to-Image - Generate images from text