Available Models
React Native ExecuTorch provides pre-configured access to a wide range of optimized language models. All models are automatically downloaded on first use.
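A typical way to load one of these models is through the library's LLM hook. The sketch below is illustrative only: it assumes `useLLM` and the `LLAMA3_2_1B` constant are exported by `react-native-executorch` and that the hook accepts a `model` prop and exposes `isReady`, `generate`, and `response`; exact names and shapes vary between library versions, so check the API reference for the version you install.

```typescript
// Sketch only: hook name and prop shapes are assumptions and may
// differ by react-native-executorch version.
import React from 'react';
import { Text } from 'react-native';
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

export function Chat() {
  // The model binary is downloaded automatically on first use.
  const llm = useLLM({ model: LLAMA3_2_1B });

  React.useEffect(() => {
    if (llm.isReady) {
      llm.generate([{ role: 'user', content: 'Hello!' }]);
    }
  }, [llm.isReady]);

  return <Text>{llm.response}</Text>;
}
```

Swapping in a different model is just a matter of importing a different constant (for example `LLAMA3_2_1B_QLORA` on memory-constrained devices).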
Model Families
Llama 3.2
Meta’s Llama 3.2 models, available in 1B and 3B parameter sizes with multiple quantization options.

LLAMA3_2_1B
Parameters: 1 Billion
Precision: BF16 (Brain Floating Point 16-bit)
Size: ~2GB
Best for: Efficient on-device inference, mobile devices

LLAMA3_2_1B_QLORA
Parameters: 1 Billion
Quantization: QLoRA (Quantized Low-Rank Adaptation)
Size: ~500MB
Best for: Memory-constrained devices, faster loading

LLAMA3_2_1B_SPINQUANT
Parameters: 1 Billion
Quantization: SpinQuant
Size: ~600MB
Best for: Balanced performance and size

LLAMA3_2_3B
Parameters: 3 Billion
Precision: BF16
Size: ~6GB
Best for: Higher quality responses, more capable reasoning

LLAMA3_2_3B_QLORA
Parameters: 3 Billion
Quantization: QLoRA
Size: ~1.5GB
Best for: Mid-range devices, good quality/size balance

LLAMA3_2_3B_SPINQUANT
Parameters: 3 Billion
Quantization: SpinQuant
Size: ~1.8GB
Best for: Better quality than QLoRA with reasonable size
Qwen 3
Alibaba’s Qwen 3 models, available in 0.6B, 1.7B, and 4B sizes.

QWEN3_0_6B
Parameters: 0.6 Billion
Precision: BF16
Size: ~1.2GB
Best for: Very efficient inference, low memory usage

QWEN3_0_6B_QUANTIZED
Parameters: 0.6 Billion
Quantization: 8da4w
Size: ~400MB
Best for: Ultra-lightweight applications

QWEN3_1_7B
Parameters: 1.7 Billion
Precision: BF16
Size: ~3.4GB

QWEN3_1_7B_QUANTIZED
Parameters: 1.7 Billion
Quantization: 8da4w
Size: ~900MB

QWEN3_4B
Parameters: 4 Billion
Precision: BF16
Size: ~8GB
Best for: High-quality responses, advanced reasoning

QWEN3_4B_QUANTIZED
Parameters: 4 Billion
Quantization: 8da4w
Size: ~2GB
Qwen 2.5
Alibaba’s Qwen 2.5 models, available in 0.5B, 1.5B, and 3B sizes.

QWEN2_5_0_5B
Parameters: 0.5 Billion
Precision: BF16
Size: ~1GB

QWEN2_5_0_5B_QUANTIZED
Parameters: 0.5 Billion
Quantization: 8da4w
Size: ~300MB

QWEN2_5_1_5B
Parameters: 1.5 Billion
Precision: BF16
Size: ~3GB

QWEN2_5_1_5B_QUANTIZED
Parameters: 1.5 Billion
Quantization: 8da4w
Size: ~800MB

QWEN2_5_3B
Parameters: 3 Billion
Precision: BF16
Size: ~6GB

QWEN2_5_3B_QUANTIZED
Parameters: 3 Billion
Quantization: 8da4w
Size: ~1.5GB
Hammer 2.1
Efficient models optimized for mobile inference.

HAMMER2_1_0_5B
Parameters: 0.5 Billion
Precision: BF16
Size: ~1GB

HAMMER2_1_0_5B_QUANTIZED
Parameters: 0.5 Billion
Quantization: 8da4w
Size: ~300MB

HAMMER2_1_1_5B
Parameters: 1.5 Billion
Precision: BF16
Size: ~3GB

HAMMER2_1_1_5B_QUANTIZED
Parameters: 1.5 Billion
Quantization: 8da4w
Size: ~800MB

HAMMER2_1_3B
Parameters: 3 Billion
Precision: BF16
Size: ~6GB

HAMMER2_1_3B_QUANTIZED
Parameters: 3 Billion
Quantization: 8da4w
Size: ~1.5GB
SmolLM 2
Compact models from Hugging Face, designed for efficiency.

SMOLLM2_1_135M
Parameters: 135 Million
Precision: BF16
Size: ~270MB
Best for: Ultra-lightweight applications, quick responses

SMOLLM2_1_135M_QUANTIZED
Parameters: 135 Million
Quantization: 8da4w
Size: ~80MB

SMOLLM2_1_360M
Parameters: 360 Million
Precision: BF16
Size: ~720MB

SMOLLM2_1_360M_QUANTIZED
Parameters: 360 Million
Quantization: 8da4w
Size: ~200MB

SMOLLM2_1_1_7B
Parameters: 1.7 Billion
Precision: BF16
Size: ~3.4GB

SMOLLM2_1_1_7B_QUANTIZED
Parameters: 1.7 Billion
Quantization: 8da4w
Size: ~900MB
Phi 4 Mini
Microsoft’s efficient small language model.

PHI_4_MINI_4B
Parameters: 4 Billion
Precision: BF16
Size: ~8GB
Best for: High-quality reasoning with compact size

PHI_4_MINI_4B_QUANTIZED
Parameters: 4 Billion
Quantization: 8da4w
Size: ~2GB
LFM 2.5
Liquid AI’s latest generation of efficient models.

LFM2_5_1_2B_INSTRUCT
Parameters: 1.2 Billion
Precision: FP16
Size: ~2.4GB
Best for: Instruction following, chat applications

LFM2_5_1_2B_INSTRUCT_QUANTIZED
Parameters: 1.2 Billion
Quantization: 8da4w
Size: ~650MB
Model Selection Guide
By Device Capability
Low-end devices (< 4GB RAM):
- SMOLLM2_1_135M_QUANTIZED (80MB)
- QWEN3_0_6B_QUANTIZED (400MB)
- LLAMA3_2_1B_QLORA (500MB)

Mid-range devices:
- LLAMA3_2_1B (2GB)
- QWEN2_5_1_5B_QUANTIZED (800MB)
- HAMMER2_1_1_5B_QUANTIZED (800MB)

High-end devices:
- LLAMA3_2_3B (6GB)
- QWEN3_4B (8GB)
- PHI_4_MINI_4B (8GB)
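The tiers above can be encoded as a small lookup helper. `suggestModelForRam` is a hypothetical name (not part of the library), and the RAM thresholds simply mirror the tiers listed here:

```typescript
// Hypothetical helper mirroring the device-capability tiers above.
// Returns the library's model constant name as a string.
export function suggestModelForRam(ramGB: number): string {
  if (ramGB < 4) {
    // Low-end devices: smallest quantized models.
    return 'SMOLLM2_1_135M_QUANTIZED';
  }
  if (ramGB < 6) {
    // Mid-range devices: ~1B-1.5B quantized variants.
    return 'QWEN2_5_1_5B_QUANTIZED';
  }
  // High-end devices: full-precision 3B-4B models.
  return 'LLAMA3_2_3B';
}
```

In a real app you would likely also weigh download size and whether the device is on Wi-Fi before picking a multi-gigabyte BF16 model.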
By Use Case
Quick Q&A / Simple chat:
- SMOLLM2_1_360M
- QWEN3_0_6B
- LLAMA3_2_1B

Complex reasoning / higher-quality responses:
- LLAMA3_2_3B
- QWEN3_4B
- PHI_4_MINI_4B

Tool calling:
- LLAMA3_2_3B (best tool support)
- QWEN2_5_3B
- LFM2_5_1_2B_INSTRUCT

Smallest footprint / fastest loading:
- SMOLLM2_1_135M_QUANTIZED
- QWEN3_0_6B_QUANTIZED
- LLAMA3_2_1B_SPINQUANT
Understanding Quantization
- BF16/FP16: Full precision, best quality, largest size
- QLoRA: Efficient quantization with good quality retention
- SpinQuant: Balanced quantization method
- 8da4w: 8-bit dynamic activations with 4-bit weights; most aggressive scheme, smallest size
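The download sizes in this catalog roughly track bytes per parameter: BF16/FP16 store 2 bytes per weight, while 4-bit weight schemes such as 8da4w store about 0.5 bytes per weight plus overhead. A back-of-the-envelope estimator (illustrative only; real exported files also carry embeddings, tokenizer data, and format overhead):

```typescript
// Rough size estimate: parameters × bits-per-weight / 8, in gigabytes.
// Illustrative lower bound; real .pte files include extra overhead.
export function estimateSizeGB(
  paramsBillions: number,
  bitsPerWeight: number,
): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

// BF16 at 16 bits: a 1B model ≈ 2GB, matching LLAMA3_2_1B above.
// 4-bit weights: a 3B model ≈ 1.5GB, matching the 3B _QUANTIZED variants.
```

This is why the 8da4w variants land at roughly a quarter of their BF16 counterparts, with the remainder explained by layers kept at higher precision and file overhead.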