Available Models

React Native ExecuTorch provides pre-configured access to a wide range of optimized language models. All models are automatically downloaded on first use.

Import Models

import {
  LLAMA3_2_1B,
  LLAMA3_2_3B,
  QWEN3_0_6B,
  // ... other models
} from 'react-native-executorch/constants';

Model Families

Llama 3.2

Meta’s Llama 3.2 models, available in 1B and 3B parameter sizes with multiple quantization options.
LLAMA3_2_1B
  • Parameters: 1 Billion
  • Precision: BF16 (Brain Floating Point 16-bit)
  • Size: ~2GB
  • Best for: Efficient on-device inference, mobile devices

import { LLAMA3_2_1B } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_1B });

LLAMA3_2_1B_QLORA
  • Parameters: 1 Billion
  • Quantization: QLoRA (Quantized Low-Rank Adaptation)
  • Size: ~500MB
  • Best for: Memory-constrained devices, faster loading

import { LLAMA3_2_1B_QLORA } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_1B_QLORA });

LLAMA3_2_1B_SPINQUANT
  • Parameters: 1 Billion
  • Quantization: SpinQuant
  • Size: ~600MB
  • Best for: Balanced performance and size

import { LLAMA3_2_1B_SPINQUANT } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_1B_SPINQUANT });

LLAMA3_2_3B
  • Parameters: 3 Billion
  • Precision: BF16
  • Size: ~6GB
  • Best for: Higher quality responses, more capable reasoning

import { LLAMA3_2_3B } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_3B });

LLAMA3_2_3B_QLORA
  • Parameters: 3 Billion
  • Quantization: QLoRA
  • Size: ~1.5GB
  • Best for: Mid-range devices, good quality/size balance

import { LLAMA3_2_3B_QLORA } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_3B_QLORA });

LLAMA3_2_3B_SPINQUANT
  • Parameters: 3 Billion
  • Quantization: SpinQuant
  • Size: ~1.8GB
  • Best for: Better quality than QLoRA with reasonable size

import { LLAMA3_2_3B_SPINQUANT } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_3B_SPINQUANT });

Qwen 3

Alibaba’s Qwen 3 models, available in 0.6B, 1.7B, and 4B sizes.
QWEN3_0_6B
  • Parameters: 0.6 Billion
  • Precision: BF16
  • Size: ~1.2GB
  • Best for: Very efficient inference, low memory usage

import { QWEN3_0_6B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_0_6B });

QWEN3_0_6B_QUANTIZED
  • Parameters: 0.6 Billion
  • Quantization: 8da4w
  • Size: ~400MB
  • Best for: Ultra-lightweight applications

import { QWEN3_0_6B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_0_6B_QUANTIZED });

QWEN3_1_7B
  • Parameters: 1.7 Billion
  • Precision: BF16
  • Size: ~3.4GB

import { QWEN3_1_7B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_1_7B });

QWEN3_1_7B_QUANTIZED
  • Parameters: 1.7 Billion
  • Quantization: 8da4w
  • Size: ~900MB

import { QWEN3_1_7B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_1_7B_QUANTIZED });

QWEN3_4B
  • Parameters: 4 Billion
  • Precision: BF16
  • Size: ~8GB
  • Best for: High-quality responses, advanced reasoning

import { QWEN3_4B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_4B });

QWEN3_4B_QUANTIZED
  • Parameters: 4 Billion
  • Quantization: 8da4w
  • Size: ~2GB

import { QWEN3_4B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_4B_QUANTIZED });

Qwen 2.5

Alibaba's Qwen 2.5 models, the previous generation of the Qwen family.
QWEN2_5_0_5B
  • Parameters: 0.5 Billion
  • Precision: BF16
  • Size: ~1GB

import { QWEN2_5_0_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_0_5B });

QWEN2_5_0_5B_QUANTIZED
  • Parameters: 0.5 Billion
  • Quantization: 8da4w
  • Size: ~300MB

import { QWEN2_5_0_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_0_5B_QUANTIZED });

QWEN2_5_1_5B
  • Parameters: 1.5 Billion
  • Precision: BF16
  • Size: ~3GB

import { QWEN2_5_1_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_1_5B });

QWEN2_5_1_5B_QUANTIZED
  • Parameters: 1.5 Billion
  • Quantization: 8da4w
  • Size: ~800MB

import { QWEN2_5_1_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_1_5B_QUANTIZED });

QWEN2_5_3B
  • Parameters: 3 Billion
  • Precision: BF16
  • Size: ~6GB

import { QWEN2_5_3B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_3B });

QWEN2_5_3B_QUANTIZED
  • Parameters: 3 Billion
  • Quantization: 8da4w
  • Size: ~1.5GB

import { QWEN2_5_3B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_3B_QUANTIZED });

Hammer 2.1

Efficient models optimized for mobile inference, with a focus on function calling.
HAMMER2_1_0_5B
  • Parameters: 0.5 Billion
  • Precision: BF16
  • Size: ~1GB

import { HAMMER2_1_0_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_0_5B });

HAMMER2_1_0_5B_QUANTIZED
  • Parameters: 0.5 Billion
  • Quantization: 8da4w
  • Size: ~300MB

import { HAMMER2_1_0_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_0_5B_QUANTIZED });

HAMMER2_1_1_5B
  • Parameters: 1.5 Billion
  • Precision: BF16
  • Size: ~3GB

import { HAMMER2_1_1_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_1_5B });

HAMMER2_1_1_5B_QUANTIZED
  • Parameters: 1.5 Billion
  • Quantization: 8da4w
  • Size: ~800MB

import { HAMMER2_1_1_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_1_5B_QUANTIZED });

HAMMER2_1_3B
  • Parameters: 3 Billion
  • Precision: BF16
  • Size: ~6GB

import { HAMMER2_1_3B } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_3B });

HAMMER2_1_3B_QUANTIZED
  • Parameters: 3 Billion
  • Quantization: 8da4w
  • Size: ~1.5GB

import { HAMMER2_1_3B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_3B_QUANTIZED });

SmolLM 2

Compact models from Hugging Face, designed for efficiency.
SMOLLM2_1_135M
  • Parameters: 135 Million
  • Precision: BF16
  • Size: ~270MB
  • Best for: Ultra-lightweight applications, quick responses

import { SMOLLM2_1_135M } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_135M });

SMOLLM2_1_135M_QUANTIZED
  • Parameters: 135 Million
  • Quantization: 8da4w
  • Size: ~80MB

import { SMOLLM2_1_135M_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_135M_QUANTIZED });

SMOLLM2_1_360M
  • Parameters: 360 Million
  • Precision: BF16
  • Size: ~720MB

import { SMOLLM2_1_360M } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_360M });

SMOLLM2_1_360M_QUANTIZED
  • Parameters: 360 Million
  • Quantization: 8da4w
  • Size: ~200MB

import { SMOLLM2_1_360M_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_360M_QUANTIZED });

SMOLLM2_1_1_7B
  • Parameters: 1.7 Billion
  • Precision: BF16
  • Size: ~3.4GB

import { SMOLLM2_1_1_7B } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_1_7B });

SMOLLM2_1_1_7B_QUANTIZED
  • Parameters: 1.7 Billion
  • Quantization: 8da4w
  • Size: ~900MB

import { SMOLLM2_1_1_7B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_1_7B_QUANTIZED });

Phi 4 Mini

Microsoft’s efficient small language model.
PHI_4_MINI_4B
  • Parameters: 4 Billion
  • Precision: BF16
  • Size: ~8GB
  • Best for: High-quality reasoning with compact size

import { PHI_4_MINI_4B } from 'react-native-executorch/constants';

const llm = useLLM({ model: PHI_4_MINI_4B });

PHI_4_MINI_4B_QUANTIZED
  • Parameters: 4 Billion
  • Quantization: 8da4w
  • Size: ~2GB

import { PHI_4_MINI_4B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: PHI_4_MINI_4B_QUANTIZED });

LFM 2.5

Liquid AI's latest generation of efficient models.
LFM2_5_1_2B_INSTRUCT
  • Parameters: 1.2 Billion
  • Precision: FP16
  • Size: ~2.4GB
  • Best for: Instruction following, chat applications

import { LFM2_5_1_2B_INSTRUCT } from 'react-native-executorch/constants';

const llm = useLLM({ model: LFM2_5_1_2B_INSTRUCT });

LFM2_5_1_2B_INSTRUCT_QUANTIZED
  • Parameters: 1.2 Billion
  • Quantization: 8da4w
  • Size: ~650MB

import { LFM2_5_1_2B_INSTRUCT_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: LFM2_5_1_2B_INSTRUCT_QUANTIZED });

Model Selection Guide

By Device Capability

Low-end devices (< 4GB RAM):
  • SMOLLM2_1_135M_QUANTIZED (80MB)
  • QWEN3_0_6B_QUANTIZED (400MB)
  • LLAMA3_2_1B_QLORA (500MB)
Mid-range devices (4-6GB RAM):
  • LLAMA3_2_1B (2GB)
  • QWEN2_5_1_5B_QUANTIZED (800MB)
  • HAMMER2_1_1_5B_QUANTIZED (800MB)
High-end devices (6GB+ RAM):
  • LLAMA3_2_3B (6GB)
  • QWEN3_4B (8GB)
  • PHI_4_MINI_4B (8GB)
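The tiers above can be encoded in a small helper. This is only a sketch: `recommendModel` is a hypothetical function, the thresholds and picks simply mirror the guide, and reading device RAM is left out because it varies by platform. In a real app you would map the returned name to the corresponding imported constant before passing it to useLLM.

```typescript
// Map approximate device RAM (in GB) to a recommended model constant name.
// Hypothetical helper; tiers and picks mirror the device-capability guide.
function recommendModel(ramGB: number): string {
  if (ramGB < 4) return 'LLAMA3_2_1B_QLORA'; // low-end: ~500MB
  if (ramGB < 6) return 'LLAMA3_2_1B';       // mid-range: ~2GB
  return 'LLAMA3_2_3B';                      // high-end: ~6GB
}
```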

By Use Case

Quick Q&A / Simple chat:
  • SMOLLM2_1_360M
  • QWEN3_0_6B
  • LLAMA3_2_1B
Advanced reasoning / Complex tasks:
  • LLAMA3_2_3B
  • QWEN3_4B
  • PHI_4_MINI_4B
Tool calling / Function calling:
  • LLAMA3_2_3B (best tool support)
  • QWEN2_5_3B
  • LFM2_5_1_2B_INSTRUCT
Fastest inference:
  • SMOLLM2_1_135M_QUANTIZED
  • QWEN3_0_6B_QUANTIZED
  • LLAMA3_2_1B_SPINQUANT

Understanding Quantization

  • BF16/FP16: Full 16-bit precision, best quality, largest size
  • QLoRA: 4-bit weight quantization combined with low-rank adapters fine-tuned to recover quality
  • SpinQuant: Post-training quantization that applies learned rotations to reduce activation outliers
  • 8da4w: 8-bit dynamic activations with 4-bit weights, smallest size
Quantized models are typically 3-4x smaller with minimal quality loss.
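The size ratios follow from simple arithmetic: a model file is roughly parameter count × bytes per weight. This back-of-the-envelope sketch ignores embeddings, tokenizer data, and quantization scales, but it reproduces the catalog sizes to within rounding:

```typescript
// Rough model file size: parameter count × bits per weight / 8.
// Real files differ somewhat (metadata, group-wise scales, embeddings).
function approxSizeGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
}

approxSizeGB(1, 16); // BF16, 1B params → ~2 GB (cf. LLAMA3_2_1B)
approxSizeGB(1, 4);  // 4-bit weights → ~0.5 GB (cf. LLAMA3_2_1B_QLORA)
```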

Custom Models

You can also use custom models by providing URLs or local paths:
const customModel = {
  modelSource: { uri: 'https://your-cdn.com/model.pte' },
  tokenizerSource: { uri: 'https://your-cdn.com/tokenizer.json' },
  tokenizerConfigSource: { uri: 'https://your-cdn.com/tokenizer_config.json' },
};

const llm = useLLM({ model: customModel });
Or from local files:
const localModel = {
  modelSource: require('./assets/model.pte'),
  tokenizerSource: require('./assets/tokenizer.json'),
  tokenizerConfigSource: require('./assets/tokenizer_config.json'),
};

const llm = useLLM({ model: localModel });
