Available Models

React Native ExecuTorch provides pre-configured access to a wide range of optimized language models. All models are automatically downloaded on first use.

Import Models

import {
  LLAMA3_2_1B,
  LLAMA3_2_3B,
  QWEN3_0_6B,
  // ... other models
} from 'react-native-executorch/constants';

Model Families

Llama 3.2

Meta’s Llama 3.2 models, available in 1B and 3B parameter sizes with multiple quantization options.
LLAMA3_2_1B
  • Parameters: 1 Billion
  • Precision: BF16 (Brain Floating Point 16-bit)
  • Size: ~2GB
  • Best for: Efficient on-device inference, mobile devices

import { LLAMA3_2_1B } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_1B });

LLAMA3_2_1B_QLORA
  • Parameters: 1 Billion
  • Quantization: QLoRA (Quantized Low-Rank Adaptation)
  • Size: ~500MB
  • Best for: Memory-constrained devices, faster loading

import { LLAMA3_2_1B_QLORA } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_1B_QLORA });

LLAMA3_2_1B_SPINQUANT
  • Parameters: 1 Billion
  • Quantization: SpinQuant
  • Size: ~600MB
  • Best for: Balanced performance and size

import { LLAMA3_2_1B_SPINQUANT } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_1B_SPINQUANT });

LLAMA3_2_3B
  • Parameters: 3 Billion
  • Precision: BF16
  • Size: ~6GB
  • Best for: Higher quality responses, more capable reasoning

import { LLAMA3_2_3B } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_3B });

LLAMA3_2_3B_QLORA
  • Parameters: 3 Billion
  • Quantization: QLoRA
  • Size: ~1.5GB
  • Best for: Mid-range devices, good quality/size balance

import { LLAMA3_2_3B_QLORA } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_3B_QLORA });

LLAMA3_2_3B_SPINQUANT
  • Parameters: 3 Billion
  • Quantization: SpinQuant
  • Size: ~1.8GB
  • Best for: Better quality than QLoRA with reasonable size

import { LLAMA3_2_3B_SPINQUANT } from 'react-native-executorch/constants';

const llm = useLLM({ model: LLAMA3_2_3B_SPINQUANT });

Qwen 3

Alibaba’s Qwen 3 models, available in 0.6B, 1.7B, and 4B sizes.
QWEN3_0_6B
  • Parameters: 0.6 Billion
  • Precision: BF16
  • Size: ~1.2GB
  • Best for: Very efficient inference, low memory usage

import { QWEN3_0_6B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_0_6B });

QWEN3_0_6B_QUANTIZED
  • Parameters: 0.6 Billion
  • Quantization: 8da4w
  • Size: ~400MB
  • Best for: Ultra-lightweight applications

import { QWEN3_0_6B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_0_6B_QUANTIZED });

QWEN3_1_7B
  • Parameters: 1.7 Billion
  • Precision: BF16
  • Size: ~3.4GB

import { QWEN3_1_7B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_1_7B });

QWEN3_1_7B_QUANTIZED
  • Parameters: 1.7 Billion
  • Quantization: 8da4w
  • Size: ~900MB

import { QWEN3_1_7B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_1_7B_QUANTIZED });

QWEN3_4B
  • Parameters: 4 Billion
  • Precision: BF16
  • Size: ~8GB
  • Best for: High-quality responses, advanced reasoning

import { QWEN3_4B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_4B });

QWEN3_4B_QUANTIZED
  • Parameters: 4 Billion
  • Quantization: 8da4w
  • Size: ~2GB

import { QWEN3_4B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN3_4B_QUANTIZED });

Qwen 2.5

Alibaba's Qwen 2.5 models, the previous generation of the Qwen family.
QWEN2_5_0_5B
  • Parameters: 0.5 Billion
  • Precision: BF16
  • Size: ~1GB

import { QWEN2_5_0_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_0_5B });

QWEN2_5_0_5B_QUANTIZED
  • Parameters: 0.5 Billion
  • Quantization: 8da4w
  • Size: ~300MB

import { QWEN2_5_0_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_0_5B_QUANTIZED });

QWEN2_5_1_5B
  • Parameters: 1.5 Billion
  • Precision: BF16
  • Size: ~3GB

import { QWEN2_5_1_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_1_5B });

QWEN2_5_1_5B_QUANTIZED
  • Parameters: 1.5 Billion
  • Quantization: 8da4w
  • Size: ~800MB

import { QWEN2_5_1_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_1_5B_QUANTIZED });

QWEN2_5_3B
  • Parameters: 3 Billion
  • Precision: BF16
  • Size: ~6GB

import { QWEN2_5_3B } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_3B });

QWEN2_5_3B_QUANTIZED
  • Parameters: 3 Billion
  • Quantization: 8da4w
  • Size: ~1.5GB

import { QWEN2_5_3B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: QWEN2_5_3B_QUANTIZED });

Hammer 2.1

Efficient models optimized for mobile inference, with a focus on function calling.
HAMMER2_1_0_5B
  • Parameters: 0.5 Billion
  • Precision: BF16
  • Size: ~1GB

import { HAMMER2_1_0_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_0_5B });

HAMMER2_1_0_5B_QUANTIZED
  • Parameters: 0.5 Billion
  • Quantization: 8da4w
  • Size: ~300MB

import { HAMMER2_1_0_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_0_5B_QUANTIZED });

HAMMER2_1_1_5B
  • Parameters: 1.5 Billion
  • Precision: BF16
  • Size: ~3GB

import { HAMMER2_1_1_5B } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_1_5B });

HAMMER2_1_1_5B_QUANTIZED
  • Parameters: 1.5 Billion
  • Quantization: 8da4w
  • Size: ~800MB

import { HAMMER2_1_1_5B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_1_5B_QUANTIZED });

HAMMER2_1_3B
  • Parameters: 3 Billion
  • Precision: BF16
  • Size: ~6GB

import { HAMMER2_1_3B } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_3B });

HAMMER2_1_3B_QUANTIZED
  • Parameters: 3 Billion
  • Quantization: 8da4w
  • Size: ~1.5GB

import { HAMMER2_1_3B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: HAMMER2_1_3B_QUANTIZED });

SmolLM 2

Compact models from Hugging Face, designed for efficiency.
SMOLLM2_1_135M
  • Parameters: 135 Million
  • Precision: BF16
  • Size: ~270MB
  • Best for: Ultra-lightweight applications, quick responses

import { SMOLLM2_1_135M } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_135M });

SMOLLM2_1_135M_QUANTIZED
  • Parameters: 135 Million
  • Quantization: 8da4w
  • Size: ~80MB

import { SMOLLM2_1_135M_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_135M_QUANTIZED });

SMOLLM2_1_360M
  • Parameters: 360 Million
  • Precision: BF16
  • Size: ~720MB

import { SMOLLM2_1_360M } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_360M });

SMOLLM2_1_360M_QUANTIZED
  • Parameters: 360 Million
  • Quantization: 8da4w
  • Size: ~200MB

import { SMOLLM2_1_360M_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_360M_QUANTIZED });

SMOLLM2_1_1_7B
  • Parameters: 1.7 Billion
  • Precision: BF16
  • Size: ~3.4GB

import { SMOLLM2_1_1_7B } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_1_7B });

SMOLLM2_1_1_7B_QUANTIZED
  • Parameters: 1.7 Billion
  • Quantization: 8da4w
  • Size: ~900MB

import { SMOLLM2_1_1_7B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: SMOLLM2_1_1_7B_QUANTIZED });

Phi 4 Mini

Microsoft’s efficient small language model.
PHI_4_MINI_4B
  • Parameters: 4 Billion
  • Precision: BF16
  • Size: ~8GB
  • Best for: High-quality reasoning with compact size

import { PHI_4_MINI_4B } from 'react-native-executorch/constants';

const llm = useLLM({ model: PHI_4_MINI_4B });

PHI_4_MINI_4B_QUANTIZED
  • Parameters: 4 Billion
  • Quantization: 8da4w
  • Size: ~2GB

import { PHI_4_MINI_4B_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: PHI_4_MINI_4B_QUANTIZED });

LFM 2.5

Liquid AI's latest generation of efficient models.
LFM2_5_1_2B_INSTRUCT
  • Parameters: 1.2 Billion
  • Precision: FP16
  • Size: ~2.4GB
  • Best for: Instruction following, chat applications

import { LFM2_5_1_2B_INSTRUCT } from 'react-native-executorch/constants';

const llm = useLLM({ model: LFM2_5_1_2B_INSTRUCT });

LFM2_5_1_2B_INSTRUCT_QUANTIZED
  • Parameters: 1.2 Billion
  • Quantization: 8da4w
  • Size: ~650MB

import { LFM2_5_1_2B_INSTRUCT_QUANTIZED } from 'react-native-executorch/constants';

const llm = useLLM({ model: LFM2_5_1_2B_INSTRUCT_QUANTIZED });

Model Selection Guide

By Device Capability

Low-end devices (< 4GB RAM):
  • SMOLLM2_1_135M_QUANTIZED (80MB)
  • QWEN3_0_6B_QUANTIZED (400MB)
  • LLAMA3_2_1B_QLORA (500MB)
Mid-range devices (4-6GB RAM):
  • LLAMA3_2_1B (2GB)
  • QWEN2_5_1_5B_QUANTIZED (800MB)
  • HAMMER2_1_1_5B_QUANTIZED (800MB)
High-end devices (6GB+ RAM):
  • LLAMA3_2_3B (6GB)
  • QWEN3_4B (8GB)
  • PHI_4_MINI_4B (8GB)
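The tiers above can be encoded in a small helper. This is only a sketch: `recommendModel` is a hypothetical function, the thresholds and picks simply mirror the guide, and reading device RAM is left out because it varies by platform. In a real app you would map the returned name to the corresponding imported constant before passing it to useLLM.

```typescript
// Map approximate device RAM (in GB) to a recommended model constant name.
// Hypothetical helper; tiers and picks mirror the device-capability guide.
function recommendModel(ramGB: number): string {
  if (ramGB < 4) return 'LLAMA3_2_1B_QLORA'; // low-end: ~500MB
  if (ramGB < 6) return 'LLAMA3_2_1B';       // mid-range: ~2GB
  return 'LLAMA3_2_3B';                      // high-end: ~6GB
}
```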

By Use Case

Quick Q&A / Simple chat:
  • SMOLLM2_1_360M
  • QWEN3_0_6B
  • LLAMA3_2_1B
Advanced reasoning / Complex tasks:
  • LLAMA3_2_3B
  • QWEN3_4B
  • PHI_4_MINI_4B
Tool calling / Function calling:
  • LLAMA3_2_3B (best tool support)
  • QWEN2_5_3B
  • LFM2_5_1_2B_INSTRUCT
Fastest inference:
  • SMOLLM2_1_135M_QUANTIZED
  • QWEN3_0_6B_QUANTIZED
  • LLAMA3_2_1B_SPINQUANT

Understanding Quantization

  • BF16/FP16: Full 16-bit precision, best quality, largest size
  • QLoRA: 4-bit weight quantization combined with low-rank adapters fine-tuned to recover quality
  • SpinQuant: Post-training quantization that applies learned rotations to reduce activation outliers
  • 8da4w: 8-bit dynamic activations with 4-bit weights, smallest size
Quantized models are typically 3-4x smaller with minimal quality loss.
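The size ratios follow from simple arithmetic: a model file is roughly parameter count × bytes per weight. This back-of-the-envelope sketch ignores embeddings, tokenizer data, and quantization scales, but it reproduces the catalog sizes to within rounding:

```typescript
// Rough model file size: parameter count × bits per weight / 8.
// Real files differ somewhat (metadata, group-wise scales, embeddings).
function approxSizeGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
}

approxSizeGB(1, 16); // BF16, 1B params → ~2 GB (cf. LLAMA3_2_1B)
approxSizeGB(1, 4);  // 4-bit weights → ~0.5 GB (cf. LLAMA3_2_1B_QLORA)
```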

Custom Models

You can also use custom models by providing URLs or local paths:
const customModel = {
  modelSource: { uri: 'https://your-cdn.com/model.pte' },
  tokenizerSource: { uri: 'https://your-cdn.com/tokenizer.json' },
  tokenizerConfigSource: { uri: 'https://your-cdn.com/tokenizer_config.json' },
};

const llm = useLLM({ model: customModel });
Or from local files:
const localModel = {
  modelSource: require('./assets/model.pte'),
  tokenizerSource: require('./assets/tokenizer.json'),
  tokenizerConfigSource: require('./assets/tokenizer_config.json'),
};

const llm = useLLM({ model: localModel });
