In this tutorial, you'll build your first AI-powered chat application using the useLLM hook with the Llama 3.2 1B model.
Prerequisites
Before starting, make sure you have:
- Completed the installation steps
- Initialized ExecuTorch with a resource fetcher
- A React Native project with the New Architecture enabled
Step 1: Initialize ExecuTorch
First, initialize ExecuTorch in your app's entry point (e.g., App.tsx, _layout.tsx, or index.tsx):
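A minimal sketch of what this can look like. Whether `initExecutorch()` takes a resource-fetcher argument depends on how you completed the prerequisites, so treat the bare call below as an assumption:

```typescript
// App.tsx
import React from 'react';
import { initExecutorch } from 'react-native-executorch';
import { ChatScreen } from './ChatScreen';

// Called once at module scope, before any AI-powered component renders.
// If your setup uses a resource fetcher, pass it here per the installation guide.
initExecutorch();

export default function App() {
  return <ChatScreen />;
}
```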
Call initExecutorch() once at the top level of your app, before rendering any components that use AI models.
Step 2: Create a Chat Component
Create a new component that uses the useLLM hook to interact with an LLM:
ChatScreen.tsx
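Below is a minimal sketch built from the hook interface documented on this page. The `model` option name and the `LLAMA3_2_1B` constant are assumptions; check the exports of your installed version:

```typescript
// ChatScreen.tsx
import React, { useState } from 'react';
import { Button, FlatList, SafeAreaView, Text, TextInput } from 'react-native';
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

export function ChatScreen() {
  const llm = useLLM({ model: LLAMA3_2_1B });
  const [input, setInput] = useState('');

  if (!llm.isReady) {
    // downloadProgress goes from 0 to 1 during the first launch.
    return <Text>Downloading model… {Math.round(llm.downloadProgress * 100)}%</Text>;
  }

  const send = async () => {
    const message = input.trim();
    if (!message || llm.isGenerating) return;
    setInput('');
    await llm.sendMessage(message);
  };

  return (
    <SafeAreaView style={{ flex: 1 }}>
      <FlatList
        data={llm.messageHistory}
        keyExtractor={(_, i) => String(i)}
        renderItem={({ item }) => <Text>{item.role}: {item.content}</Text>}
      />
      {/* Streamed partial output while a response is being generated */}
      {llm.isGenerating && <Text>assistant: {llm.response}</Text>}
      <TextInput value={input} onChangeText={setInput} placeholder="Type a message" />
      <Button title="Send" onPress={send} disabled={llm.isGenerating} />
    </SafeAreaView>
  );
}
```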
Step 3: Understanding the Code
Let's break down the key parts of the implementation.
Model Initialization
The useLLM hook:
- Automatically downloads the model on first use
- Loads the model into memory
- Returns an interface to interact with the LLM
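The initialization itself is a single hook call, sketched here with an assumed constant and option name:

```typescript
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

// Downloads the model on first use, loads it into memory,
// and returns the interface described below.
const llm = useLLM({ model: LLAMA3_2_1B });
```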
Available Model Constants
React Native ExecuTorch provides pre-configured model constants, such as the Llama 3.2 1B variants used in this tutorial.
State Properties
The useLLM hook returns several useful properties:
| Property | Type | Description |
|---|---|---|
| isReady | boolean | true when the model is loaded and ready |
| isGenerating | boolean | true while the model is generating a response |
| downloadProgress | number | Download progress (0 to 1) |
| messageHistory | Message[] | Array of all conversation messages |
| response | string | Current response being generated |
| token | string | Most recently generated token |
| error | RnExecutorchError \| null | Error if the model failed to load |
Methods
| Method | Description |
|---|---|
| sendMessage(message) | Add a user message and get an AI response |
| generate(messages, tools?) | Generate a completion for a message array |
| interrupt() | Stop the current generation |
| deleteMessage(index) | Remove a message from history |
| configure(config) | Update the model configuration |
| getGeneratedTokenCount() | Get the count of generated tokens |
| getPromptTokenCount() | Get the count of prompt tokens |
| getTotalTokenCount() | Get the total token count |
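For instance, the token-count methods make lightweight usage logging easy. A sketch, given an `llm` object returned by `useLLM`:

```typescript
// Log token usage after a response has finished generating.
const logUsage = () => {
  console.log('prompt tokens:   ', llm.getPromptTokenCount());
  console.log('generated tokens:', llm.getGeneratedTokenCount());
  console.log('total tokens:    ', llm.getTotalTokenCount());
};
```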
Step 4: Advanced Usage
Using generate() for One-Off Completions
If you don’t need conversation history management, use generate():
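A sketch, assuming `generate()` accepts an array of role/content messages (the shape implied by the messageHistory table above) and that the streamed result accumulates in `llm.response`:

```typescript
// One-off completion: nothing is added to the hook's messageHistory.
const summarize = async (text: string) => {
  await llm.generate([
    { role: 'system', content: 'Summarize the user message in one sentence.' },
    { role: 'user', content: text },
  ]);
  // Per the state table, the generated text is available on llm.response.
  return llm.response;
};
```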
Configuring the Model
Customize the model's behavior with configure():
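A sketch; the exact configuration fields depend on your library version, so treat the option names below as assumptions:

```typescript
// Option names are illustrative — check your version's config type.
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a concise, helpful assistant.',
    contextWindowLength: 6, // keep only the most recent turns in context
  },
});
```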
Handling Errors
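The error property from the state table can drive a fallback UI. A sketch (the exact shape of RnExecutorchError is not shown here, so the value is stringified):

```typescript
// Render a fallback when the model failed to load or generate.
if (llm.error) {
  return <Text>Something went wrong: {String(llm.error)}</Text>;
}
```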
Interrupting Generation
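Long generations can be cancelled with interrupt(). A sketch of wiring it to a button:

```typescript
// Let the user stop a long-running generation.
<Button
  title="Stop"
  onPress={() => llm.interrupt()}
  disabled={!llm.isGenerating}
/>
```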
Step 5: Testing Your App
Wait for model download
On first launch, the model will be downloaded. This may take a few minutes depending on your connection.
Performance Tips
Key areas are model selection, memory management, and optimization.
Model Selection
Choose the right model for your use case:
- SpinQuant variants: Smallest size, fastest inference, good quality
- QLoRA variants: Balanced size and quality
- Original models: Highest quality, largest size
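Switching variants is just a matter of importing a different constant. A sketch (constant and option names assumed from this page):

```typescript
// Swap the constant to trade size and speed for quality.
import { useLLM, LLAMA3_2_1B_SPINQUANT } from 'react-native-executorch';

const llm = useLLM({ model: LLAMA3_2_1B_SPINQUANT });
```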
For most use cases, start with LLAMA3_2_1B_SPINQUANT for the best performance.
Type Definitions
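The shapes below are inferred from the tables in Step 3, not copied from the library; treat them as an approximation and check the exported types:

```typescript
// Approximate shapes, inferred from the state-property tables above.
type MessageRole = 'system' | 'user' | 'assistant';

interface Message {
  role: MessageRole;
  content: string;
}
```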
The hook and its return value are fully typed for TypeScript users.
Next Steps
Congratulations! You've built your first AI-powered chat application.
Explore Other Models
Try computer vision, speech-to-text, and other AI capabilities
Advanced Configuration
Learn about context strategies, tool calling, and structured outputs
View Demo Apps
Explore full-featured example applications
API Reference
Dive deep into all available hooks and modules