This guide walks you through creating a simple chat application using the useLLM hook with the Llama 3.2 1B model.

Prerequisites

Before starting, make sure you have:
  • Completed the installation steps
  • Initialized ExecuTorch with a resource fetcher
  • A React Native project with the New Architecture enabled

Step 1: Initialize ExecuTorch

First, initialize ExecuTorch in your app’s entry point (e.g., App.tsx, _layout.tsx, or index.tsx):
import { initExecutorch } from 'react-native-executorch';
import { ExpoResourceFetcher } from '@react-native-executorch/expo-resource-fetcher';

initExecutorch({
  resourceFetcher: ExpoResourceFetcher,
});

export default function Layout() {
  // Your app layout
}
Call initExecutorch() once at the top level of your app, before rendering any components that use AI models.

Step 2: Create a Chat Component

Create a new component that uses the useLLM hook to interact with an LLM:
ChatScreen.tsx
import { useState } from 'react';
import {
  View,
  TextInput,
  TouchableOpacity,
  Text,
  FlatList,
  StyleSheet,
  ActivityIndicator,
} from 'react-native';
import {
  useLLM,
  LLAMA3_2_1B,
  Message,
} from 'react-native-executorch';

export default function ChatScreen() {
  const [userInput, setUserInput] = useState('');
  
  // Initialize the LLM with Llama 3.2 1B model
  const llm = useLLM({ model: LLAMA3_2_1B });

  const sendMessage = async () => {
    if (!userInput.trim()) return;
    
    const message = userInput;
    setUserInput(''); // Clear input
    
    try {
      // Send message and get response
      await llm.sendMessage(message);
    } catch (error) {
      console.error('Error sending message:', error);
    }
  };

  // Show loading indicator while model is downloading/loading
  if (!llm.isReady) {
    return (
      <View style={styles.loadingContainer}>
        <ActivityIndicator size="large" />
        <Text style={styles.loadingText}>
          Loading model... {(llm.downloadProgress * 100).toFixed(0)}%
        </Text>
      </View>
    );
  }

  return (
    <View style={styles.container}>
      {/* Chat messages */}
      <FlatList
        data={llm.messageHistory}
        keyExtractor={(_, index) => index.toString()}
        renderItem={({ item }) => (
          <View
            style={[
              styles.messageBubble,
              item.role === 'user' ? styles.userMessage : styles.assistantMessage,
            ]}
          >
            <Text style={styles.roleText}>{item.role}</Text>
            <Text style={styles.messageText}>{item.content}</Text>
          </View>
        )}
        ListFooterComponent={
          llm.isGenerating ? (
            <View style={styles.messageBubble}>
              <Text style={styles.roleText}>assistant</Text>
              <Text style={styles.messageText}>{llm.response}</Text>
            </View>
          ) : null
        }
      />

      {/* Input area */}
      <View style={styles.inputContainer}>
        <TextInput
          style={styles.input}
          value={userInput}
          onChangeText={setUserInput}
          placeholder="Type your message..."
          multiline
          editable={!llm.isGenerating}
        />
        <TouchableOpacity
          style={[
            styles.sendButton,
            (!userInput.trim() || llm.isGenerating) && styles.sendButtonDisabled,
          ]}
          onPress={sendMessage}
          disabled={!userInput.trim() || llm.isGenerating}
        >
          <Text style={styles.sendButtonText}>
            {llm.isGenerating ? '...' : 'Send'}
          </Text>
        </TouchableOpacity>
      </View>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#fff',
  },
  loadingContainer: {
    flex: 1,
    justifyContent: 'center',
    alignItems: 'center',
  },
  loadingText: {
    marginTop: 16,
    fontSize: 16,
    color: '#666',
  },
  messageBubble: {
    padding: 12,
    marginVertical: 4,
    marginHorizontal: 8,
    borderRadius: 8,
    backgroundColor: '#f0f0f0',
  },
  userMessage: {
    backgroundColor: '#007AFF',
    alignSelf: 'flex-end',
  },
  assistantMessage: {
    backgroundColor: '#E9E9EB',
    alignSelf: 'flex-start',
  },
  roleText: {
    fontSize: 12,
    fontWeight: 'bold',
    marginBottom: 4,
    color: '#666',
  },
  messageText: {
    fontSize: 16,
    color: '#000',
  },
  inputContainer: {
    flexDirection: 'row',
    padding: 8,
    borderTopWidth: 1,
    borderTopColor: '#ddd',
  },
  input: {
    flex: 1,
    borderWidth: 1,
    borderColor: '#ddd',
    borderRadius: 8,
    padding: 12,
    marginRight: 8,
    maxHeight: 100,
  },
  sendButton: {
    backgroundColor: '#007AFF',
    borderRadius: 8,
    padding: 12,
    justifyContent: 'center',
    minWidth: 60,
  },
  sendButtonDisabled: {
    backgroundColor: '#ccc',
  },
  sendButtonText: {
    color: '#fff',
    fontWeight: 'bold',
    textAlign: 'center',
  },
});

Step 3: Understanding the Code

Let’s break down the key parts of the implementation:

Model Initialization

const llm = useLLM({ model: LLAMA3_2_1B });
The useLLM hook:
  • Automatically downloads the model on first use
  • Loads the model into memory
  • Returns an interface to interact with the LLM

Available Model Constants

React Native ExecuTorch provides pre-configured model constants:
import {
  LLAMA3_2_1B,           // 1B parameters
  LLAMA3_2_1B_QLORA,     // 1B QLoRA variant
  LLAMA3_2_1B_SPINQUANT, // 1B SpinQuant (smaller, faster)
  LLAMA3_2_3B,           // 3B parameters
  LLAMA3_2_3B_QLORA,     // 3B QLoRA variant
  LLAMA3_2_3B_SPINQUANT, // 3B SpinQuant
} from 'react-native-executorch';

State Properties

The useLLM hook returns several useful properties:
Property           Type                       Description
isReady            boolean                    true when the model is loaded and ready
isGenerating       boolean                    true while the model is generating a response
downloadProgress   number                     Download progress (0 to 1)
messageHistory     Message[]                  Array of all conversation messages
response           string                     The response currently being generated
token              string                     The most recently generated token
error              RnExecutorchError | null   Error if the model failed to load
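The downloadProgress value is a fraction between 0 and 1; the loading label rendered in ChatScreen converts it to a percentage. As a standalone sketch (an illustrative helper, not an export of react-native-executorch):

```typescript
// Convert the hook's 0-1 downloadProgress into the label shown while loading.
// Mirrors the expression used in ChatScreen's loading view.
function formatLoadingLabel(downloadProgress: number): string {
  return `Loading model... ${(downloadProgress * 100).toFixed(0)}%`;
}
```
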

Methods

Method                       Description
sendMessage(message)         Add a user message and generate the AI response
generate(messages, tools?)   Generate a completion for a message array
interrupt()                  Stop the current generation
deleteMessage(index)         Remove a message from the history
configure(config)            Update the model configuration
getGeneratedTokenCount()     Get the number of generated tokens
getPromptTokenCount()        Get the number of prompt tokens
getTotalTokenCount()         Get the total token count
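Note that ChatScreen only calls sendMessage when there is input and no generation in flight; the send button's disabled logic expresses the same rule. As a standalone predicate (illustrative, extracted from the component's guard):

```typescript
// Mirrors the send-button guard in ChatScreen: sending is allowed only when
// the trimmed input is non-empty and the model is not already generating.
function canSend(userInput: string, isGenerating: boolean): boolean {
  return userInput.trim().length > 0 && !isGenerating;
}
```
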

Step 4: Advanced Usage

Using generate() for One-Off Completions

If you don’t need conversation history management, use generate():
const chat: Message[] = [
  { role: 'system', content: 'You are a helpful assistant' },
  { role: 'user', content: 'What is the capital of France?' },
];

const response = await llm.generate(chat);
console.log('Response:', response);
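The array passed to generate() follows the Message shape shown in the Type Definitions section. If you build one-off prompts in several places, a tiny helper keeps them consistent (buildOneOffChat is hypothetical, not part of the library):

```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Hypothetical convenience helper: builds the Message[] that generate()
// expects from a system prompt and a single user question.
function buildOneOffChat(systemPrompt: string, question: string): Message[] {
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: question },
  ];
}
```
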

Configuring the Model

Customize the model’s behavior with configure():
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

function MyComponent() {
  const llm = useLLM({ model: LLAMA3_2_1B });

  useEffect(() => {
    if (llm.isReady) {
      llm.configure({
        chatConfig: {
          systemPrompt: 'You are a helpful coding assistant',
          initialMessageHistory: [],
        },
        generationConfig: {
          temperature: 0.7,  // Control randomness (0-1)
          topp: 0.9,         // Nucleus sampling
          outputTokenBatchSize: 3,
          batchTimeInterval: 50,
        },
      });
    }
  }, [llm.isReady]);

  // ...
}
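Both temperature and topp are fractions in the 0 to 1 range (per the comments above). If you expose them as user-adjustable settings, clamping keeps them in range before calling configure() (an illustrative helper; whether the library validates these values itself is not documented here):

```typescript
// Illustrative: clamp a sampling parameter (temperature or topp) into [0, 1]
// before passing it to configure({ generationConfig: ... }).
function clampSamplingParam(value: number): number {
  return Math.min(1, Math.max(0, value));
}
```
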

Handling Errors

const llm = useLLM({ model: LLAMA3_2_1B });

useEffect(() => {
  if (llm.error) {
    console.error('LLM Error:', llm.error.message);
    console.error('Error Code:', llm.error.code);
  }
}, [llm.error]);

Interrupting Generation

<TouchableOpacity
  onPress={llm.interrupt}
  disabled={!llm.isGenerating}
>
  <Text>Stop Generation</Text>
</TouchableOpacity>

Step 5: Testing Your App

1. Run the app

# Expo
npx expo run:ios
# or
npx expo run:android

# Bare React Native
npx react-native run-ios
# or
npx react-native run-android
2. Wait for model download

On first launch, the model will be downloaded. This may take a few minutes depending on your connection.
3. Send a message

Type a message in the input field and press Send. The model will generate a response!
High RAM Usage: Running LLMs requires significant memory. If your app crashes unexpectedly, try:
  • Using a smaller model (e.g., LLAMA3_2_1B_SPINQUANT instead of LLAMA3_2_3B)
  • Increasing emulator RAM allocation
  • Testing on a physical device with more memory

Performance Tips

Choose the right model for your use case:
  • SpinQuant variants: Smallest size, fastest inference, good quality
  • QLoRA variants: Balanced size and quality
  • Original models: Highest quality, largest size
Start with LLAMA3_2_1B_SPINQUANT for best performance.

Type Definitions

For TypeScript users, here are the key type definitions:
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

interface LLMType {
  messageHistory: Message[];
  response: string;
  token: string;
  isReady: boolean;
  isGenerating: boolean;
  downloadProgress: number;
  error: RnExecutorchError | null;
  configure: (config: LLMConfig) => void;
  generate: (messages: Message[], tools?: LLMTool[]) => Promise<string>;
  sendMessage: (message: string) => Promise<string>;
  deleteMessage: (index: number) => void;
  interrupt: () => void;
  getGeneratedTokenCount: () => number;
  getPromptTokenCount: () => number;
  getTotalTokenCount: () => number;
}
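Given these types, messageHistory is an ordinary array you can process directly. For example, a small helper (illustrative, not part of the library) can pull the latest assistant reply out of the history:

```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Illustrative: walk messageHistory backwards and return the newest
// assistant message's content, or undefined if there is none.
function lastAssistantContent(history: Message[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].role === 'assistant') return history[i].content;
  }
  return undefined;
}
```
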

Next Steps

Congratulations! You’ve built your first AI-powered chat application.

  • Explore Other Models: try computer vision, speech-to-text, and other AI capabilities
  • Advanced Configuration: learn about context strategies, tool calling, and structured outputs
  • View Demo Apps: explore full-featured example applications
  • API Reference: dive deep into all available hooks and modules
