In this tutorial, you'll build your first AI-powered chat application using the useLLM hook with the Llama 3.2 1B model.
Prerequisites
Before starting, make sure you have:
- Completed the installation steps
- Initialized ExecuTorch with a resource fetcher
- A React Native project with the New Architecture enabled
Step 1: Initialize ExecuTorch
First, initialize ExecuTorch in your app's entry point (e.g., App.tsx, _layout.tsx, or index.tsx):
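A minimal sketch of what this can look like. Whether `initExecutorch()` takes a resource-fetcher argument depends on how you completed the prerequisites, so treat the bare call below as an assumption:

```typescript
// App.tsx
import React from 'react';
import { initExecutorch } from 'react-native-executorch';
import { ChatScreen } from './ChatScreen';

// Called once at module scope, before any AI-powered component renders.
// If your setup uses a resource fetcher, pass it here per the installation guide.
initExecutorch();

export default function App() {
  return <ChatScreen />;
}
```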
Call initExecutorch() once at the top level of your app, before rendering any components that use AI models.
Step 2: Create a Chat Component
Create a new component that uses the useLLM hook to interact with an LLM:
ChatScreen.tsx
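Below is a minimal sketch built from the hook interface documented on this page. The `model` option name and the `LLAMA3_2_1B` constant are assumptions; check the exports of your installed version:

```typescript
// ChatScreen.tsx
import React, { useState } from 'react';
import { Button, FlatList, SafeAreaView, Text, TextInput } from 'react-native';
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

export function ChatScreen() {
  const llm = useLLM({ model: LLAMA3_2_1B });
  const [input, setInput] = useState('');

  if (!llm.isReady) {
    // downloadProgress goes from 0 to 1 during the first launch.
    return <Text>Downloading model… {Math.round(llm.downloadProgress * 100)}%</Text>;
  }

  const send = async () => {
    const message = input.trim();
    if (!message || llm.isGenerating) return;
    setInput('');
    await llm.sendMessage(message);
  };

  return (
    <SafeAreaView style={{ flex: 1 }}>
      <FlatList
        data={llm.messageHistory}
        keyExtractor={(_, i) => String(i)}
        renderItem={({ item }) => <Text>{item.role}: {item.content}</Text>}
      />
      {/* Streamed partial output while a response is being generated */}
      {llm.isGenerating && <Text>assistant: {llm.response}</Text>}
      <TextInput value={input} onChangeText={setInput} placeholder="Type a message" />
      <Button title="Send" onPress={send} disabled={llm.isGenerating} />
    </SafeAreaView>
  );
}
```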
Step 3: Understanding the Code
Let's break down the key parts of the implementation.
Model Initialization
The useLLM hook:
- Automatically downloads the model on first use
- Loads the model into memory
- Returns an interface to interact with the LLM
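The initialization itself is a single hook call, sketched here with an assumed constant and option name:

```typescript
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

// Downloads the model on first use, loads it into memory,
// and returns the interface described below.
const llm = useLLM({ model: LLAMA3_2_1B });
```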
Available Model Constants
React Native ExecuTorch provides pre-configured model constants, such as the Llama 3.2 1B variants used in this tutorial.
State Properties
The useLLM hook returns several useful properties:
| Property | Type | Description |
|---|---|---|
| isReady | boolean | true when the model is loaded and ready |
| isGenerating | boolean | true while the model is generating a response |
| downloadProgress | number | Download progress (0 to 1) |
| messageHistory | Message[] | Array of all conversation messages |
| response | string | Current response being generated |
| token | string | Most recently generated token |
| error | RnExecutorchError \| null | Error if the model failed to load |
Methods
| Method | Description |
|---|---|
| sendMessage(message) | Add a user message and get an AI response |
| generate(messages, tools?) | Generate a completion for a message array |
| interrupt() | Stop the current generation |
| deleteMessage(index) | Remove a message from history |
| configure(config) | Update the model configuration |
| getGeneratedTokenCount() | Get the count of generated tokens |
| getPromptTokenCount() | Get the count of prompt tokens |
| getTotalTokenCount() | Get the total token count |
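For instance, the token-count methods make lightweight usage logging easy. A sketch, given an `llm` object returned by `useLLM`:

```typescript
// Log token usage after a response has finished generating.
const logUsage = () => {
  console.log('prompt tokens:   ', llm.getPromptTokenCount());
  console.log('generated tokens:', llm.getGeneratedTokenCount());
  console.log('total tokens:    ', llm.getTotalTokenCount());
};
```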
Step 4: Advanced Usage
Using generate() for One-Off Completions
If you don’t need conversation history management, use generate():
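A sketch, assuming `generate()` accepts an array of role/content messages (the shape implied by the messageHistory table above) and that the streamed result accumulates in `llm.response`:

```typescript
// One-off completion: nothing is added to the hook's messageHistory.
const summarize = async (text: string) => {
  await llm.generate([
    { role: 'system', content: 'Summarize the user message in one sentence.' },
    { role: 'user', content: text },
  ]);
  // Per the state table, the generated text is available on llm.response.
  return llm.response;
};
```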
Configuring the Model
Customize the model's behavior with configure():
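A sketch; the exact configuration fields depend on your library version, so treat the option names below as assumptions:

```typescript
// Option names are illustrative — check your version's config type.
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a concise, helpful assistant.',
    contextWindowLength: 6, // keep only the most recent turns in context
  },
});
```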
Handling Errors
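The error property from the state table can drive a fallback UI. A sketch (the exact shape of RnExecutorchError is not shown here, so the value is stringified):

```typescript
// Render a fallback when the model failed to load or generate.
if (llm.error) {
  return <Text>Something went wrong: {String(llm.error)}</Text>;
}
```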
Interrupting Generation
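Long generations can be cancelled with interrupt(). A sketch of wiring it to a button:

```typescript
// Let the user stop a long-running generation.
<Button
  title="Stop"
  onPress={() => llm.interrupt()}
  disabled={!llm.isGenerating}
/>
```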
Step 5: Testing Your App
Wait for model download
On first launch, the model will be downloaded. This may take a few minutes depending on your connection.
Performance Tips
Key areas are model selection, memory management, and optimization.
Model Selection
Choose the right model for your use case:
- SpinQuant variants: Smallest size, fastest inference, good quality
- QLoRA variants: Balanced size and quality
- Original models: Highest quality, largest size
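Switching variants is just a matter of importing a different constant. A sketch (constant and option names assumed from this page):

```typescript
// Swap the constant to trade size and speed for quality.
import { useLLM, LLAMA3_2_1B_SPINQUANT } from 'react-native-executorch';

const llm = useLLM({ model: LLAMA3_2_1B_SPINQUANT });
```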
For most use cases, start with LLAMA3_2_1B_SPINQUANT for the best performance.
Type Definitions
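The shapes below are inferred from the tables in Step 3, not copied from the library; treat them as an approximation and check the exported types:

```typescript
// Approximate shapes, inferred from the state-property tables above.
type MessageRole = 'system' | 'user' | 'assistant';

interface Message {
  role: MessageRole;
  content: string;
}
```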
The hook and its return value are fully typed for TypeScript users.
Next Steps
Congratulations! You've built your first AI-powered chat application.
Explore Other Models
Try computer vision, speech-to-text, and other AI capabilities
Advanced Configuration
Learn about context strategies, tool calling, and structured outputs
View Demo Apps
Explore full-featured example applications
API Reference
Dive deep into all available hooks and modules