Large Language Models (LLMs)
React Native ExecuTorch enables you to run powerful Large Language Models directly on mobile devices, providing fast, private, and offline AI capabilities.
What is useLLM?
The useLLM hook is the primary interface for integrating LLMs into your React Native application. It handles model loading, token streaming, and conversation management, and provides real-time generation feedback.
Key Features
- On-Device Inference: Run models entirely on-device with no server dependency
- Token Streaming: Receive generated tokens in real-time as the model generates responses
- Conversation Management: Built-in message history tracking with flexible context strategies
- Tool Calling: Enable models to call external functions and APIs
- Download Progress: Monitor model download progress for better UX
- Multiple Models: Support for various model families (Llama, Qwen, Hammer, SmolLM, Phi)
- Quantization Support: Use quantized models for reduced memory footprint
Quick Start
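As a rough sketch of the hook's surface, the snippet below models the state described on this page (isReady, isGenerating, downloadProgress, response, messageHistory, error) with a local stub. In a real app you would import useLLM from react-native-executorch instead of stubbing it; field details beyond those documented here are assumptions, not the library's API.

```typescript
// Shape of the hook state described on this page. Comments mark
// what each field means; exact types are assumptions.
type Role = "user" | "assistant" | "system";

interface Message {
  role: Role;
  content: string;
}

interface LLMState {
  isReady: boolean; // model downloaded and loaded
  isGenerating: boolean; // a response is currently being streamed
  downloadProgress: number; // 0-1 while the model downloads
  response: string; // accumulated text of the current reply
  messageHistory: Message[]; // full conversation so far
  error: Error | null; // set if loading or generation failed
}

// Stub standing in for the real useLLM hook, for illustration only.
function useLLMStub(): LLMState {
  return {
    isReady: true,
    isGenerating: false,
    downloadProgress: 1,
    response: "",
    messageHistory: [],
    error: null,
  };
}

const llm = useLLMStub();
console.log(llm.isReady); // true
```

Inside a component, you would gate the chat UI on isReady and render a progress indicator while downloadProgress is below 1.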
Core Concepts
Model Loading
Models are automatically downloaded and loaded when the hook initializes. You can monitor progress using the downloadProgress state (0-1) and check readiness with the isReady boolean.
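For example, the 0-1 progress value can be mapped to a percentage label for a loading UI (the helper name is ours, not part of the library):

```typescript
// Format the 0-1 downloadProgress value as a percent string,
// clamping out-of-range values defensively.
function formatDownloadProgress(progress: number): string {
  const clamped = Math.min(1, Math.max(0, progress));
  return `${Math.round(clamped * 100)}%`;
}

console.log(formatDownloadProgress(0.37)); // "37%"
```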
Message History
The messageHistory array tracks the entire conversation, with each message containing a role (user, assistant, or system) and content.
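The message structure can be modeled like this (the Message shape follows the description above; the append helper is illustrative, not a library function):

```typescript
type Role = "user" | "assistant" | "system";

interface Message {
  role: Role;
  content: string;
}

// Append a message immutably, as React state updates expect.
function appendMessage(
  history: Message[],
  role: Role,
  content: string
): Message[] {
  return [...history, { role, content }];
}

let history: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
];
history = appendMessage(history, "user", "Hello!");
console.log(history.length); // 2
```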
Token Streaming
As the model generates text, you receive:
- token: the most recent token generated
- response: the accumulated response string
- isGenerating: whether generation is in progress
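Conceptually, response is the concatenation of every token emitted so far. The relationship can be simulated like this (the callback signature is illustrative):

```typescript
// Simulate streaming: each emitted token extends the accumulated response.
function streamTokens(
  tokens: string[],
  onUpdate: (token: string, response: string) => void
): string {
  let response = "";
  for (const token of tokens) {
    response += token;
    onUpdate(token, response); // e.g. update UI with the partial reply
  }
  return response;
}

const finalResponse = streamTokens(["Hel", "lo", ", wor", "ld!"], () => {});
console.log(finalResponse); // "Hello, world!"
```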
Error Handling
The error state contains any RnExecutorchError that occurred during model loading or generation.
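A typical pattern is to derive the UI state from error, isReady, and isGenerating in priority order, checking error first. A sketch (the function and state names are ours; the fields mirror those described above):

```typescript
type UIState = "error" | "loading" | "generating" | "idle";

// Pick which screen state to show, giving errors top priority.
function deriveUIState(opts: {
  error: Error | null;
  isReady: boolean;
  isGenerating: boolean;
}): UIState {
  if (opts.error) return "error";
  if (!opts.isReady) return "loading";
  if (opts.isGenerating) return "generating";
  return "idle";
}

console.log(
  deriveUIState({ error: null, isReady: false, isGenerating: false })
); // "loading"
```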
Next Steps
- useLLM Hook: complete API reference for the useLLM hook
- Chat Configuration: configure system prompts, context, and generation settings
- Tool Calling: enable models to call external functions
- Available Models: browse all supported models and variants