Overview
The useLLM hook manages a Large Language Model (LLM) instance for text generation and chat applications. It handles model loading, conversation management, token generation, and provides methods for configuration and inference.
Import
import { useLLM } from 'react-native-executorch';
Hook Signature
const llm = useLLM({ model, preventLoad }: LLMProps): LLMType
Parameters
- model - Object containing model sources:
  - modelSource - Source location of the model binary file (.pte)
  - tokenizerSource - Source location of the tokenizer JSON file
  - Optional source location of the tokenizer config JSON file
- preventLoad - If true, prevents automatic model loading and downloading when the hook mounts
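The examples below never exercise preventLoad, so here is a minimal sketch of the deferral pattern. The URLs are placeholders, and gating the flag behind React state is an assumption about typical usage rather than a documented requirement:

```typescript
// Sketch: defer the download until the user opts in. In a component,
// `allowLoad` would be React state toggled by a "Download model" button,
// and the resulting props passed to useLLM.
function llmProps(allowLoad: boolean) {
  return {
    model: {
      modelSource: 'https://example.com/model.pte', // placeholder URL
      tokenizerSource: 'https://example.com/tokenizer.json', // placeholder URL
    },
    // The hook will not load or download the model while this is true.
    preventLoad: !allowLoad,
  };
}

// Hypothetical usage inside a component:
//   const [allowLoad, setAllowLoad] = useState(false);
//   const llm = useLLM(llmProps(allowLoad));
//   <Button title="Download model" onPress={() => setAllowLoad(true)} />
```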
Return Value
Returns an object with the following properties and methods:
State Properties
- messageHistory - Array of all messages in the conversation. Updated after each model response.
- response - Current generated response. Updated with each token generated by the model.
- token - The most recently generated token.
- isReady - Indicates whether the model is loaded and ready for inference.
- isGenerating - Indicates whether the model is currently generating a response.
- downloadProgress - Download progress as a value between 0 and 1.
- error - Contains error details if the model fails to load or encounters an error during inference.
Methods
configure(config: LLMConfig): void

Configures chat and tool calling settings. The config object accepts:

- chatConfig - Chat management configuration:
  - systemPrompt - System instructions for the model (e.g., "Be a helpful translator")
  - initialMessageHistory - Initial conversation history to provide context
  - Strategy for managing the conversation context window (see ContextStrategy below)
- Tool calling configuration (requires model support):
  - Array of tool definitions
  - executeToolCallback: (call: ToolCall) => Promise<string | null> - Callback to execute tools and return results to the model
  - If true, JSON tool calls are displayed in chat
- generationConfig - Text generation configuration:
  - temperature - Controls randomness/creativity (higher = more random)
  - topp - Nucleus sampling threshold
  - Soft limit on tokens per batch
  - Time interval between token batches (ms)
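Tool calling is not exercised in the usage examples, so here is a hedged sketch of the callback side. Only executeToolCallback's signature comes from this page; the field names toolsConfig, tools, and displayToolCalls, and the tool definition shape, are assumptions based on the descriptions above:

```typescript
// The ToolCall shape mirrors the interface from the Types section.
type ToolCall = { toolName: string; arguments: Record<string, unknown> };

// A local "tool" the model can ask for; a real app would call an API here.
async function getWeather(city: string): Promise<string> {
  return JSON.stringify({ city, tempC: 21 });
}

// Maps a model-issued ToolCall to a result string, or null for unknown tools.
async function executeToolCallback(call: ToolCall): Promise<string | null> {
  if (call.toolName === 'get_weather') {
    return getWeather(String(call.arguments.city));
  }
  return null;
}

const tools = [
  {
    name: 'get_weather',
    description: 'Get the current weather for a city',
    parameters: { city: { type: 'string', required: true } },
  },
];

// Then, once llm.isReady (field names are assumptions):
//   llm.configure({ toolsConfig: { tools, executeToolCallback, displayToolCalls: false } });
```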
generate(messages: Message[], tools?: LLMTool[]): Promise<string>

Generates a text completion for the provided messages without managing conversation context.

- messages - Array of messages representing the chat history
- tools - Optional array of tools for the model to use

Returns a promise that resolves to the generated text.
sendMessage(message: string): Promise<string>

Sends a user message and manages conversation context automatically. Returns a promise that resolves to the model's response. Updates messageHistory with both the user message and the model's response.
deleteMessage(index: number): void

Deletes all messages starting from the specified index. Updates messageHistory after deletion.

- index - Index of the first message to delete
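A common use of deleteMessage is a "regenerate" action. This sketch assumes only the documented behavior of messageHistory, deleteMessage, and sendMessage; the Message type mirrors the interface from the Types section, and `llm` stands for the object returned by useLLM:

```typescript
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

// Re-ask the last user question, discarding the assistant's previous answer.
async function regenerateLastReply(llm: {
  messageHistory: Message[];
  deleteMessage: (index: number) => void;
  sendMessage: (message: string) => Promise<string>;
}): Promise<string | null> {
  const roles = llm.messageHistory.map((m) => m.role);
  const lastUserIdx = roles.lastIndexOf('user');
  if (lastUserIdx === -1) return null; // nothing to regenerate
  const prompt = llm.messageHistory[lastUserIdx].content;
  // Deleting at the user message's index also removes the assistant reply
  // that followed it, since deleteMessage truncates from that index onward.
  llm.deleteMessage(lastUserIdx);
  return llm.sendMessage(prompt);
}
```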
interrupt(): void

Interrupts the current text generation.
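Beyond a "Stop" button, interrupt can enforce a time budget. This sketch assumes that sendMessage still resolves (with the partial text) once generation is interrupted, which this page does not state explicitly:

```typescript
// Cap generation time by calling interrupt() after a deadline.
async function sendWithTimeout(
  llm: { sendMessage: (message: string) => Promise<string>; interrupt: () => void },
  message: string,
  ms: number
): Promise<string> {
  const timer = setTimeout(() => llm.interrupt(), ms);
  try {
    return await llm.sendMessage(message);
  } finally {
    clearTimeout(timer); // don't interrupt a later generation by mistake
  }
}
```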
getGeneratedTokenCount(): number

Returns the number of tokens generated in the current generation.

getPromptTokenCount(): number

Returns the number of prompt tokens in the last message.

getTotalTokenCount(): number

Returns the total number of tokens (prompt + generated) from the previous generation.
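One practical use of the count getters is a rough decode-speed metric. A minimal sketch, where `llm` stands for the object returned by useLLM and the caller measures elapsed time around a generation:

```typescript
// Tokens per second from the generated-token count and wall-clock time.
function tokensPerSecond(
  llm: { getGeneratedTokenCount: () => number },
  elapsedMs: number
): number {
  if (elapsedMs <= 0) return 0;
  return (llm.getGeneratedTokenCount() / elapsedMs) * 1000;
}

// Typical use: record Date.now() before sendMessage, then after it resolves:
//   const tps = tokensPerSecond(llm, Date.now() - start);
```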
Usage Examples
Basic Chat Application
import { useLLM } from 'react-native-executorch';
import { useState } from 'react';
import { View, Text, ScrollView, TextInput, Button } from 'react-native';

function ChatScreen() {
  const [input, setInput] = useState('');
  const llm = useLLM({
    model: {
      modelSource: 'https://huggingface.co/.../model.pte',
      tokenizerSource: 'https://huggingface.co/.../tokenizer.json',
    },
  });

  const handleSend = async () => {
    if (!input.trim() || !llm.isReady) return;
    try {
      await llm.sendMessage(input);
      setInput('');
    } catch (error) {
      console.error('Generation failed:', error);
    }
  };

  return (
    <View>
      <Text>Status: {llm.isReady ? 'Ready' : 'Loading...'}</Text>
      <Text>Progress: {(llm.downloadProgress * 100).toFixed(0)}%</Text>
      <ScrollView>
        {llm.messageHistory.map((msg, idx) => (
          <View key={idx}>
            <Text>{msg.role}: {msg.content}</Text>
          </View>
        ))}
        {llm.isGenerating && (
          <View>
            <Text>assistant: {llm.response}</Text>
          </View>
        )}
      </ScrollView>
      <TextInput
        value={input}
        onChangeText={setInput}
        placeholder="Type a message..."
      />
      <Button title="Send" onPress={handleSend} disabled={!llm.isReady} />
    </View>
  );
}
Custom Configuration

import { useLLM } from 'react-native-executorch';
import { useEffect } from 'react';
import { View } from 'react-native';

function TranslatorApp() {
  const llm = useLLM({
    model: {
      modelSource: require('./models/llama-3.2-1b.pte'),
      tokenizerSource: require('./models/tokenizer.json'),
    },
  });

  useEffect(() => {
    if (llm.isReady) {
      llm.configure({
        chatConfig: {
          systemPrompt: 'You are a helpful translator. Translate user messages to French.',
          initialMessageHistory: [],
        },
        generationConfig: {
          temperature: 0.7,
          topp: 0.9,
        },
      });
    }
  }, [llm.isReady]);

  return (
    <View>
      {/* UI implementation */}
    </View>
  );
}
Direct Generation (No Context)
import { useLLM } from 'react-native-executorch';
import { View } from 'react-native';
import type { Message } from 'react-native-executorch';

function SummarizationTool() {
  const llm = useLLM({
    model: {
      modelSource: 'https://example.com/model.pte',
      tokenizerSource: 'https://example.com/tokenizer.json',
    },
  });

  const summarize = async (text: string) => {
    const messages: Message[] = [
      { role: 'system', content: 'Summarize the following text concisely.' },
      { role: 'user', content: text },
    ];
    const summary = await llm.generate(messages);
    return summary;
  };

  return (
    <View>
      {/* UI implementation */}
    </View>
  );
}
Streaming Tokens
import { useLLM } from 'react-native-executorch';
import { useEffect } from 'react';
import { View, Text } from 'react-native';

function StreamingChat() {
  const llm = useLLM({
    model: {
      modelSource: require('./models/model.pte'),
      tokenizerSource: require('./models/tokenizer.json'),
    },
  });

  // Display each token as it's generated
  useEffect(() => {
    if (llm.token) {
      console.log('New token:', llm.token);
    }
  }, [llm.token]);

  return (
    <View>
      <Text>Current response: {llm.response}</Text>
    </View>
  );
}
Types
Message
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

ToolCall

interface ToolCall {
  toolName: string;
  arguments: Object;
}
ContextStrategy
interface ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[];
}
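The ContextStrategy interface admits custom implementations. A hedged sketch of a simple sliding-window strategy that always keeps the system prompt and evicts the oldest turns until the context fits (this is an illustration of the interface, not a strategy shipped by the library):

```typescript
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

const slidingWindowStrategy = {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[] {
    const system: Message = { role: 'system', content: systemPrompt };
    let window = history.slice();
    // Evict from the front (oldest first) until the whole context fits.
    while (window.length > 0 && getTokenCount([system, ...window]) > maxContextLength) {
      window = window.slice(1);
    }
    return [system, ...window];
  },
};
```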
Notes
The hook automatically loads the model when mounted unless preventLoad is set to true.
The model and tokenizer files can be large. Monitor downloadProgress to provide user feedback during initial download.
Use the token count methods (getGeneratedTokenCount, getPromptTokenCount, getTotalTokenCount) to monitor token usage and optimize context management for your use case.