Overview
LLMModule provides a class-based interface for managing Large Language Model instances. It handles model loading, message management, text generation, and conversation context.
When to Use
Use LLMModule when:
- You need fine-grained control over the model lifecycle
- You’re working outside React components
- You need to manage multiple model instances programmatically
- You want to integrate LLM capabilities into non-React code
Use useLLM hook when:
- Building React components
- You want automatic lifecycle management
- You prefer declarative state management
- You need React state integration
Constructor
new LLMModule({
  tokenCallback?: (token: string) => void;
  messageHistoryCallback?: (messageHistory: Message[]) => void;
})
Creates a new LLM module instance with optional callbacks.
Parameters
tokenCallback
(token: string) => void
Optional function called on every generated token with that token as its argument.
messageHistoryCallback
(messageHistory: Message[]) => void
Optional function called whenever a message finishes, receiving the entire message history.
Example
import { LLMModule } from 'react-native-executorch';
const llm = new LLMModule({
  tokenCallback: (token) => {
    console.log('New token:', token);
  },
  messageHistoryCallback: (history) => {
    console.log('Updated history:', history);
  }
});
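The Message objects referenced throughout this page follow the usual chat-message shape. A minimal sketch of the type, as an assumption; the library's actual definition may carry additional fields:

```typescript
// Hypothetical sketch of the Message shape used on this page;
// the library's real type may include more fields.
type Message = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

const history: Message[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
];

console.log(history.length); // 2
```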
Methods
load()
async load(
  model: {
    modelSource: ResourceSource;
    tokenizerSource: ResourceSource;
    tokenizerConfigSource: ResourceSource;
  },
  onDownloadProgressCallback?: (progress: number) => void
): Promise<void>
Loads the LLM model and tokenizer.
Parameters
model.modelSource
ResourceSource
Resource location of the model binary.
model.tokenizerSource
ResourceSource
Resource pointing to the tokenizer JSON file.
model.tokenizerConfigSource
ResourceSource
Resource pointing to the tokenizer config JSON file.
onDownloadProgressCallback
(progress: number) => void
Optional callback to track download progress (value between 0 and 1).
Example
await llm.load({
  modelSource: 'https://example.com/model.pte',
  tokenizerSource: 'https://example.com/tokenizer.json',
  tokenizerConfigSource: 'https://example.com/tokenizer_config.json'
}, (progress) => {
  console.log(`Download progress: ${(progress * 100).toFixed(1)}%`);
});
configure()
configure(config: LLMConfig): void
Configures chat, tool calling, and generation settings.
Parameters
config
LLMConfig
Configuration object containing chatConfig, toolsConfig, and generationConfig.
Example
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a helpful assistant.'
  },
  generationConfig: {
    temperature: 0.7,
    topP: 0.9,
    maxTokens: 512
  }
});
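A sketch of the config shape implied by the example above. The field names are taken from this page; the library's actual LLMConfig likely includes more options, so treat this reconstruction as an assumption:

```typescript
// Hypothetical sketch of LLMConfig, reconstructed from the fields used
// on this page; the real type likely has additional options.
type LLMConfig = {
  chatConfig?: { systemPrompt?: string };
  toolsConfig?: unknown;
  generationConfig?: {
    temperature?: number;
    topP?: number;
    maxTokens?: number;
  };
};

const config: LLMConfig = {
  chatConfig: { systemPrompt: 'You are a helpful assistant.' },
  generationConfig: { temperature: 0.7, topP: 0.9, maxTokens: 512 },
};

console.log(config.generationConfig?.maxTokens); // 512
```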
forward()
async forward(input: string): Promise<string>
Runs model inference on a raw input string. You need to provide the entire conversation and prompt yourself, in the correct format with special tokens. This method doesn’t manage conversation context.
Parameters
input
string
Raw input string containing the prompt and conversation history.
Returns
The generated response as a string.
Example
const response = await llm.forward('<|begin_of_text|>Hello, how are you?<|eot_id|>');
console.log(response);
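Because forward() takes a fully formatted prompt, the caller is responsible for applying the model's chat template. Below is a sketch of building a Llama-3-style prompt by hand; the special tokens used here are an assumption, so consult the tokenizer config you loaded for the template your model actually expects:

```typescript
// Sketch: manually applying a Llama-3-style chat template before forward().
// The special tokens below are an assumption; check your model's
// tokenizer_config.json for the template it actually uses.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function toRawPrompt(messages: ChatMessage[]): string {
  const body = messages
    .map(
      (m) =>
        `<|start_header_id|>${m.role}<|end_header_id|>\n\n${m.content}<|eot_id|>`
    )
    .join('');
  // A trailing assistant header asks the model to continue as the assistant.
  return `<|begin_of_text|>${body}<|start_header_id|>assistant<|end_header_id|>\n\n`;
}

const prompt = toRawPrompt([{ role: 'user', content: 'Hello, how are you?' }]);
console.log(prompt.startsWith('<|begin_of_text|>')); // true
```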
generate()
async generate(messages: Message[], tools?: LLMTool[]): Promise<string>
Runs the model to complete the chat passed in the messages argument. It doesn’t manage conversation context.
Parameters
messages
Message[]
Array of messages representing the chat history.
tools
LLMTool[]
Optional array of tools that can be used during generation.
Returns
The generated response as a string.
Example
const response = await llm.generate([
  { role: 'user', content: 'What is the capital of France?' }
]);
console.log(response); // "The capital of France is Paris."
sendMessage()
async sendMessage(message: string): Promise<Message[]>
Adds a user message to the conversation. After the model responds, it calls messageHistoryCallback() with the updated history, including both the user message and the model’s response.
Parameters
message
string
The message string to send.
Returns
Updated message history including the new user message and model response.
Example
const history = await llm.sendMessage('Tell me a joke');
console.log(history);
// [
// { role: 'user', content: 'Tell me a joke' },
// { role: 'assistant', content: 'Why did the chicken cross...' }
// ]
deleteMessage()
deleteMessage(index: number): Message[]
Deletes the message at the specified index along with all messages that follow it. After deletion, it calls messageHistoryCallback() with the new history.
Parameters
index
number
The index of the first message to delete; this message and all subsequent ones are removed from history.
Returns
Updated message history after deletion.
Example
const newHistory = llm.deleteMessage(2);
console.log(newHistory); // History with messages from index 2 onwards removed
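deleteMessage(index) truncates rather than splices: everything from the given index onward is dropped. A pure-function sketch of the documented behavior:

```typescript
type Message = { role: string; content: string };

// Sketch of deleteMessage's documented semantics: keep messages before
// `index`, drop the message at `index` and everything after it.
function deleteFrom(history: Message[], index: number): Message[] {
  return history.slice(0, index);
}

const history: Message[] = [
  { role: 'user', content: 'Hi' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'Tell me a joke' },
  { role: 'assistant', content: 'Why did the chicken...' },
];

console.log(deleteFrom(history, 2).length); // 2
```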
interrupt()
Interrupts model generation. One more token may still be emitted after the interruption.
Example
llm.interrupt();
setTokenCallback()
setTokenCallback({ tokenCallback }: { tokenCallback: (token: string) => void }): void
Sets a new token callback invoked on every token batch.
Parameters
tokenCallback
(token: string) => void
required
Callback function to handle new tokens.
Example
llm.setTokenCallback({
  tokenCallback: (token) => console.log('Token:', token)
});
getGeneratedTokenCount()
getGeneratedTokenCount(): number
Returns the number of tokens generated in the last response.
Returns
The count of generated tokens.
getPromptTokensCount()
getPromptTokensCount(): number
Returns the number of prompt tokens in the last message.
Returns
The count of prompt tokens.
getTotalTokensCount()
getTotalTokensCount(): number
Returns the total number of tokens from the previous generation (sum of prompt and generated tokens).
Returns
The count of prompt and generated tokens.
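The three counters are related by a simple invariant: the total equals prompt tokens plus generated tokens. A sketch with illustrative numbers (not taken from a real generation):

```typescript
// Sketch of the documented relationship between the token counters.
// The numbers are illustrative, not from a real generation.
const promptTokens = 24;     // what getPromptTokensCount() would report
const generatedTokens = 118; // what getGeneratedTokenCount() would report
const totalTokens = promptTokens + generatedTokens; // getTotalTokensCount()

console.log(totalTokens); // 142
```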
delete()
Deletes the model from memory. You cannot delete the model while it’s generating; you need to interrupt it first.
Example
llm.interrupt();
// Wait for generation to stop
llm.delete();
Complete Example
import { LLMModule } from 'react-native-executorch';
// Create instance
const llm = new LLMModule({
  tokenCallback: (token) => {
    process.stdout.write(token);
  },
  messageHistoryCallback: (history) => {
    console.log('\nConversation updated:', history.length, 'messages');
  }
});
// Load model
await llm.load({
  modelSource: 'https://example.com/llama-3.2-1B.pte',
  tokenizerSource: 'https://example.com/tokenizer.json',
  tokenizerConfigSource: 'https://example.com/tokenizer_config.json'
}, (progress) => {
  console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
});
// Configure
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a helpful coding assistant.'
  },
  generationConfig: {
    temperature: 0.7,
    maxTokens: 256
  }
});
// Send messages
const history = await llm.sendMessage('Explain React hooks');
// Check token usage
console.log('Tokens used:', llm.getTotalTokensCount());
// Clean up
llm.delete();
See Also