Overview

LLMModule provides a class-based interface for managing Large Language Model instances. It handles model loading, message management, text generation, and conversation context.

When to Use

Use LLMModule when:
  • You need fine-grained control over the model lifecycle
  • You’re working outside React components
  • You need to manage multiple model instances programmatically
  • You want to integrate LLM capabilities into non-React code
Use useLLM hook when:
  • Building React components
  • You want automatic lifecycle management
  • You prefer declarative state management
  • You need React state integration

Constructor

new LLMModule({
  tokenCallback?: (token: string) => void;
  messageHistoryCallback?: (messageHistory: Message[]) => void;
})
Creates a new LLM module instance with optional callbacks.

Parameters

tokenCallback
(token: string) => void
Optional function called on every generated token with that token as its argument.
messageHistoryCallback
(messageHistory: Message[]) => void
Optional function called after every completed message, receiving the entire message history.

Example

import { LLMModule } from 'react-native-executorch';

const llm = new LLMModule({
  tokenCallback: (token) => {
    console.log('New token:', token);
  },
  messageHistoryCallback: (history) => {
    console.log('Updated history:', history);
  }
});

Methods

load()

async load(
  model: {
    modelSource: ResourceSource;
    tokenizerSource: ResourceSource;
    tokenizerConfigSource: ResourceSource;
  },
  onDownloadProgressCallback?: (progress: number) => void
): Promise<void>
Loads the LLM model and tokenizer.

Parameters

model.modelSource
ResourceSource
required
Resource location of the model binary.
model.tokenizerSource
ResourceSource
required
Resource pointing to the tokenizer JSON file.
model.tokenizerConfigSource
ResourceSource
required
Resource pointing to the tokenizer config JSON file.
onDownloadProgressCallback
(progress: number) => void
Optional callback to track download progress (value between 0 and 1).

Example

await llm.load({
  modelSource: 'https://example.com/model.pte',
  tokenizerSource: 'https://example.com/tokenizer.json',
  tokenizerConfigSource: 'https://example.com/tokenizer_config.json'
}, (progress) => {
  console.log(`Download progress: ${(progress * 100).toFixed(1)}%`);
});

configure()

configure(config: LLMConfig): void
Configures chat, tool calling, and generation settings.

Parameters

config
LLMConfig
required
Configuration object containing chatConfig, toolsConfig, and generationConfig.

Example

llm.configure({
  chatConfig: {
    systemPrompt: 'You are a helpful assistant.'
  },
  generationConfig: {
    temperature: 0.7,
    topP: 0.9,
    maxTokens: 512
  }
});

forward()

async forward(input: string): Promise<string>
Runs model inference on a raw input string. You must provide the entire conversation and prompt yourself, in the correct format with the model's special tokens. This method doesn't manage conversation context.

Parameters

input
string
required
Raw input string containing the prompt and conversation history.

Returns

The generated response as a string.

Example

const response = await llm.forward('<|begin_of_text|>Hello, how are you?<|eot_id|>');
console.log(response);
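Because forward() applies no chat template, the caller assembles the prompt string. The sketch below builds a Llama-3-style prompt; the special tokens shown are illustrative and must match whatever model you actually load (the `ChatMessage` type and `formatPrompt` helper are hypothetical, not part of the library):

```typescript
// Illustrative Llama-3-style chat template; the exact special tokens
// depend on the model and tokenizer you load.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function formatPrompt(messages: ChatMessage[]): string {
  const body = messages
    .map(
      (m) =>
        `<|start_header_id|>${m.role}<|end_header_id|>\n\n${m.content}<|eot_id|>`
    )
    .join('');
  // The trailing assistant header cues the model to generate its reply.
  return `<|begin_of_text|>${body}<|start_header_id|>assistant<|end_header_id|>\n\n`;
}

const prompt = formatPrompt([{ role: 'user', content: 'Hello, how are you?' }]);
// const response = await llm.forward(prompt);
```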

generate()

async generate(messages: Message[], tools?: LLMTool[]): Promise<string>
Runs the model to complete the chat passed in the messages argument. It doesn’t manage conversation context.

Parameters

messages
Message[]
required
Array of messages representing the chat history.
tools
LLMTool[]
Optional array of tools that can be used during generation.

Returns

The generated response as a string.

Example

const response = await llm.generate([
  { role: 'user', content: 'What is the capital of France?' }
]);
console.log(response); // "The capital of France is Paris."
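The optional tools argument lets the model call functions during generation. The tool definition below is a hypothetical sketch; the real LLMTool type in the library may use a different schema, so treat the field names here as assumptions:

```typescript
// Hypothetical tool definition. The exact LLMTool schema is an
// assumption here; check the library's LLMTool type for the real shape.
const getWeatherTool = {
  name: 'get_weather',
  description: 'Returns the current weather for a given city',
  parameters: {
    type: 'object',
    properties: {
      city: { type: 'string', description: 'City name' },
    },
    required: ['city'],
  },
};

// With a loaded module, tools are passed as the second argument:
// const response = await llm.generate(
//   [{ role: 'user', content: "What's the weather in Paris?" }],
//   [getWeatherTool]
// );
```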

sendMessage()

async sendMessage(message: string): Promise<Message[]>
Adds a user message to the conversation and generates a model response. Once the response completes, messageHistoryCallback() is called with the updated history containing both the user message and the model response.

Parameters

message
string
required
The message string to send.

Returns

Updated message history including the new user message and model response.

Example

const history = await llm.sendMessage('Tell me a joke');
console.log(history);
// [
//   { role: 'user', content: 'Tell me a joke' },
//   { role: 'assistant', content: 'Why did the chicken cross...' }
// ]

deleteMessage()

deleteMessage(index: number): Message[]
Deletes the message at the specified index and every message after it. After deletion, it calls messageHistoryCallback() with the new history.

Parameters

index
number
required
The index at which deletion starts; this message and all later messages are removed.

Returns

Updated message history after deletion.

Example

const newHistory = llm.deleteMessage(2);
console.log(newHistory); // History with messages from index 2 onwards removed
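The truncation semantics can be pictured as plain array slicing (illustrative only, not the library's implementation):

```typescript
// Illustrative: deleteMessage(index) behaves like truncating the
// history just before that index.
const history = [
  { role: 'user', content: 'Hi' },
  { role: 'assistant', content: 'Hello! How can I help?' },
  { role: 'user', content: 'Tell me a joke' },
  { role: 'assistant', content: 'Why did the chicken cross...' },
];

// llm.deleteMessage(2) keeps only the messages before index 2.
const truncated = history.slice(0, 2);
console.log(truncated.length); // 2
```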

interrupt()

interrupt(): void
Interrupts model generation. One additional token may still be emitted after the interruption.

Example

llm.interrupt();

setTokenCallback()

setTokenCallback({ tokenCallback }: { tokenCallback: (token: string) => void }): void
Sets a new token callback, invoked on every generated token.

Parameters

tokenCallback
(token: string) => void
required
Callback function to handle new tokens.

Example

llm.setTokenCallback({
  tokenCallback: (token) => console.log('Token:', token)
});

getGeneratedTokenCount()

getGeneratedTokenCount(): number
Returns the number of tokens generated in the last response.

Returns

The count of generated tokens.

getPromptTokensCount()

getPromptTokensCount(): number
Returns the number of prompt tokens used in the most recent generation.

Returns

The count of prompt tokens.

getTotalTokensCount()

getTotalTokensCount(): number
Returns the total number of tokens from the previous generation (sum of prompt and generated tokens).

Returns

The count of prompt and generated tokens.
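The three counters satisfy total = prompt + generated. A minimal sketch with illustrative numbers (in practice the values come from the module's getters after a generation has finished):

```typescript
// Illustrative values; in practice these come from
// llm.getPromptTokensCount() and llm.getGeneratedTokenCount().
const promptTokens = 42;
const generatedTokens = 128;

// getTotalTokensCount() equals the sum of the two:
const totalTokens = promptTokens + generatedTokens;
console.log(`prompt=${promptTokens} generated=${generatedTokens} total=${totalTokens}`);
```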

delete()

delete(): void
Deletes the model from memory. The model cannot be deleted while it is generating; call interrupt() first and wait for generation to stop.

Example

llm.interrupt();
// Wait for generation to stop
llm.delete();

Complete Example

import { LLMModule } from 'react-native-executorch';

// Create instance
const llm = new LLMModule({
  tokenCallback: (token) => {
    process.stdout.write(token);
  },
  messageHistoryCallback: (history) => {
    console.log('\nConversation updated:', history.length, 'messages');
  }
});

// Load model
await llm.load({
  modelSource: 'https://example.com/llama-3.2-1B.pte',
  tokenizerSource: 'https://example.com/tokenizer.json',
  tokenizerConfigSource: 'https://example.com/tokenizer_config.json'
}, (progress) => {
  console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
});

// Configure
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a helpful coding assistant.'
  },
  generationConfig: {
    temperature: 0.7,
    maxTokens: 256
  }
});

// Send messages
const history = await llm.sendMessage('Explain React hooks');

// Check token usage
console.log('Tokens used:', llm.getTotalTokensCount());

// Clean up
llm.delete();
