
Overview

The useLLM hook manages a Large Language Model (LLM) instance for text generation and chat applications. It handles model loading, conversation management, token generation, and provides methods for configuration and inference.

Import

import { useLLM } from 'react-native-executorch';

Hook Signature

const llm = useLLM({ model, preventLoad }: LLMProps): LLMType

Parameters

model (object, required)
  Object containing the model sources.

preventLoad (boolean, default: false)
  If true, prevents automatic model loading and downloading when the hook mounts.

Return Value

Returns an object with the following properties and methods:

State Properties

messageHistory (Message[])
  Array of all messages in the conversation. Updated after each model response.

response (string)
  The response generated so far. Updated with each token the model emits.

token (string)
  The most recently generated token.

isReady (boolean)
  Indicates whether the model is loaded and ready for inference.

isGenerating (boolean)
  Indicates whether the model is currently generating a response.

downloadProgress (number)
  Download progress as a value between 0 and 1.

error (RnExecutorchError | null)
  Contains error details if the model fails to load or encounters an error during inference.

Methods

configure (function)
  Configures chat and tool-calling settings.
  configure(config: LLMConfig): void

generate (function)
  Generates a text completion for the provided messages without touching the managed conversation context.
  generate(messages: Message[], tools?: LLMTool[]): Promise<string>
  Returns a promise that resolves to the generated text.

sendMessage (function)
  Sends a user message and manages the conversation context automatically.
  sendMessage(message: string): Promise<string>
  Returns a promise that resolves to the model's response. Updates messageHistory with both the user message and the model's response.

deleteMessage (function)
  Deletes all messages from the specified index onward.
  deleteMessage(index: number): void
  Updates messageHistory after deletion.

interrupt (function)
  Interrupts the current text generation.
  interrupt(): void

getGeneratedTokenCount (function)
  Returns the number of tokens generated in the current generation.
  getGeneratedTokenCount(): number

getPromptTokenCount (function)
  Returns the number of prompt tokens in the last message.
  getPromptTokenCount(): number

getTotalTokenCount (function)
  Returns the total number of tokens (prompt + generated) from the previous generation.
  getTotalTokenCount(): number
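
To illustrate how interrupt and the token-count methods fit together, here is a minimal sketch of a "stop" handler. The LLMControls interface and the stopAndReport helper are hypothetical names introduced for this example; they model only the subset of the hook's return value used here.

```typescript
// Hypothetical subset of the useLLM return value, mirroring the methods above.
interface LLMControls {
  isGenerating: boolean;
  interrupt: () => void;
  getPromptTokenCount: () => number;
  getGeneratedTokenCount: () => number;
  getTotalTokenCount: () => number;
}

// Stop any in-flight generation, then summarize token usage for logging.
function stopAndReport(llm: LLMControls): string {
  if (llm.isGenerating) {
    llm.interrupt();
  }
  return `prompt=${llm.getPromptTokenCount()} generated=${llm.getGeneratedTokenCount()} total=${llm.getTotalTokenCount()}`;
}
```

In an app, you would wire stopAndReport to a "Stop" button's onPress and pass it the object returned by useLLM.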

Usage Examples

Basic Chat Application

import { useLLM } from 'react-native-executorch';
import { useState } from 'react';
import { View, Text, TextInput, Button, ScrollView } from 'react-native';

function ChatScreen() {
  const [input, setInput] = useState('');
  
  const llm = useLLM({
    model: {
      modelSource: 'https://huggingface.co/.../model.pte',
      tokenizerSource: 'https://huggingface.co/.../tokenizer.json',
    },
  });
  
  const handleSend = async () => {
    if (!input.trim() || !llm.isReady) return;
    
    try {
      await llm.sendMessage(input);
      setInput('');
    } catch (error) {
      console.error('Generation failed:', error);
    }
  };
  
  return (
    <View>
      <Text>Status: {llm.isReady ? 'Ready' : 'Loading...'}</Text>
      <Text>Progress: {(llm.downloadProgress * 100).toFixed(0)}%</Text>
      
      <ScrollView>
        {llm.messageHistory.map((msg, idx) => (
          <View key={idx}>
            <Text>{msg.role}: {msg.content}</Text>
          </View>
        ))}
        
        {llm.isGenerating && (
          <View>
            <Text>assistant: {llm.response}</Text>
          </View>
        )}
      </ScrollView>
      
      <TextInput
        value={input}
        onChangeText={setInput}
        placeholder="Type a message..."
      />
      <Button title="Send" onPress={handleSend} disabled={!llm.isReady} />
    </View>
  );
}

Configured LLM with System Prompt

import { useLLM } from 'react-native-executorch';
import { useEffect } from 'react';
import { View } from 'react-native';

function TranslatorApp() {
  const llm = useLLM({
    model: {
      modelSource: require('./models/llama-3.2-1b.pte'),
      tokenizerSource: require('./models/tokenizer.json'),
    },
  });
  
  useEffect(() => {
    if (llm.isReady) {
      llm.configure({
        chatConfig: {
          systemPrompt: 'You are a helpful translator. Translate user messages to French.',
          initialMessageHistory: [],
        },
        generationConfig: {
          temperature: 0.7,
          topp: 0.9,
        },
      });
    }
  }, [llm.isReady]);
  
  return (
    <View>
      {/* UI implementation */}
    </View>
  );
}

Direct Generation (No Context)

import { useLLM, Message } from 'react-native-executorch';
import { View } from 'react-native';

function SummarizationTool() {
  const llm = useLLM({
    model: {
      modelSource: 'https://example.com/model.pte',
      tokenizerSource: 'https://example.com/tokenizer.json',
    },
  });

  const summarize = async (text: string) => {
    // Annotate as Message[] so the role fields narrow to the expected union.
    const messages: Message[] = [
      { role: 'system', content: 'Summarize the following text concisely.' },
      { role: 'user', content: text },
    ];

    const summary = await llm.generate(messages);
    return summary;
  };
  
  return (
    <View>
      {/* UI implementation */}
    </View>
  );
}

Streaming Tokens

import { useLLM } from 'react-native-executorch';
import { useEffect } from 'react';
import { View, Text } from 'react-native';

function StreamingChat() {
  const llm = useLLM({
    model: {
      modelSource: require('./models/model.pte'),
      tokenizerSource: require('./models/tokenizer.json'),
    },
  });
  
  // Display each token as it's generated
  useEffect(() => {
    if (llm.token) {
      console.log('New token:', llm.token);
    }
  }, [llm.token]);
  
  return (
    <View>
      <Text>Current response: {llm.response}</Text>
    </View>
  );
}

Types

Message

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

ToolCall

interface ToolCall {
  toolName: string;
  arguments: Object;
}

ContextStrategy

interface ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[];
}
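
One way to satisfy this interface is a sliding-window strategy that always keeps the system prompt and drops the oldest history messages until the context fits. This is a sketch, not the library's built-in behavior: the windowing policy and the slidingWindowStrategy name are assumptions, and the Message interface is reproduced inline to keep the example self-contained.

```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Sliding-window ContextStrategy sketch: keep the system prompt, drop the
// oldest history entries until the token count fits maxContextLength.
const slidingWindowStrategy = {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[] {
    const system: Message = { role: 'system', content: systemPrompt };
    let window = history.slice();
    while (window.length > 0 && getTokenCount([system, ...window]) > maxContextLength) {
      window = window.slice(1); // drop the oldest message first
    }
    return [system, ...window];
  },
};
```

The strategy degrades gracefully: if even a single message exceeds the budget, it still returns the system prompt plus the most recent message rather than an empty context.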

Notes

- The hook automatically loads the model when mounted unless preventLoad is set to true.
- The model and tokenizer files can be large. Monitor downloadProgress to provide user feedback during the initial download.
- Use the token-count methods (getPromptTokenCount, getGeneratedTokenCount, getTotalTokenCount) to monitor token usage and tune context management for your use case.
