The Generator class manages the text generation process. It handles the iterative generation of tokens and provides access to generated sequences.

Constructor

Generator(Model model, GeneratorParams generatorParams)

Creates a new generator instance.
model (Model, required): The model to use for generation.
generatorParams (GeneratorParams, required): Parameters controlling the generation process.
Generator.cs:13-16
using Microsoft.ML.OnnxRuntimeGenAI;

using Model model = new Model("/path/to/model");
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);

Generation Control Methods

GenerateNextToken()

Generates the next token in the sequence. This is the core method for the generation loop.
Generator.cs:58-61
while (!generator.IsDone())
{
    generator.GenerateNextToken();
}

IsDone()

Checks if generation is complete (end-of-sequence token generated or max length reached). Returns: bool - true if generation is complete, false otherwise
Generator.cs:18-21
if (generator.IsDone())
{
    Console.WriteLine("Generation complete");
}

RewindTo(ulong newLength)

Rewinds the generator to a specific token length. Useful for chat scenarios where you want to keep the system prompt but remove the conversation history.
newLength (ulong, required): The token length to rewind to.
Generator.cs:68-71
// Save system prompt length
ulong systemPromptLength = generator.TokenCount();

// ... generate response ...

// Rewind to system prompt for next turn
generator.RewindTo(systemPromptLength);

Input Methods

AppendTokens(ReadOnlySpan<int> inputIDs)

Appends token IDs to the generator’s input.
inputIDs (ReadOnlySpan<int>, required): The token IDs to append.
Generator.cs:33-42
ReadOnlySpan<int> tokens = new int[] { 1, 2, 3, 4 };
generator.AppendTokens(tokens);
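When the token IDs come from the tokenizer rather than a hard-coded array, they can be pulled out of the encoded result and appended directly. This is a hedged sketch: it assumes the Sequences indexer exposes each encoded sequence as a ReadOnlySpan<int>, and that a tokenizer and generator are already in scope.

```csharp
// Append tokenizer output as raw token IDs (assumes the Sequences
// indexer returns the encoded IDs as a ReadOnlySpan<int>).
using Sequences encoded = tokenizer.Encode("Hello, world!");
ReadOnlySpan<int> ids = encoded[0];
generator.AppendTokens(ids);
```

This achieves the same result as AppendTokenSequences, but gives you a chance to inspect or slice the raw IDs first.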

AppendTokenSequences(Sequences sequences)

Appends encoded sequences to the generator’s input.
sequences (Sequences, required): The sequences to append (typically from Tokenizer.Encode()).
Generator.cs:44-47
using Sequences sequences = tokenizer.Encode("Hello, world!");
generator.AppendTokenSequences(sequences);

SetModelInput(string name, Tensor value)

Sets a specific model input tensor.
name (string, required): The name of the input.
value (Tensor, required): The tensor value to set.
Generator.cs:23-26
Tensor inputTensor = /* create tensor */;
generator.SetModelInput("custom_input", inputTensor);
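One way to fill in the tensor-creation step is to pin a managed array and wrap its address. This is a minimal sketch, not a definitive recipe: it assumes a Tensor(IntPtr, long[], ElementType) constructor over caller-owned memory, requires using System.Runtime.InteropServices, and the input name "custom_input" is illustrative and must match an input the model actually declares.

```csharp
// Pin a managed float array and wrap the pointer in a Tensor.
// The memory must stay pinned for as long as the tensor is in use.
float[] data = { 1.0f, 2.0f, 3.0f, 4.0f };
long[] shape = { 1, 4 };
GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
try
{
    using Tensor inputTensor = new Tensor(handle.AddrOfPinnedObject(), shape, ElementType.float32);
    generator.SetModelInput("custom_input", inputTensor);
}
finally
{
    handle.Free();
}
```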

SetInputs(NamedTensors namedTensors)

Sets multiple model inputs at once.
namedTensors (NamedTensors, required): Collection of named tensors to set as inputs.
Generator.cs:28-31
using NamedTensors inputs = processor.ProcessImagesAndAudios(prompt, images, audios);
generator.SetInputs(inputs);

Output Methods

GetNextTokens()

Returns the tokens generated in the last GenerateNextToken() call. Returns: ReadOnlySpan<int> - The most recently generated tokens
Generator.cs:73-80
generator.GenerateNextToken();
ReadOnlySpan<int> newTokens = generator.GetNextTokens();

GetSequence(ulong index)

Returns the complete token sequence for a specific sequence index.
index (ulong, required): The sequence index (0 for single-sequence generation).
Returns: ReadOnlySpan<int> - The complete token sequence
Generator.cs:82-90
// Get the complete output sequence
var outputSequence = generator.GetSequence(0);
string outputText = tokenizer.Decode(outputSequence);

TokenCount()

Returns the total number of tokens in the generator (including input and generated tokens). Returns: ulong - The total token count
Generator.cs:53-56
ulong totalTokens = generator.TokenCount();
Console.WriteLine($"Total tokens: {totalTokens}");
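TokenCount() is also useful as a guard before appending more input in a long-running conversation. A minimal sketch, assuming a hypothetical maxContextLength budget chosen by the caller and that the Sequences indexer exposes the encoded IDs:

```csharp
// Hypothetical context-budget check before appending another turn.
const ulong maxContextLength = 4096; // illustrative limit, not from the API
using Sequences next = tokenizer.Encode("Another user message");
if (generator.TokenCount() + (ulong)next[0].Length <= maxContextLength)
{
    generator.AppendTokenSequences(next);
}
```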

Tensor Access Methods

GetInput(string inputName)

Retrieves an input tensor by name.
inputName (string, required): The name of the input tensor.
Returns: Tensor - The input tensor
Generator.cs:98-104
using Tensor inputTensor = generator.GetInput("input_ids");

GetOutput(string outputName)

Retrieves an output tensor by name.
outputName (string, required): The name of the output tensor.
Returns: Tensor - The output tensor
Generator.cs:112-118
using Tensor outputTensor = generator.GetOutput("logits");

Adapter Methods

SetActiveAdapter(Adapters adapters, string adapterName)

Activates a previously loaded adapter (for LoRA/fine-tuned models).
adapters (Adapters, required): The adapters container.
adapterName (string, required): The name of the adapter to activate.
Generator.cs:126-131
using Adapters adapters = /* load adapters */;
generator.SetActiveAdapter(adapters, "my_adapter");

Complete Generation Example

Here’s a complete example of the generation loop:
examples/csharp/ModelChat/Program.cs:54-84
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Diagnostics;

string modelPath = "/path/to/model";

// Load model
using Model model = new Model(modelPath);
using Tokenizer tokenizer = new Tokenizer(model);

// Prepare input
string prompt = "Once upon a time";
using var sequences = tokenizer.Encode(prompt);

// Create generator params
using GeneratorParams generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 200);
generatorParams.SetSearchOption("temperature", 0.7);

// Create generator and append input tokens
using Generator generator = new Generator(model, generatorParams);
generator.AppendTokenSequences(sequences);

// Run generation loop
var watch = Stopwatch.StartNew();
while (!generator.IsDone())
{
    generator.GenerateNextToken();
}
watch.Stop();

// Get output and decode
var outputSequence = generator.GetSequence(0);
string outputString = tokenizer.Decode(outputSequence);

// Display results
Console.WriteLine("Output:");
Console.WriteLine(outputString);

var totalTokens = (int)generator.TokenCount();
var tokensPerSecond = totalTokens / watch.Elapsed.TotalSeconds;
Console.WriteLine($"Tokens: {totalTokens}, Time: {watch.Elapsed.TotalSeconds:0.00}s, Tokens/sec: {tokensPerSecond:0.00}");

Streaming Generation Example

examples/csharp/ModelChat/Program.cs:200-214
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Diagnostics;

using Model model = new Model("/path/to/model");
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();

// ... setup generator ...

Console.Write("Output: ");
var watch = Stopwatch.StartNew();
while (!generator.IsDone())
{
    generator.GenerateNextToken();
    // Decode and print each token immediately
    Console.Write(tokenizerStream.Decode(generator.GetNextTokens()[0]));
}
watch.Stop();
Console.WriteLine();

var totalTokens = (int)generator.TokenCount();
Console.WriteLine($"Tokens: {totalTokens}, Time: {watch.Elapsed.TotalSeconds:0.00}s");

Chat with Rewind Example

examples/csharp/ModelChat/Program.cs:314-382
using Microsoft.ML.OnnxRuntimeGenAI;

using Model model = new Model("/path/to/model");
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();

// Create generator params
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);

// Encode system prompt
string systemPrompt = "You are a helpful AI assistant.";
var sequences = tokenizer.Encode(systemPrompt);
generator.AppendTokenSequences(sequences);
var systemPromptLength = (int)generator.TokenCount();

// Chat loop
while (true)
{
    // Get user input
    Console.Write("User: ");
    string? userPrompt = Console.ReadLine();
    if (userPrompt == null || userPrompt == "quit()") break;

    // Encode and append user prompt
    sequences = tokenizer.Encode(userPrompt);
    generator.AppendTokenSequences(sequences);

    // Generate response
    Console.Write("Assistant: ");
    while (!generator.IsDone())
    {
        generator.GenerateNextToken();
        Console.Write(tokenizerStream.Decode(generator.GetNextTokens()[0]));
    }
    Console.WriteLine();

    // Rewind to system prompt for next turn
    generator.RewindTo((ulong)systemPromptLength);
}
