The Generator class manages the text generation process. It handles the iterative generation of tokens and provides access to generated sequences.
Constructor
Generator(Model model, GeneratorParams generatorParams)
Creates a new generator instance.
The model to use for generation
Parameters controlling the generation process
using Microsoft.ML.OnnxRuntimeGenAI;
using Model model = new Model("/path/to/model");
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);
Generation Control Methods
GenerateNextToken()
Generates the next token in the sequence. This is the core method for the generation loop.
while (!generator.IsDone())
{
generator.GenerateNextToken();
}
IsDone()
Checks if generation is complete (end-of-sequence token generated or max length reached).
Returns: bool - true if generation is complete, false otherwise
if (generator.IsDone())
{
Console.WriteLine("Generation complete");
}
RewindTo(ulong newLength)
Rewinds the generator to a specific token length. Useful for chat scenarios where you want to keep the system prompt but remove the conversation history.
The token length to rewind to
// Save system prompt length
ulong systemPromptLength = generator.TokenCount();
// ... generate response ...
// Rewind to system prompt for next turn
generator.RewindTo(systemPromptLength);
AppendTokens(ReadOnlySpan<int> inputIDs)
Appends token IDs to the generator’s input.
The token IDs to append
ReadOnlySpan<int> tokens = new int[] { 1, 2, 3, 4 };
generator.AppendTokens(tokens);
AppendTokenSequences(Sequences sequences)
Appends encoded sequences to the generator’s input.
The sequences to append (typically from Tokenizer.Encode())
using Sequences sequences = tokenizer.Encode("Hello, world!");
generator.AppendTokenSequences(sequences);
SetModelInput(string name, Tensor value)
Sets a specific model input tensor by name.
The name of the model input
The tensor to set as the input value
Tensor inputTensor = /* create tensor */;
generator.SetModelInput("custom_input", inputTensor);
SetInputs(NamedTensors namedTensors)
Sets multiple model inputs at once.
Collection of named tensors to set as inputs
using NamedTensors inputs = processor.ProcessImagesAndAudios(prompt, images, audios);
generator.SetInputs(inputs);
Output Methods
GetNextTokens()
Returns the tokens generated in the last GenerateNextToken() call.
Returns: ReadOnlySpan<int> - The most recently generated tokens
generator.GenerateNextToken();
ReadOnlySpan<int> newTokens = generator.GetNextTokens();
GetSequence(ulong index)
Returns the complete token sequence for a specific sequence index.
The sequence index (0 for single sequence generation)
Returns: ReadOnlySpan<int> - The complete token sequence
// Get the complete output sequence
var outputSequence = generator.GetSequence(0);
string outputText = tokenizer.Decode(outputSequence);
TokenCount()
Returns the total number of tokens in the generator (including input and generated tokens).
Returns: ulong - The total token count
ulong totalTokens = generator.TokenCount();
Console.WriteLine($"Total tokens: {totalTokens}");
Tensor Access Methods
GetInput(string inputName)
Retrieves an input tensor by name.
The name of the input tensor
Returns: Tensor - The input tensor
using Tensor inputTensor = generator.GetInput("input_ids");
GetOutput(string outputName)
Retrieves an output tensor by name.
The name of the output tensor
Returns: Tensor - The output tensor
using Tensor outputTensor = generator.GetOutput("logits");
Adapter Methods
SetActiveAdapter(Adapters adapters, string adapterName)
Activates a previously loaded adapter (for LoRA/fine-tuned models).
The collection of loaded adapters
The name of the adapter to activate
using Adapters adapters = /* load adapters */;
generator.SetActiveAdapter(adapters, "my_adapter");
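Before an adapter can be activated, it must be loaded into an Adapters collection. A minimal sketch of the full flow, assuming the Adapters constructor and a LoadAdapter(path, name) method as exposed by the library, with a hypothetical adapter file path:

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

using Model model = new Model("/path/to/model");
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);

// Load an adapter file and register it under a name (path is hypothetical)
using Adapters adapters = new Adapters(model);
adapters.LoadAdapter("/path/to/adapter.onnx_adapter", "my_adapter");

// Activate the named adapter; subsequent generation uses its weights
generator.SetActiveAdapter(adapters, "my_adapter");
```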
Complete Generation Example
Here’s a complete example of the generation loop:
examples/csharp/ModelChat/Program.cs:54-84
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Diagnostics;
string modelPath = "/path/to/model";
// Load model
using Model model = new Model(modelPath);
using Tokenizer tokenizer = new Tokenizer(model);
// Prepare input
string prompt = "Once upon a time";
using var sequences = tokenizer.Encode(prompt);
// Create generator params
using GeneratorParams generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 200);
generatorParams.SetSearchOption("temperature", 0.7);
// Create generator and append input tokens
using Generator generator = new Generator(model, generatorParams);
generator.AppendTokenSequences(sequences);
// Run generation loop
var watch = Stopwatch.StartNew();
while (!generator.IsDone())
{
generator.GenerateNextToken();
}
watch.Stop();
// Get output and decode
var outputSequence = generator.GetSequence(0);
string outputString = tokenizer.Decode(outputSequence);
// Display results
Console.WriteLine("Output:");
Console.WriteLine(outputString);
var totalTokens = (int)generator.TokenCount();
var tokensPerSecond = totalTokens / watch.Elapsed.TotalSeconds;
Console.WriteLine($"Tokens: {totalTokens}, Time: {watch.Elapsed.TotalSeconds:0.00}s, Tokens/sec: {tokensPerSecond:0.00}");
Streaming Generation Example
examples/csharp/ModelChat/Program.cs:200-214
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Diagnostics;
using Model model = new Model("/path/to/model");
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();
// ... setup generator ...
Console.Write("Output: ");
var watch = Stopwatch.StartNew();
while (!generator.IsDone())
{
generator.GenerateNextToken();
// Decode and print each token immediately
Console.Write(tokenizerStream.Decode(generator.GetNextTokens()[0]));
}
watch.Stop();
Console.WriteLine();
var totalTokens = (int)generator.TokenCount();
Console.WriteLine($"Tokens: {totalTokens}, Time: {watch.Elapsed.TotalSeconds:0.00}s");
Chat with Rewind Example
examples/csharp/ModelChat/Program.cs:314-382
using Microsoft.ML.OnnxRuntimeGenAI;
using Model model = new Model("/path/to/model");
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();
// Create generator params
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);
// Encode system prompt
string systemPrompt = "You are a helpful AI assistant.";
var sequences = tokenizer.Encode(systemPrompt);
generator.AppendTokenSequences(sequences);
var systemPromptLength = (int)generator.TokenCount();
// Chat loop
while (true)
{
// Get user input
Console.Write("User: ");
string userPrompt = Console.ReadLine();
if (userPrompt == "quit()") break;
// Encode and append user prompt
sequences = tokenizer.Encode(userPrompt);
generator.AppendTokenSequences(sequences);
// Generate response
Console.Write("Assistant: ");
while (!generator.IsDone())
{
generator.GenerateNextToken();
Console.Write(tokenizerStream.Decode(generator.GetNextTokens()[0]));
}
Console.WriteLine();
// Rewind to system prompt for next turn
generator.RewindTo((ulong)systemPromptLength);
}
See Also