The Generator class manages the text generation process. It handles the iterative generation of tokens and provides access to generated sequences.
Constructor
Generator(Model model, GeneratorParams generatorParams)
Creates a new generator instance.
The model to use for generation
Parameters controlling the generation process
using Microsoft.ML.OnnxRuntimeGenAI;
using Model model = new Model("/path/to/model");
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);
Generation Control Methods
GenerateNextToken()
Generates the next token in the sequence. This is the core method for the generation loop.
while (!generator.IsDone())
{
generator.GenerateNextToken();
}
IsDone()
Checks if generation is complete (end-of-sequence token generated or max length reached).
Returns: bool - true if generation is complete, false otherwise
if (generator.IsDone())
{
Console.WriteLine("Generation complete");
}
RewindTo(ulong newLength)
Rewinds the generator to a specific token length. Useful for chat scenarios where you want to keep the system prompt but remove the conversation history.
The token length to rewind to
// Save system prompt length
ulong systemPromptLength = generator.TokenCount();
// ... generate response ...
// Rewind to system prompt for next turn
generator.RewindTo(systemPromptLength);
AppendTokens(ReadOnlySpan<int> inputIDs)
Appends token IDs to the generator’s input.
The token IDs to append
ReadOnlySpan<int> tokens = new int[] { 1, 2, 3, 4 };
generator.AppendTokens(tokens);
AppendTokenSequences(Sequences sequences)
Appends encoded sequences to the generator’s input.
The sequences to append (typically from Tokenizer.Encode())
using Sequences sequences = tokenizer.Encode("Hello, world!");
generator.AppendTokenSequences(sequences);
SetModelInput(string name, Tensor value)
Sets a specific model input tensor by name.
The name of the model input
The tensor to set as the input value
Tensor inputTensor = /* create tensor */;
generator.SetModelInput("custom_input", inputTensor);
SetInputs(NamedTensors namedTensors)
Sets multiple model inputs at once.
Collection of named tensors to set as inputs
using NamedTensors inputs = processor.ProcessImagesAndAudios(prompt, images, audios);
generator.SetInputs(inputs);
Output Methods
GetNextTokens()
Returns the tokens generated in the last GenerateNextToken() call.
Returns: ReadOnlySpan<int> - The most recently generated tokens
generator.GenerateNextToken();
ReadOnlySpan<int> newTokens = generator.GetNextTokens();
GetSequence(ulong index)
Returns the complete token sequence for a specific sequence index.
The sequence index (0 for single sequence generation)
Returns: ReadOnlySpan<int> - The complete token sequence
// Get the complete output sequence
var outputSequence = generator.GetSequence(0);
string outputText = tokenizer.Decode(outputSequence);
TokenCount()
Returns the total number of tokens in the generator (including input and generated tokens).
Returns: ulong - The total token count
ulong totalTokens = generator.TokenCount();
Console.WriteLine($"Total tokens: {totalTokens}");
Tensor Access Methods
GetInput(string inputName)
Retrieves an input tensor by name.
The name of the input tensor
Returns: Tensor - The input tensor
using Tensor inputTensor = generator.GetInput("input_ids");
GetOutput(string outputName)
Retrieves an output tensor by name.
The name of the output tensor
Returns: Tensor - The output tensor
using Tensor outputTensor = generator.GetOutput("logits");
Adapter Methods
SetActiveAdapter(Adapters adapters, string adapterName)
Activates a previously loaded adapter (for LoRA/fine-tuned models).
The collection of loaded adapters
The name of the adapter to activate
using Adapters adapters = /* load adapters */;
generator.SetActiveAdapter(adapters, "my_adapter");
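Before an adapter can be activated, it must be loaded into an Adapters collection. A minimal sketch of the full flow, assuming the Adapters constructor and a LoadAdapter(path, name) method as exposed by the library, with a hypothetical adapter file path:

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

using Model model = new Model("/path/to/model");
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);

// Load an adapter file and register it under a name (path is hypothetical)
using Adapters adapters = new Adapters(model);
adapters.LoadAdapter("/path/to/adapter.onnx_adapter", "my_adapter");

// Activate the named adapter; subsequent generation uses its weights
generator.SetActiveAdapter(adapters, "my_adapter");
```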
Complete Generation Example
Here’s a complete example of the generation loop:
examples/csharp/ModelChat/Program.cs:54-84
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Diagnostics;
string modelPath = "/path/to/model";
// Load model
using Model model = new Model(modelPath);
using Tokenizer tokenizer = new Tokenizer(model);
// Prepare input
string prompt = "Once upon a time";
using var sequences = tokenizer.Encode(prompt);
// Create generator params
using GeneratorParams generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 200);
generatorParams.SetSearchOption("temperature", 0.7);
// Create generator and append input tokens
using Generator generator = new Generator(model, generatorParams);
generator.AppendTokenSequences(sequences);
// Run generation loop
var watch = Stopwatch.StartNew();
while (!generator.IsDone())
{
generator.GenerateNextToken();
}
watch.Stop();
// Get output and decode
var outputSequence = generator.GetSequence(0);
string outputString = tokenizer.Decode(outputSequence);
// Display results
Console.WriteLine("Output:");
Console.WriteLine(outputString);
var totalTokens = (int)generator.TokenCount();
var tokensPerSecond = totalTokens / watch.Elapsed.TotalSeconds;
Console.WriteLine($"Tokens: {totalTokens}, Time: {watch.Elapsed.TotalSeconds:0.00}s, Tokens/sec: {tokensPerSecond:0.00}");
Streaming Generation Example
examples/csharp/ModelChat/Program.cs:200-214
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Diagnostics;
using Model model = new Model("/path/to/model");
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();
// ... setup generator ...
Console.Write("Output: ");
var watch = Stopwatch.StartNew();
while (!generator.IsDone())
{
generator.GenerateNextToken();
// Decode and print each token immediately
Console.Write(tokenizerStream.Decode(generator.GetNextTokens()[0]));
}
watch.Stop();
Console.WriteLine();
var totalTokens = (int)generator.TokenCount();
Console.WriteLine($"Tokens: {totalTokens}, Time: {watch.Elapsed.TotalSeconds:0.00}s");
Chat with Rewind Example
examples/csharp/ModelChat/Program.cs:314-382
using Microsoft.ML.OnnxRuntimeGenAI;
using Model model = new Model("/path/to/model");
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();
// Create generator params
using GeneratorParams generatorParams = new GeneratorParams(model);
using Generator generator = new Generator(model, generatorParams);
// Encode system prompt
string systemPrompt = "You are a helpful AI assistant.";
var sequences = tokenizer.Encode(systemPrompt);
generator.AppendTokenSequences(sequences);
var systemPromptLength = (int)generator.TokenCount();
// Chat loop
while (true)
{
// Get user input
Console.Write("User: ");
string userPrompt = Console.ReadLine();
if (userPrompt == "quit()") break;
// Encode and append user prompt
sequences = tokenizer.Encode(userPrompt);
generator.AppendTokenSequences(sequences);
// Generate response
Console.Write("Assistant: ");
while (!generator.IsDone())
{
generator.GenerateNextToken();
Console.Write(tokenizerStream.Decode(generator.GetNextTokens()[0]));
}
Console.WriteLine();
// Rewind to system prompt for next turn
generator.RewindTo((ulong)systemPromptLength);
}
See Also