Dynamically load and switch between multiple LoRA adapters at runtime
ONNX Runtime GenAI supports Multi-LoRA, allowing you to dynamically load, manage, and switch between multiple LoRA (Low-Rank Adaptation) adapters at runtime without reloading the base model.
Here are complete examples in Python and C# showing how to use multiple LoRA adapters:
Python:

```python
import onnxruntime_genai as og

# Load the base model
model = og.Model('path/to/base/model')
tokenizer = og.Tokenizer(model)

# Create the Adapters manager
adapters = og.Adapters(model)

# Load multiple LoRA adapters
adapters.load('path/to/adapter1/adapter_weights.onnx', 'summarization')
adapters.load('path/to/adapter2/adapter_weights.onnx', 'translation')
adapters.load('path/to/adapter3/adapter_weights.onnx', 'coding')

# Set up generation parameters
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)

# Create a generator and set the active adapter
generator = og.Generator(model, params)
generator.set_active_adapter(adapters, 'summarization')

# Encode the input and generate with the summarization adapter
prompt = "Summarize this article: ..."
generator.append_tokens(tokenizer.encode(prompt))
while not generator.is_done():
    generator.generate_next_token()
summary = tokenizer.decode(generator.get_sequence(0))
print(f"Summary: {summary}")

# Switch to a different adapter for the next generation
generator2 = og.Generator(model, params)
generator2.set_active_adapter(adapters, 'translation')

# Generate with the translation adapter
translation_prompt = "Translate to French: Hello, how are you?"
generator2.append_tokens(tokenizer.encode(translation_prompt))
while not generator2.is_done():
    generator2.generate_next_token()
translation = tokenizer.decode(generator2.get_sequence(0))
print(f"Translation: {translation}")

# To unload an adapter when it is no longer needed, first release every
# generator that uses it; unloading fails while the adapter is still in use.
```
C#:

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Load the base model
using var model = new Model("path/to/base/model");
using var tokenizer = new Tokenizer(model);

// Create the Adapters manager
using var adapters = new Adapters(model);

// Load LoRA adapters
adapters.LoadAdapter("path/to/adapter1/adapter_weights.onnx", "summarization");
adapters.LoadAdapter("path/to/adapter2/adapter_weights.onnx", "translation");

// Create generator params
using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 200);

// Create the generator and set the active adapter
using var generator = new Generator(model, generatorParams);
generator.SetActiveAdapter(adapters, "summarization");

// Generate text
var prompt = "Summarize this article: ...";
using var sequences = tokenizer.Encode(prompt);
generator.AppendTokenSequences(sequences);
while (!generator.IsDone())
{
    generator.GenerateNextToken();
}
var summary = tokenizer.Decode(generator.GetSequence(0));
Console.WriteLine($"Summary: {summary}");

// Unload the adapter once no generator is using it
// (unloading fails while the adapter is still in use)
adapters.UnloadAdapter("summarization");
```
“Adapter still in use” error when unloading:
This occurs when trying to unload an adapter that still has active references. Ensure all generators using this adapter have completed or been destroyed.
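One way to avoid this error is to track adapter usage yourself. The following is an illustrative sketch, not part of the ONNX Runtime GenAI API: a hypothetical `AdapterTracker` reference-counts which generators currently use each adapter, so you only unload when the count reaches zero.

```python
class AdapterTracker:
    """Track how many live generators use each adapter by name."""

    def __init__(self):
        self.refcounts = {}

    def acquire(self, name):
        # Call when a generator activates the adapter
        self.refcounts[name] = self.refcounts.get(name, 0) + 1

    def release(self, name):
        # Call when that generator finishes or is destroyed
        self.refcounts[name] -= 1

    def can_unload(self, name):
        # Safe to unload only when no generator references the adapter
        return self.refcounts.get(name, 0) == 0


tracker = AdapterTracker()
tracker.acquire('summarization')                 # a generator starts using it
assert not tracker.can_unload('summarization')   # unloading now would fail
tracker.release('summarization')                 # generator done / destroyed
assert tracker.can_unload('summarization')       # now safe to unload
```

Gate your unload call on `can_unload` to convert a runtime error into an explicit check.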
“Adapter not found” error:
Verify the adapter name is spelled correctly (case-sensitive)
Ensure the adapter was successfully loaded before attempting to use it
Check that the adapter hasn’t been unloaded
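These checks can be centralized in a small guard function. This is a hedged sketch (the function name and error message are hypothetical, not part of the library): validate the name against the set of successfully loaded adapters, remembering that matching is case-sensitive.

```python
def check_adapter(name, loaded_names):
    """Raise a clear error if `name` is not among the loaded adapters.

    `loaded_names` is a set you maintain as you load/unload adapters;
    comparison is exact and case-sensitive.
    """
    if name not in loaded_names:
        raise KeyError(
            f"Adapter not found: {name!r}; loaded adapters: {sorted(loaded_names)}"
        )
    return name


loaded = {'summarization', 'translation', 'coding'}
check_adapter('translation', loaded)      # exact match: OK
# check_adapter('Translation', loaded)    # raises KeyError: case matters
```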
Memory issues with many adapters:
Limit the number of simultaneously loaded adapters
Implement an LRU cache to automatically unload least-used adapters
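The LRU idea above can be sketched as follows. This is a minimal illustration, not library code: the `load_fn` and `unload_fn` callbacks stand in for whatever load/unload calls your binding provides, and the cache evicts the least recently used adapter when the limit is reached.

```python
from collections import OrderedDict


class AdapterLRUCache:
    """Keep at most `max_loaded` adapters loaded; evict least recently used."""

    def __init__(self, max_loaded, load_fn, unload_fn):
        self.max_loaded = max_loaded
        self.load_fn = load_fn        # callable(path, name): loads an adapter
        self.unload_fn = unload_fn    # callable(name): unloads an adapter
        self._cache = OrderedDict()   # name -> adapter file path, in LRU order

    def use(self, name, path):
        if name in self._cache:
            self._cache.move_to_end(name)   # mark as most recently used
            return
        if len(self._cache) >= self.max_loaded:
            evicted, _ = self._cache.popitem(last=False)  # least recently used
            self.unload_fn(evicted)
        self.load_fn(path, name)
        self._cache[name] = path


# Demo with recording callbacks instead of real adapter calls
loads, unloads = [], []
cache = AdapterLRUCache(2, lambda p, n: loads.append(n),
                        lambda n: unloads.append(n))
cache.use('summarization', 'a.onnx')
cache.use('translation', 'b.onnx')
cache.use('summarization', 'a.onnx')   # refreshes 'summarization'
cache.use('coding', 'c.onnx')          # evicts 'translation' (least used)
```

Combine this with the reference-count check above in your own code so an adapter still in use is never chosen for eviction.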