Text Generation
sgl.gen() - Generate Text
The primary primitive for generating text from the model.
Basic Usage:
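As a minimal sketch of a basic call, the program below appends a prompt and then generates a short answer. The program name `capital`, the prompt text, and the values in `GEN_KWARGS` are illustrative assumptions; the keyword names come from the parameter list in this section. The `sglang` import sits inside the builder function only so that the snippet can be loaded without the library installed.

```python
# Keyword arguments forwarded to sgl.gen(). The values are illustrative.
GEN_KWARGS = {
    "max_tokens": 64,    # hard cap on the generated length
    "stop": ["\n"],      # stop at the first newline
    "temperature": 0.0,  # greedy decoding
}


def build_capital_program():
    """Return an SGLang program that answers a one-line question."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def capital(s, country):
        s += "Q: What is the capital of " + country + "?\n"
        # The generated text is stored under the variable name "answer".
        s += "A: " + sgl.gen("answer", **GEN_KWARGS)

    return capital
```

With a backend configured, for example through `sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))` (the URL is a placeholder), `state = build_capital_program().run(country="France")` runs the program and `state["answer"]` holds the generated text.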
Parameters:
- name (str): Variable name to store the generated text. Access via state[name]
- max_tokens (int): Maximum number of tokens to generate
- min_tokens (int): Minimum number of tokens to generate
- stop (str | List[str]): Stop sequence(s) to end generation
- stop_token_ids (List[int]): Token IDs that trigger stop
- stop_regex (str | List[str]): Regular expression patterns to stop generation
- temperature (float): Sampling temperature (0.0 = greedy, higher = more random)
- top_p (float): Nucleus sampling threshold
- top_k (int): Top-k sampling parameter
- min_p (float): Minimum probability threshold
- frequency_penalty (float): Penalty for token frequency
- presence_penalty (float): Penalty for token presence
- ignore_eos (bool): Ignore end-of-sequence token
- regex (str): Regular expression to constrain output format
- json_schema (str): JSON schema for structured output
- choices (List[str]): List of choices (equivalent to select())
- return_logprob (bool): Return log probabilities
- logprob_start_len (int): Position to start returning logprobs
- top_logprobs_num (int): Number of top logprobs to return
- return_text_in_logprobs (bool): Include text in logprob results
Typed Generation
SGLang provides convenience functions for generating specific data types:
sgl.gen_int() - Generate Integer
sgl.gen_string() - Generate String
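The two typed helpers, `sgl.gen_int()` and `sgl.gen_string()`, can be sketched together as follows. The program name `profile`, the prompt wording, and the `to_int` helper are hypothetical. The helper exists because this sketch does not assume the stored value has already been converted to a Python `int`.

```python
def build_profile_program():
    """Return an SGLang program that fills in an integer and a string field."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def profile(s, person):
        s += "Name: " + person + "\n"
        s += "Age in years: " + sgl.gen_int("age", max_tokens=4) + "\n"
        s += "Occupation: " + sgl.gen_string("job", max_tokens=16) + "\n"

    return profile


def to_int(value):
    """Convert a generated value to int, whether it arrives as text or a number."""
    return int(str(value).strip())
```

After a run, `to_int(state["age"])` gives a value that is safe to use in arithmetic.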
Constrained Generation
Regular Expression Constraints
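A minimal sketch of a regex-constrained call, assuming a date in `YYYY-MM-DD` form is wanted. `DATE_REGEX`, the program name `release_date`, and the `matches_constraint` helper are illustrative. The helper uses Python's `re` module, whose dialect may differ in details from the engine the backend uses for constrained decoding.

```python
import re

# Four-digit year, two-digit month, two-digit day.
DATE_REGEX = r"\d{4}-\d{2}-\d{2}"


def build_date_program():
    """Return an SGLang program whose output must match DATE_REGEX."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def release_date(s, product):
        s += "Product: " + product + "\n"
        s += "Release date (YYYY-MM-DD): "
        s += sgl.gen("date", max_tokens=12, regex=DATE_REGEX)

    return release_date


def matches_constraint(text):
    """Check offline that a string has the shape the constraint describes."""
    return re.fullmatch(DATE_REGEX, text) is not None
```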
Use the regex parameter to enforce a specific output format.
JSON Schema Constraints
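A sketch of schema-constrained generation, assuming the schema is serialized to a string before being passed as `json_schema` (the parameter list in this section gives its type as `str`). The schema, the program name `city_info`, and the `parse_city` helper are hypothetical.

```python
import json

# Illustrative schema: an object with a string and an integer field.
CITY_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["name", "population"],
}
CITY_SCHEMA_STR = json.dumps(CITY_SCHEMA)


def build_city_program():
    """Return an SGLang program that emits JSON following CITY_SCHEMA."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def city_info(s, city):
        s += "Describe the city " + city + " as JSON.\n"
        s += sgl.gen("city_json", max_tokens=128, json_schema=CITY_SCHEMA_STR)

    return city_info


def parse_city(raw):
    """Parse generated JSON and confirm the required keys are present."""
    data = json.loads(raw)
    missing = [key for key in CITY_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Parsing the result with `parse_city(state["city_json"])` turns the generated text into a dictionary and catches output truncated by `max_tokens`.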
Use the json_schema parameter to generate structured JSON output.
Choice Selection
sgl.select() - Choose from Options
Select the most likely option from a list of choices:
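A minimal sketch of a selection, assuming a three-way sentiment label. The list `SENTIMENTS`, the program name `sentiment`, and the `is_valid_label` helper are illustrative.

```python
SENTIMENTS = ["positive", "negative", "neutral"]


def build_sentiment_program():
    """Return an SGLang program that picks one label from SENTIMENTS."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def sentiment(s, review):
        s += "Review: " + review + "\n"
        # The chosen option is stored under the variable name "label".
        s += "Sentiment: " + sgl.select("label", choices=SENTIMENTS)

    return sentiment


def is_valid_label(value):
    """Check that a stored selection is one of the allowed options."""
    return value in SENTIMENTS
```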
Parameters:
- name (str): Variable name for the selected choice
- choices (List[str]): List of possible choices
- temperature (float): Sampling temperature (usually 0.0 for deterministic selection)
- choices_method (ChoicesSamplingMethod): Method for scoring choices
choices in gen():
Alternatively, use the choices parameter in gen():
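The same kind of selection can be sketched through `gen()`. The option list `TOOLS`, the program name `pick_tool`, and the prompt wording are illustrative assumptions.

```python
TOOLS = ["calculator", "search engine"]


def build_tool_program():
    """Return an SGLang program that chooses a tool via gen(choices=...)."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def pick_tool(s, question):
        s += "To answer the question '" + question + "', "
        # Passing choices restricts the output to one of the listed options.
        s += "I need to use a " + sgl.gen("tool", choices=TOOLS) + ". "

    return pick_tool
```

After a run, `state["tool"]` is expected to be one of the entries in `TOOLS`.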
Multimodal Primitives
sgl.image() - Add Image Input
Add an image to the prompt for vision models:
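A minimal sketch, assuming a vision-capable model is being served and that the image is attached inside a user message. The program name `caption` and its arguments are hypothetical.

```python
def build_caption_program():
    """Return an SGLang program that asks a question about an image."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def caption(s, image_path, question):
        # The image and the question text form a single user turn.
        s += sgl.user(sgl.image(image_path) + question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    return caption
```

A call such as `build_caption_program().run(image_path="photo.jpg", question="What is shown here?")` uses a placeholder file name. The reply is read from `state["answer"]`.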
Parameters:
- path (str): Path to image file or base64-encoded image data
sgl.video() - Add Video Input
Add a video to the prompt for video-capable models:
Parameters:
- path (str): Path to video file
- num_frames (int): Number of frames to sample from the video
Role Management Primitives
For chat models, structure conversations with role primitives:
sgl.system() - System Message
sgl.user() - User Message
sgl.assistant() - Assistant Message
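The three role primitives can be combined into a single-turn chat, sketched below. The system prompt text and the program name `chat` are illustrative.

```python
SYSTEM_PROMPT = "You are a concise assistant."


def build_chat_program():
    """Return an SGLang program with system, user, and assistant turns."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def chat(s, question):
        s += sgl.system(SYSTEM_PROMPT)
        s += sgl.user(question)
        # Generation happens inside the assistant turn.
        s += sgl.assistant(sgl.gen("reply", max_tokens=128))

    return chat
```

Wrapping each turn in its role primitive leaves the chat template formatting to the backend, so the prompt text itself does not need role markers.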
Role Context Managers
For complex role structures, use context managers:
- sgl.system() / s.system()
- sgl.user() / s.user()
- sgl.assistant() / s.assistant()
- sgl.system_begin() / sgl.system_end()
- sgl.user_begin() / sgl.user_end()
- sgl.assistant_begin() / sgl.assistant_end()
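A sketch of the context-manager form, in which a single turn is built from several statements. The code-review scenario, the program name `review`, and the list `VERDICTS` are hypothetical.

```python
VERDICTS = ["approve", "request changes"]


def build_review_program():
    """Return an SGLang program whose turns are built inside role contexts."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def review(s, code_snippet):
        s += sgl.system("You are a careful code reviewer.")
        with s.user():
            # Everything appended in this block belongs to the user turn.
            s += "Please review this code:\n"
            s += code_snippet
        with s.assistant():
            # Fixed text and generation calls are mixed within one assistant turn.
            s += "Summary: " + sgl.gen("summary", max_tokens=64, stop="\n")
            s += "\nVerdict: " + sgl.select("verdict", choices=VERDICTS)

    return review
```

The begin/end pairs serve the same purpose when a block scope is inconvenient: `s += sgl.user_begin()`, followed by the content, followed by `s += sgl.user_end()`.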
Advanced Primitives
Separate Reasoning
For models that support chain-of-thought reasoning with special tokens, the reasoning can be separated from the final answer.
Complete Examples
Question Answering with Constraints
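One possible shape for such an example is sketched below. It combines role primitives, a regex-constrained `gen()`, and `select()`. The history question scenario, `YEAR_REGEX`, `CONFIDENCE_LEVELS`, and the `validate_answer` helper are all assumptions made for illustration.

```python
import re

YEAR_REGEX = r"(1|2)\d{3}"  # a four-digit year starting with 1 or 2
CONFIDENCE_LEVELS = ["high", "medium", "low"]


def build_qa_program():
    """Return an SGLang program that answers with a year and a confidence."""
    import sglang as sgl  # imported lazily; requires sglang to be installed

    @sgl.function
    def history_qa(s, question):
        s += sgl.system("Answer history questions briefly.")
        s += sgl.user(question)
        with s.assistant():
            s += "Year: " + sgl.gen("year", max_tokens=8, regex=YEAR_REGEX)
            s += "\nConfidence: "
            s += sgl.select("confidence", choices=CONFIDENCE_LEVELS)
            s += "\nExplanation: "
            s += sgl.gen("explanation", max_tokens=96, stop="\n", temperature=0.0)

    return history_qa


def validate_answer(year, confidence):
    """Re-check the constrained fields after a run."""
    year_ok = re.fullmatch(YEAR_REGEX, year) is not None
    return year_ok and confidence in CONFIDENCE_LEVELS
```

After `state = build_qa_program().run(question="When did the first Moon landing take place?")`, the three fields are read from `state["year"]`, `state["confidence"]`, and `state["explanation"]`.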
Multimodal Analysis
Best Practices
- Name your variables: Always provide a name parameter to access generated content
- Use stop sequences: Prevent over-generation with appropriate stop tokens
- Set max_tokens: Always set reasonable limits to avoid runaway generation
- Use constraints wisely: Regex and JSON schemas ensure format compliance
- Choose appropriate temperature: 0.0 for factual, higher for creative tasks
- Test constraints: Verify regex patterns work as expected before production use
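The last practice can be sketched as a small offline check that runs a pattern against strings it should accept and strings it should reject. The helper `check_pattern` and the phone-number pattern are hypothetical. The check uses Python's `re` module, which may not match the backend's regex dialect in every detail.

```python
import re


def check_pattern(pattern, should_match, should_reject):
    """Return a list of failure messages; an empty list means all checks passed."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        if compiled.fullmatch(text) is None:
            failures.append(f"expected match: {text!r}")
    for text in should_reject:
        if compiled.fullmatch(text) is not None:
            failures.append(f"expected rejection: {text!r}")
    return failures


PHONE_REGEX = r"\d{3}-\d{3}-\d{4}"
failures = check_pattern(
    PHONE_REGEX,
    should_match=["555-123-4567"],
    should_reject=["5551234567", "555-123-456"],
)
```

Including both accepted and rejected samples catches patterns that are too loose as well as patterns that are too strict.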
