The @sgl.function decorator is the foundation of SGLang's frontend language. It transforms a regular Python function into an SGLang program that can be executed with various backends and execution modes.
Basic Usage
Defining a Function
Use the @sgl.function decorator to create an SGLang function. The first parameter, s, is the state object that manages the conversation context; all other parameters become inputs to your function.
Running Functions
Once defined, SGLang functions gain special methods for execution, such as .run() and .run_batch() (covered under Execution Methods below).

The State Object
The state object (s) is the core of every SGLang function. It provides methods and operators to build prompts and control execution flow.
Appending Content
Use the += operator to append text to the state:
Accessing Variables
Use dictionary-style access to retrieve generated content: each sgl.gen("name", ...) call stores its output under the given name, so state["name"] returns that text.

Role Management
For chat models, use the role helpers sgl.system, sgl.user, and sgl.assistant to structure conversations.

Execution Methods
.run() - Single Execution
Execute a single request:
Parameters:

- Function arguments (positional and keyword)
- Sampling parameters (temperature, max_tokens, top_p, etc.)
- stream (bool): Enable streaming output
- backend (BaseBackend): Override the default backend

Returns:

ProgramState: A state object containing the results
.run_batch() - Batch Execution
Process multiple inputs efficiently:
Parameters:

- batch_arguments (List[Dict]): List of argument dictionaries
- Sampling parameters (applied to all requests)
- num_threads (int | "auto"): Number of parallel threads
- progress_bar (bool): Show a progress bar
- backend (BaseBackend): Override the default backend

Returns:

List[ProgramState]: A list of state objects, one per input
Generator-Style Batch Processing
For large batches, use generator mode to process results as they complete, instead of waiting for the entire batch to finish.

Advanced Features
Parallel Sampling with Fork/Join
Generate multiple responses in parallel and gather the results:

- s.fork(n): Create n parallel branches
- forks[i]: Access an individual fork
- forks += expr: Apply an expression to all forks
- forks.join(): Merge results back
Copy Context
Create a temporary copy of the state, so that exploratory generations do not modify the main context.

Variable Scopes
Capture specific sections of generated text under a named variable.

API Speculative Execution
For chat-based API backends (OpenAI, Anthropic), SGLang can speculatively execute multiple generation calls in a single API request: for example, one request with max_tokens=200 can cover what would otherwise be two separate requests.
Syntax:
