Overview

The sglang generate command runs inference on multimodal diffusion models, providing a convenient way to generate images, videos, or other outputs without starting a server. Diffusion models are currently the only supported model type.

Basic Usage

sglang generate --model-path <model-name-or-path> --prompt "your prompt" [options]
Alternatively, you can use a configuration file:
sglang generate --config config.json

Required Arguments

--model-path
string
required
Path or name of the diffusion model to use. Can be:
  • HuggingFace model ID (e.g., stabilityai/stable-diffusion-xl-base-1.0)
  • Local path to model directory
  • ModelScope model ID (when using SGLANG_USE_MODELSCOPE=1)
--prompt
string
required
Text prompt describing what to generate.

Model Configuration

--config
string
Path to a JSON or YAML configuration file containing model and generation parameters. When provided, --model-path and --prompt become optional.
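Since --config accepts YAML as well as JSON, a YAML configuration might look like the sketch below. The key names are an assumption here: they mirror the CLI flag names with underscores, following the same convention as the JSON config example in the Examples section.

```yaml
# Hypothetical YAML generation config; key names assumed to
# mirror the CLI flags with underscores instead of hyphens.
model_path: stabilityai/stable-diffusion-xl-base-1.0
prompt: "A serene landscape with mountains and a lake at sunset"
num_inference_steps: 50
guidance_scale: 7.5
seed: 42
```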
--model-id
string
Explicit model ID override (e.g., "Qwen-Image").
--backend
string
default:"auto"
Model backend to use. Options:
  • auto: Automatically select backend (prefer sglang native, fallback to diffusers)
  • sglang: Use sglang’s native optimized implementation
  • diffusers: Use vanilla diffusers pipeline (supports all diffusers models)
--trust-remote-code
boolean
default:"false"
Trust and execute custom code from the model's HuggingFace repository.
--revision
string
Model revision (branch/tag name or commit ID).

Sampling Parameters

Generation Settings

--negative-prompt
string
Negative prompt to guide what not to generate.
--num-inference-steps
integer
default:"50"
Number of denoising steps. More steps generally produce higher quality but take longer.
--guidance-scale
float
default:"7.5"
Guidance scale for classifier-free guidance. Higher values follow the prompt more closely.
--height
integer
Output height in pixels.
--width
integer
Output width in pixels.
--seed
integer
Random seed for reproducibility.

Batch Generation

--num-samples
integer
default:"1"
Number of samples to generate.

Parallelism Options

--num-gpus
integer
default:"1"
Number of GPUs to use for inference.
--tp-size
integer
Tensor parallelism size.
--sp-degree
integer
Sequence parallelism degree.
--ulysses-degree
integer
Ulysses sequence parallelism degree for long sequences.
--ring-degree
integer
Ring sequence parallelism degree.
--dp-size
integer
default:"1"
Data parallelism size (number of data parallel groups).
--dp-degree
integer
default:"1"
Number of GPUs in a data parallel group.
--enable-cfg-parallel
boolean
default:"false"
Enable classifier-free guidance parallelism.

Attention Backend

--attention-backend
string
Attention backend to use for the model.
--attention-backend-config
string
Additional configuration for the attention backend (JSON format).
--cache-dit-config
string
Cache-DIT configuration for diffusers backend.

CPU Offloading

--dit-cpu-offload
boolean
Offload DiT (Diffusion Transformer) model to CPU to save GPU memory.
--dit-layerwise-offload
boolean
Enable layer-wise offloading for DiT model.
--text-encoder-cpu-offload
boolean
Offload text encoder to CPU.
--image-encoder-cpu-offload
boolean
Offload image encoder to CPU.
--vae-cpu-offload
boolean
Offload VAE (Variational AutoEncoder) to CPU.

LoRA Adapters

--lora-path
string
Path to LoRA adapter weights.
--lora-nickname
string
default:"default"
Nickname for the LoRA adapter (for swapping adapters in the pipeline).
--lora-scale
float
default:"1.0"
LoRA scale for merging (e.g., 0.125 for Hyper-SD).
--lora-target-modules
string
Comma-separated list of module names to apply LoRA to (e.g., "q_proj,k_proj").

Quantization

--transformer-weights-path
string
Path to pre-quantized transformer weights (single .safetensors file or directory).
--nunchaku-config
string
Nunchaku SVDQuant configuration for model quantization.

Performance Options

--enable-torch-compile
boolean
default:"false"
Enable PyTorch compilation for faster inference.
--warmup
boolean
default:"false"
Run warmup iterations before generation.
--warmup-steps
integer
default:"1"
Number of warmup steps to run.
--disable-autocast
boolean
Disable automatic mixed precision.

Output Options

--output-path
string
default:"outputs/"
Directory path to save generated outputs.
--perf-dump-path
string
Path to dump performance metrics (JSON) for the run.

Advanced Options

--diffusers-kwargs
string
Additional keyword arguments to pass to the diffusers pipeline (JSON format). Example: --diffusers-kwargs '{"eta": 0.5, "use_karras_sigmas": true}'
--component-paths
string
Override paths for specific pipeline components (JSON format). Example: --component-paths '{"vae": "path/to/custom/vae"}'
--pipeline-class-name
string
Override the pipeline class from model_index.json.
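
JSON-valued flags such as --diffusers-kwargs and --component-paths can be awkward to quote by hand on the command line. One way to build the value reliably (a sketch, using nothing beyond the JSON formats stated above) is Python's json.dumps:

```python
import json

# Extra keyword arguments for the diffusers pipeline.
diffusers_kwargs = {"eta": 0.5, "use_karras_sigmas": True}

# json.dumps emits valid JSON (note Python True becomes true);
# wrap the result in single quotes when pasting into a shell.
value = json.dumps(diffusers_kwargs)
print(value)  # {"eta": 0.5, "use_karras_sigmas": true}
```

From a shell script, the printed string could then be substituted into the command, e.g. --diffusers-kwargs "$(python build_kwargs.py)", where build_kwargs.py is a hypothetical script containing the snippet above.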

Examples

Basic Image Generation

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A serene landscape with mountains and a lake at sunset"

High-Quality Generation with Custom Settings

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A futuristic city with flying cars" \
  --negative-prompt "blurry, low quality, distorted" \
  --num-inference-steps 100 \
  --guidance-scale 9.0 \
  --height 1024 \
  --width 1024 \
  --seed 42

Multi-GPU Inference

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A beautiful forest scene" \
  --num-gpus 4 \
  --sp-degree 2 \
  --tp-size 2

Batch Generation

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Abstract art with vibrant colors" \
  --num-samples 4 \
  --seed 42

Using LoRA Adapters

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Anime style character portrait" \
  --lora-path path/to/anime-lora \
  --lora-scale 0.8

CPU Offloading for Large Models

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "A detailed photograph of nature" \
  --dit-cpu-offload \
  --vae-cpu-offload \
  --text-encoder-cpu-offload

Using Configuration File

Create a config file generation_config.json:
{
  "model_path": "stabilityai/stable-diffusion-xl-base-1.0",
  "prompt": "A majestic dragon flying over mountains",
  "negative_prompt": "blurry, low quality",
  "num_inference_steps": 50,
  "guidance_scale": 7.5,
  "height": 1024,
  "width": 1024,
  "seed": 12345,
  "num_samples": 2
}
Then run:
sglang generate --config generation_config.json
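
The config file above can also be written programmatically. A minimal sketch using only the standard library, with the same field names as the JSON example:

```python
import json

# Generation parameters, mirroring the JSON config shown above.
config = {
    "model_path": "stabilityai/stable-diffusion-xl-base-1.0",
    "prompt": "A majestic dragon flying over mountains",
    "num_inference_steps": 50,
    "seed": 12345,
}

# Write the file that `sglang generate --config` will read.
with open("generation_config.json", "w") as f:
    json.dump(config, f, indent=2)
```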

Performance Benchmarking

sglang generate \
  --model-path stabilityai/stable-diffusion-xl-base-1.0 \
  --prompt "Test image" \
  --perf-dump-path performance_metrics.json \
  --warmup \
  --warmup-steps 3

Output

Generated outputs are saved to the specified output directory (default: outputs/). The command will display generation progress and save:
  • Generated images/videos in the output directory
  • Performance metrics (if --perf-dump-path is specified)
Example output:
INFO: Loading model: stabilityai/stable-diffusion-xl-base-1.0
INFO: Model loaded successfully
INFO: Generating with prompt: "A serene landscape..."
INFO: Progress: 100% [50/50 steps]
INFO: Generated image saved to: outputs/generated_image_0.png
INFO: Total generation time: 3.45s

Limitations

The generate command is currently only supported for diffusion models. For language models, use the sglang serve command to start a server and make API requests.

Help

To see all available options:
sglang generate --help