Supported Models
| Model | Reasoning Tags | Parser | Notes |
|---|---|---|---|
| DeepSeek-R1 | <think>...</think> | deepseek-r1 | All variants (R1, R1-0528, R1-Distill) |
| DeepSeek-V3 | <think>...</think> | deepseek-v3 | Including V3.2. Supports thinking parameter |
| Qwen3 | <think>...</think> | qwen3 | Supports enable_thinking parameter |
| Qwen3-Thinking | <think>...</think> | qwen3 or qwen3-thinking | Always generates thinking |
| Kimi K2 | ◁think▷...◁/think▷ | kimi_k2 | Also requires --tool-call-parser kimi_k2 for tool use |
| GPT OSS | <\|channel\|>analysis<\|message\|>...<\|end\|> | gpt-oss | Special analysis channel format |
Model-Specific Behaviors
DeepSeek-R1 Family
- DeepSeek-R1: No `<think>` start tag; jumps directly to thinking content
- DeepSeek-R1-0528: Generates both `<think>` start and `</think>` end tags
- Both handled by the same `deepseek-r1` parser
DeepSeek-V3 Family
- DeepSeek-V3.1/V3.2: Hybrid model supporting both thinking and non-thinking modes
- Use the `deepseek-v3` parser and `thinking` parameter (NOT `enable_thinking`)
Qwen3 Family
- Standard Qwen3 (e.g., Qwen3-2507): Use `qwen3` parser, supports `enable_thinking` in chat templates
- Qwen3-Thinking (e.g., Qwen3-235B-A22B-Thinking-2507): Use `qwen3` or `qwen3-thinking`, always thinks
Kimi K2
Uses special `◁think▷` and `◁/think▷` tags. For agentic tool use, also specify `--tool-call-parser kimi_k2`.
GPT OSS
Uses special `<|channel|>analysis<|message|>` and `<|end|>` tags for analysis content.
Quick Start
Launch Server
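As a rough sketch, the launch command can be assembled in Python as shown here. The model path and port are example values chosen for illustration, not values taken from this page, and the `sglang.launch_server` module name and `--model-path`/`--port` flags are assumed from common SGLang usage; check them against your installed version.

```python
import shlex
import sys


def build_launch_command(model_path, reasoning_parser, port=30000):
    """Return the argv list for starting a server with a reasoning parser."""
    return [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--reasoning-parser", reasoning_parser,
        "--port", str(port),
    ]


# Example values (assumptions): a DeepSeek-R1 distilled checkpoint on port 30000.
cmd = build_launch_command("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "deepseek-r1")
print(shlex.join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```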
The `--reasoning-parser` argument specifies which parser to use for interpreting reasoning content in the model’s output.
OpenAI-Compatible API
The API follows the DeepSeek API design, with two response fields:
- `reasoning_content`: The chain-of-thought reasoning
- `content`: The final answer
Non-Streaming Request
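A minimal sketch of a non-streaming request, using only the standard library. The base URL, the `/v1/chat/completions` path, and the placeholder model name `"default"` are assumptions for illustration; `separate_reasoning` is sent as a top-level field of the JSON body, and the response is assumed to follow the usual chat completion shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: local server on port 30000


def build_chat_request(prompt, model="default", separate_reasoning=True):
    """Build a non-streaming chat completion request body."""
    return {
        "model": model,  # placeholder name, adjust to your deployment
        "messages": [{"role": "user", "content": prompt}],
        "separate_reasoning": separate_reasoning,
    }


def split_message(response):
    """Return (reasoning_content, content) from a chat completion response."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content")


def post_json(path, body):
    """POST a JSON body to the server and decode the JSON reply."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Usage (requires a running server):
# reply = post_json("/v1/chat/completions", build_chat_request("What is 1+3?"))
# reasoning, answer = split_message(reply)
```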
Streaming Request
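When streaming, reasoning and answer text arrive as separate deltas. The sketch below accumulates them from already-decoded chunks; it assumes each chunk follows the OpenAI delta shape with reasoning carried in `delta.reasoning_content`, and the sample chunks are hand-written stand-ins rather than real server output.

```python
def collect_stream(chunks):
    """Accumulate reasoning and answer text from streamed chat chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
            if delta.get("content"):
                answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


# Hand-written chunks standing in for a real stream (illustrative only).
sample_chunks = [
    {"choices": [{"delta": {"reasoning_content": "1 + 3 "}}]},
    {"choices": [{"delta": {"reasoning_content": "is 4."}}]},
    {"choices": [{"delta": {"content": "4"}}]},
]
reasoning, answer = collect_stream(sample_chunks)
print(reasoning)  # 1 + 3 is 4.
print(answer)     # 4
```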
Buffered Streaming
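A buffered streaming request differs from an ordinary streaming request only in its flags. The sketch below builds such a body; the model name is a placeholder. With the official `openai` Python client, non-standard fields like these are typically passed through `extra_body` instead of being written into the body by hand.

```python
def build_buffered_stream_request(prompt, model="default"):
    """Build a streaming request that delivers reasoning as a single chunk."""
    return {
        "model": model,  # placeholder name, adjust to your deployment
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "separate_reasoning": True,
        "stream_reasoning": False,  # hold reasoning back until it is complete
    }


buffered_body = build_buffered_stream_request("What is 1+3?")
```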
With `stream_reasoning` set to `False`, reasoning content is buffered until complete and then streamed in one chunk.
Disable Reasoning Separation
Set `separate_reasoning` to `False` to get the raw output with reasoning tags.
Native API Usage
You can also use the native SGLang API.
Generate with Native API
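A minimal sketch of a native generation request. It assumes the native endpoint is `/generate`, that it takes an already-templated prompt in `text` plus a `sampling_params` object, and that the reply carries the raw output in a `text` field; the sampling values are arbitrary examples.

```python
def build_generate_request(prompt_text, max_new_tokens=512, temperature=0.6):
    """Build a body for the native generate endpoint (prompt already templated)."""
    return {
        "text": prompt_text,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def generated_text(reply):
    """Return the raw generated text, reasoning tags included."""
    return reply["text"]


generate_body = build_generate_request("What is 1+3?")
```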
Parse Reasoning
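Raw text from the native API can be split afterwards by asking the server to parse it. The sketch below only builds the request and reads the reply; the endpoint name `/separate_reasoning` and the field names `reasoning_parser` and `reasoning_text` are assumptions recalled from SGLang usage and should be verified against your version.

```python
def build_separate_reasoning_request(generated, parser="deepseek-r1"):
    """Build a body asking the server to split raw output into reasoning and answer."""
    return {"text": generated, "reasoning_parser": parser}


def read_separated(reply):
    """Return (reasoning, answer) from the parse reply (field names assumed)."""
    return reply.get("reasoning_text"), reply.get("text")


parse_body = build_separate_reasoning_request("Add the numbers.</think>4")
# Send parse_body as JSON to the assumed /separate_reasoning endpoint,
# then call read_separated() on the decoded reply.
```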
Parser Details
DeepSeek-R1 Parser
Handles both tag variants:
- Models that omit the `<think>` start tag
- Models that include both `<think>` and `</think>` tags
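The two variants can be handled by one splitting rule, sketched below. This is an illustration of the idea, not the parser's actual code, and it assumes that output with no `</think>` tag is still entirely reasoning (for example, a truncated generation).

```python
def split_deepseek_r1(text, start="<think>", end="</think>"):
    """Split output into (reasoning, answer), tolerating a missing start tag."""
    body = text.lstrip()
    if body.startswith(start):
        body = body[len(start):]
    # partition() leaves the answer empty when the end tag is absent.
    reasoning, _, answer = body.partition(end)
    return reasoning.strip(), answer.strip()


print(split_deepseek_r1("Let me add.</think>4"))
print(split_deepseek_r1("<think>Let me add.</think>\n\n4"))
```

Both calls yield the same pair, since the start tag is optional in this sketch.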
DeepSeek-V3 Parser
Supports hybrid thinking mode controlled by the `thinking` parameter.
Qwen3 Parser
Standard Qwen3 models support `enable_thinking` in the chat template.
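The DeepSeek-V3 and Qwen3 parsers expect differently named flags (`thinking` versus `enable_thinking`), which is easy to mix up. The sketch below picks the flag from the parser name; it assumes the flag is passed through a `chat_template_kwargs` field of the chat request, and the model name is a placeholder.

```python
# Chat-template flag expected for each hybrid parser, per the sections above.
THINKING_FLAG = {
    "deepseek-v3": "thinking",
    "qwen3": "enable_thinking",
}


def build_hybrid_request(prompt, parser, think, model="default"):
    """Build a chat request that toggles thinking with the parser's own flag."""
    flag = THINKING_FLAG[parser]
    return {
        "model": model,  # placeholder name, adjust to your deployment
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {flag: think},  # assumed field name
        "separate_reasoning": True,
    }


v3_body = build_hybrid_request("What is 1+3?", "deepseek-v3", think=True)
qwen_body = build_hybrid_request("What is 1+3?", "qwen3", think=False)
```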
Kimi K2 Parser
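As an illustration of the Kimi K2 delimiters, the sketch below splits a completed output with a regular expression. It is not the parser's actual code, and it treats output with no thinking block as a plain answer, which is an assumption of this example.

```python
import re

# Non-greedy match so only the first thinking block is captured.
KIMI_THINK = re.compile(r"◁think▷(.*?)◁/think▷", re.DOTALL)


def split_kimi_k2(text):
    """Return (reasoning, answer) for output using the triangle delimiters."""
    match = KIMI_THINK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


print(split_kimi_k2("◁think▷Add the numbers.◁/think▷The answer is 4."))
```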
Uses Unicode triangle characters for thinking delimiters (`◁think▷` and `◁/think▷`).
Implementation Details
Reasoning parsing is implemented through specialized parser classes that:
- Detect reasoning boundaries - Identify start and end tags in the output stream
- Extract reasoning content - Separate thinking from final answer
- Handle streaming - Support both buffered and unbuffered streaming modes
- Format responses - Map to OpenAI-compatible response format
python/sglang/srt/function_call/function_call_parser.py:48
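The responsibilities listed above can be sketched as a toy incremental splitter. This is a hypothetical class written for illustration, not one of SGLang's parser classes; it assumes `<think>`-style tags, assumes the model starts in thinking mode, and holds back text that might be the beginning of a tag split across chunks.

```python
class StreamingReasoningSplitter:
    """Toy incremental splitter for tagged reasoning output (illustrative only)."""

    def __init__(self, start="<think>", end="</think>", stream_reasoning=True):
        self.start, self.end = start, end
        self.stream_reasoning = stream_reasoning
        self.in_reasoning = True  # assumption: output begins with reasoning
        self.pending = ""         # held back in case it is a partial tag
        self.buffered = ""        # reasoning held until the end tag (buffered mode)

    def _partial_tag_len(self, text):
        """Length of the longest text suffix that is a proper prefix of a tag."""
        best = 0
        for tag in (self.start, self.end):
            for k in range(1, len(tag)):
                if text.endswith(tag[:k]):
                    best = max(best, k)
        return best

    def feed(self, chunk):
        """Consume one chunk and return (reasoning_delta, content_delta)."""
        if not self.in_reasoning:
            return "", chunk
        text = (self.pending + chunk).replace(self.start, "")
        self.pending = ""
        if self.end in text:
            reasoning, _, content = text.partition(self.end)
            self.in_reasoning = False
            reasoning, self.buffered = self.buffered + reasoning, ""
            return reasoning, content
        keep = self._partial_tag_len(text)
        emit, self.pending = text[:len(text) - keep], text[len(text) - keep:]
        if self.stream_reasoning:
            return emit, ""
        self.buffered += emit
        return "", ""


def run(chunks, stream_reasoning):
    """Feed all chunks and return the list of (reasoning, content) deltas."""
    splitter = StreamingReasoningSplitter(stream_reasoning=stream_reasoning)
    return [splitter.feed(chunk) for chunk in chunks]


chunks = ["<th", "ink>Add ", "them.</th", "ink>4"]
print(run(chunks, stream_reasoning=True))
print(run(chunks, stream_reasoning=False))
```

In unbuffered mode the reasoning is emitted as it arrives; in buffered mode it is released in one piece once the end tag is seen. A real parser would also flush any held-back text when the stream ends.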
Configuration Options
| Parameter | Description | Default |
|---|---|---|
| `--reasoning-parser` | Parser to use for reasoning content | None |
| `separate_reasoning` | Enable reasoning separation in requests | True (when parser set) |
| `stream_reasoning` | Stream reasoning incrementally vs buffered | True |
Performance Considerations
Streaming Modes
- Unbuffered (`stream_reasoning=True`): Lower latency, reasoning appears token-by-token
- Buffered (`stream_reasoning=False`): Better UX for long reasoning, appears all at once
Parser Overhead
Parsing adds minimal overhead (<1ms per request). The parser operates on the output stream and does not affect generation speed.
Use Cases
Debugging
Display reasoning to understand the model’s decision process
Educational Tools
Show step-by-step problem solving
Transparency
Provide visibility into AI reasoning for high-stakes decisions
Analysis
Log and analyze reasoning patterns
