SGLang provides comprehensive support for function calling (tool calling), enabling models to interact with external tools and APIs. This follows the OpenAI function calling specification.

Supported Parsers

| Parser | Supported Models | Notes |
|---|---|---|
| `deepseekv3` | DeepSeek-V3 | Recommend `--chat-template ./examples/chat_template/tool_chat_template_deepseekv3.jinja` |
| `deepseekv31` | DeepSeek-V3.1, DeepSeek-V3.2-Exp | A custom chat template is recommended |
| `deepseekv32` | DeepSeek-V3.2 | Official V3.2 release |
| `glm` | GLM series (e.g., zai-org/GLM-4.6) | ChatGLM models |
| `gpt-oss` | GPT-OSS (120B, 20B variants) | Filters analysis-channel events |
| `kimi_k2` | moonshotai/Kimi-K2-Instruct | Moonshot AI model |
| `llama3` | Llama 3.1/3.2/3.3 | Meta's Llama family |
| `llama4` | Llama 4 | Latest Llama models |
| `mistral` | Mistral models | Mistral AI models |
| `pythonic` | Llama-3.2/3.3/4 | Outputs function calls as Python code |
| `qwen` | Qwen series (except Qwen3-Coder) | Alibaba Qwen models |
| `qwen3_coder` | Qwen3-Coder | Specialized coder variant |
| `step3` | Step-3 | Step models |

Quick Start

Launch Server

python -m sglang.launch_server \
    --model-path Qwen/Qwen2.5-7B-Instruct \
    --tool-call-parser qwen
The --tool-call-parser argument specifies which parser to use for interpreting function calls in the model’s output.

Define Tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "Two-letter state abbreviation, e.g. 'CA' for California",
                    },
                    "unit": {
                        "type": "string",
                        "description": "Temperature unit",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    }
]

Make Requests

Non-Streaming

import openai

client = openai.Client(base_url="http://localhost:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What's the weather like in Boston today?",
        }
    ],
    temperature=0,
    tools=tools,
)

print("Content:", response.choices[0].message.content)
print("Tool calls:", response.choices[0].message.tool_calls)

# Access tool call details
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print("Function:", tool_call.function.name)
    print("Arguments:", tool_call.function.arguments)

Streaming

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What's the weather like in Boston today?",
        }
    ],
    temperature=0,
    stream=True,
    tools=tools,
)

text = ""
tool_calls = []

for chunk in response:
    if chunk.choices[0].delta.content:
        text += chunk.choices[0].delta.content
    if chunk.choices[0].delta.tool_calls:
        tool_calls.append(chunk.choices[0].delta.tool_calls[0])

print("Text:", text)
print("Tool calls:", tool_calls)

# Reconstruct function call
function_name = next(
    (tc.function.name for tc in tool_calls if tc.function.name),
    None
)
arguments = "".join(
    tc.function.arguments for tc in tool_calls if tc.function.arguments
)

print(f"Function: {function_name}")
print(f"Arguments: {arguments}")

Multi-Tool Support

Define multiple tools for the model to choose from:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Stock ticker symbol"},
                },
                "required": ["symbol"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What's the weather in SF and the AAPL stock price?",
        }
    ],
    tools=tools,
)

Tool Choice Control

Control when and how the model calls tools:
# Auto: Model decides whether to call tools
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[...],
    tools=tools,
    tool_choice="auto",  # Default
)

# None: Model cannot call tools
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[...],
    tools=tools,
    tool_choice="none",
)

# Required: Model must call a tool
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[...],
    tools=tools,
    tool_choice="required",
)

# Specific: Force a specific tool
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[...],
    tools=tools,
    tool_choice={
        "type": "function",
        "function": {"name": "get_current_weather"},
    },
)

Multi-Turn Conversations

Implement agentic workflows with tool execution:
import json

def get_current_weather(city: str, state: str, unit: str) -> str:
    # Implement actual weather API call
    return f"The weather in {city}, {state} is 72°{unit[0].upper()}"

messages = [
    {"role": "user", "content": "What's the weather in Boston, MA?"}
]

# First turn: Model calls tool
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=messages,
    tools=tools,
)

assistant_message = response.choices[0].message
messages.append(assistant_message.model_dump())

# Execute tool
if assistant_message.tool_calls:
    for tool_call in assistant_message.tool_calls:
        function_name = tool_call.function.name
        # Parse the JSON-encoded arguments; never eval() untrusted model output
        arguments = json.loads(tool_call.function.arguments)
        
        # Call the actual function
        result = get_current_weather(**arguments)
        
        # Add tool result to messages
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

# Second turn: Model generates final response
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=messages,
    tools=tools,
)

print(response.choices[0].message.content)
# "The weather in Boston, MA is currently 72°F."

Parallel Tool Calls

Some models support calling multiple tools in one turn:
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Get weather in Boston and stock price for AAPL",
        }
    ],
    tools=tools,
    parallel_tool_calls=True,  # Enable parallel calls
)

# Process multiple tool calls
for tool_call in response.choices[0].message.tool_calls:
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")

Pythonic Tool Calling

Some models can output function calls as executable Python code:
python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.2-3B-Instruct \
    --tool-call-parser pythonic
Instead of emitting JSON, the model generates the call as Python code:
get_current_weather(city="Boston", state="MA", unit="fahrenheit")
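Through the API you still receive standard `tool_calls` objects, but if you are inspecting raw model output yourself, a pythonic call can be decoded with Python's `ast` module. A minimal sketch (the `parse_pythonic_call` helper is illustrative, not part of SGLang):

```python
import ast

def parse_pythonic_call(text: str):
    """Parse a single pythonic tool call like f(a=1, b="x") into (name, kwargs)."""
    call = ast.parse(text.strip(), mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("not a function call")
    name = call.func.id
    # literal_eval only accepts literals, so arbitrary code cannot execute
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return name, kwargs

name, args = parse_pythonic_call(
    'get_current_weather(city="Boston", state="MA", unit="fahrenheit")'
)
print(name, args)  # get_current_weather {'city': 'Boston', 'state': 'MA', 'unit': 'fahrenheit'}
```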

Model-Specific Notes

DeepSeek-V3 family supports thinking before tool calls:
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",
    messages=[...],
    tools=tools,
    extra_body={"thinking": True},  # Enable reasoning
)
GPT-OSS emits reasoning on an analysis channel. The parser filters these events out, so content may be empty when all of the model's output lands on the analysis channel. Complete the tool round by returning the tool results to the model to obtain the final content.
For Kimi K2 with thinking, use both parsers:
python -m sglang.launch_server \
    --model-path moonshotai/Kimi-K2-Thinking \
    --tool-call-parser kimi_k2 \
    --reasoning-parser kimi_k2

Implementation Details

Tool calling is implemented through the FunctionCallParser system:
# From python/sglang/srt/function_call/function_call_parser.py:39
class FunctionCallParser:
    ToolCallParserEnum = {
        "deepseekv3": DeepSeekV3Detector,
        "llama3": Llama32Detector,
        "qwen": Qwen25Detector,
        # ...
    }
    
    def parse_stream_chunk(self, chunk_text):
        """Parse streaming chunks for tool calls"""
        return self.detector.parse_streaming_increment(chunk_text, self.tools)
    
    def parse_non_stream(self, full_text):
        """Parse complete text for tool calls"""
        return self.detector.detect_and_parse(full_text, self.tools)
Each detector implements:
  • Pattern detection: Identify tool call syntax in output
  • Argument extraction: Parse JSON/Python arguments
  • Streaming support: Handle incremental parsing
  • Validation: Ensure arguments match schema
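The four responsibilities above can be sketched as a toy detector. This `JsonToolDetector` is purely illustrative (the real detectors in `python/sglang/srt/function_call/` are model-specific and handle streaming); it assumes a hypothetical `<tool_call>{...}</tool_call>` syntax:

```python
import json
import re

class JsonToolDetector:
    """Toy detector: find <tool_call>{...}</tool_call> blocks and parse them."""
    PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

    def detect_and_parse(self, full_text, tools):
        # Pattern detection: locate tool-call syntax in the model output
        allowed = {t["function"]["name"] for t in tools}
        calls = []
        for payload in self.PATTERN.findall(full_text):
            call = json.loads(payload)      # argument extraction
            if call["name"] in allowed:     # validation against the provided tools
                calls.append(call)
        return calls

tools = [{"type": "function", "function": {"name": "get_current_weather"}}]
text = 'Sure. <tool_call>{"name": "get_current_weather", "arguments": {"city": "Boston"}}</tool_call>'
print(JsonToolDetector().detect_and_parse(text, tools))
```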

Combining with Structured Outputs

You can combine tool calling with structured outputs for precise control:
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[...],
    tools=tools,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_response",
            "schema": {
                "type": "object",
                "properties": {
                    "tool_call": {"type": "string"},
                    "reasoning": {"type": "string"},
                },
            },
        },
    },
)

Best Practices

Write descriptive tool names and clear parameter descriptions. This helps the model understand when and how to use each tool.
Mark essential parameters as required in the schema. This ensures the model provides all necessary information.
Always validate tool call arguments before execution. Handle parsing errors and missing parameters appropriately.
Set reasonable timeouts for tool execution to prevent hanging on slow APIs.
Define enum fields for parameters with fixed options (e.g., units, categories).
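For instance, validating arguments before execution might look like this minimal sketch (hand-rolled checks for `required` and `enum`; in practice a library such as jsonschema can validate against the full parameter schema):

```python
import json

def validate_arguments(raw_arguments: str, schema: dict) -> dict:
    """Parse a tool call's JSON arguments and check required fields and enums."""
    args = json.loads(raw_arguments)  # raises ValueError on malformed JSON
    for field in schema.get("required", []):
        if field not in args:
            raise ValueError(f"missing required parameter: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field in args and "enum" in spec and args[field] not in spec["enum"]:
            raise ValueError(f"invalid value for {field}: {args[field]}")
    return args

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}
print(validate_arguments('{"city": "Boston", "unit": "celsius"}', schema))
```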

Performance Considerations

  • Parser overhead: Minimal (<1ms per request)
  • Streaming latency: Tool calls appear incrementally in stream
  • Multi-tool calls: Some parsers support multiple calls per turn
  • Validation: Schema validation adds negligible overhead

Limitations

  • Parser support varies by model architecture
  • Some models may hallucinate tool calls not in the provided list
  • Complex nested schemas may confuse some models
  • Streaming with parallel tool calls may have delayed final chunks