
Overview

Qwen-Agent supports multiple LLM providers and model types through a unified configuration interface. The framework automatically selects the appropriate model client based on your configuration.

Basic Configuration

from qwen_agent.llm import get_chat_model

# Simple configuration
llm = get_chat_model({
    'model': 'qwen-plus',
    'model_server': 'dashscope',
    'api_key': 'your-api-key'
})

# Or use shorthand
llm = get_chat_model('qwen-plus')

Configuration Parameters

Core Parameters

model (str, required)
  Model identifier. Examples: 'qwen-plus', 'qwen-max', 'gpt-4'
model_server (str)
  Model service endpoint:
  • 'dashscope' - Use Alibaba Cloud DashScope
  • 'http://127.0.0.1:7905/v1' - Custom OpenAI-compatible endpoint
  • 'https://api.openai.com/v1' - OpenAI API
api_key (str)
  API key for authentication. Can also be set via environment variables:
  • DASHSCOPE_API_KEY for DashScope
  • OPENAI_API_KEY for OpenAI
model_type (str)
  Explicitly specify the model type. Auto-detected if not provided. Available types:
  • 'qwen_dashscope' - Qwen models via DashScope
  • 'qwenvl_dashscope' - Qwen-VL vision models
  • 'qwenaudio_dashscope' - Qwen-Audio models
  • 'oai' - OpenAI-compatible API
  • 'qwenvl_oai' - Vision models via OpenAI API
  • 'azure' - Azure OpenAI
  • 'transformers' - Local Hugging Face models
  • 'openvino' - OpenVINO-optimized models
generate_cfg (dict)
  Generation hyperparameters (see below)

Generation Configuration

The generate_cfg dictionary controls how the LLM generates responses.

Common Parameters

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'top_p': 0.8,
        'temperature': 0.7,
        'max_tokens': 2000,
        'max_input_tokens': 6500,
        'max_retries': 10,
        'seed': 42,
        'stop': ['\n\nObservation:', 'END'],
    }
})
top_p (float, default: 0.8)
  Nucleus sampling parameter. Controls diversity by sampling from the top probability mass. Range: 0.0 to 1.0.
temperature (float, default: 1.0)
  Sampling temperature. Higher values increase randomness. Range: 0.0 to 2.0.
max_tokens (int)
  Maximum number of tokens to generate in the response.
max_input_tokens (int, default: 6500)
  Maximum input context length. Messages are automatically truncated if the limit is exceeded. Set to -1 to disable truncation.
max_retries (int, default: 0)
  Number of retry attempts on service errors, with exponential backoff.
seed (int)
  Random seed for reproducible generation. Auto-generated if not provided.
stop (List[str])
  Stop sequences that halt generation when encountered.
cache_dir (str)
  Directory for caching LLM responses. Requires the diskcache package.
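The retry behavior that max_retries enables can be sketched as a plain loop with exponential backoff (a minimal illustration, not Qwen-Agent's actual implementation; ServiceError and flaky here are stand-ins for a transient service failure):

```python
import time

class ServiceError(Exception):
    """Stand-in for a transient model-service error."""

def chat_with_retries(call_model, max_retries=0, base_delay=1.0):
    """Retry a model call with exponentially growing delays: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return call_model()
        except ServiceError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulate a service that fails twice, then succeeds
state = {'calls': 0}
def flaky():
    state['calls'] += 1
    if state['calls'] < 3:
        raise ServiceError('temporary failure')
    return 'ok'

print(chat_with_retries(flaky, max_retries=10, base_delay=0.001))  # → ok
```

With max_retries=0 (the default), the first error is raised immediately.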

Function Calling Parameters

parallel_function_calls (bool, default: false)
  Enable parallel execution of multiple function calls in a single response.
function_choice (str, default: 'auto')
  Control function-calling behavior:
  • 'auto' - The model decides whether to call functions
  • 'none' - Disable function calling
  • A function name - Force a call to that specific function
thought_in_content (bool, default: false)
  Include reasoning thoughts in the content field along with function calls.
fncall_prompt_type (str, default: 'nous')
  Function-calling prompt style:
  • 'nous' - Nous Research format
  • 'qwen' - Qwen-specific format
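To illustrate how the three function_choice values typically translate into a request payload, here is a small sketch (build_tool_choice is a hypothetical helper, not part of Qwen-Agent's API; the forced-call dict mirrors the common OpenAI-style tool_choice shape):

```python
def build_tool_choice(function_choice, functions):
    """Map a function_choice setting to a request-payload value.

    'auto' lets the model decide, 'none' disables calling, and any
    other string must name one of the declared functions.
    """
    if function_choice in ('auto', 'none'):
        return function_choice
    names = {f['name'] for f in functions}
    if function_choice not in names:
        raise ValueError(f'unknown function: {function_choice}')
    return {'type': 'function', 'function': {'name': function_choice}}

functions = [{'name': 'get_weather'}]
print(build_tool_choice('auto', functions))         # → auto
print(build_tool_choice('get_weather', functions))  # forces get_weather
```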

Model Types

DashScope Models (Alibaba Cloud)

# Text generation
llm = get_chat_model({
    'model': 'qwen-max',
    'model_server': 'dashscope'
})

# Available models:
# - qwen-max: Most capable
# - qwen-plus: Balanced performance
# - qwen-turbo: Fast and efficient
Source Reference: qwen_agent/llm/__init__.py:31-100

OpenAI-Compatible Models

llm = get_chat_model({
    'model': 'gpt-4',
    'model_server': 'https://api.openai.com/v1',
    'api_key': 'sk-...'
})

Local Models

llm = get_chat_model({
    'model': 'Qwen/Qwen2.5-7B-Instruct',
    'model_type': 'transformers',
    'generate_cfg': {
        'device_map': 'auto'
    }
})

Using LLM Directly

Chat Interface

from qwen_agent.llm.schema import Message

llm = get_chat_model('qwen-plus')

# Simple query
responses = llm.chat(
    messages=[Message(role='user', content='Hello!')],
    stream=False
)
print(responses[0].content)

Function Calling

functions = [{
    'name': 'get_weather',
    'description': 'Get current weather',
    'parameters': {
        'type': 'object',
        'properties': {
            'location': {
                'type': 'string',
                'description': 'City name'
            }
        },
        'required': ['location']
    }
}]

responses = llm.chat(
    messages=[Message(role='user', content='What is the weather in Beijing?')],
    functions=functions,
    stream=False
)

# Check for function call
if responses[0].function_call:
    print(f"Function: {responses[0].function_call.name}")
    print(f"Arguments: {responses[0].function_call.arguments}")
Source Reference: qwen_agent/llm/base.py:118-290
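Because function_call.arguments arrives as a JSON string produced by the model, it can be malformed; it is worth validating before dispatching to your tool. A minimal sketch (parse_arguments is an illustrative helper, using the required keys from the get_weather schema above):

```python
import json

def parse_arguments(arguments: str, required: list) -> dict:
    """Parse a function_call.arguments string and check required keys."""
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as e:
        raise ValueError(f'malformed arguments: {e}')
    missing = [k for k in required if k not in args]
    if missing:
        raise ValueError(f'missing required arguments: {missing}')
    return args

args = parse_arguments('{"location": "Beijing"}', required=['location'])
print(args['location'])  # → Beijing
```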

Advanced Configuration

Response Caching

import os

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'cache_dir': './llm_cache'  # Requires: pip install diskcache
    }
})

# Identical requests will be served from cache
response1 = llm.chat(messages=[Message(role='user', content='Hello')], stream=False)
response2 = llm.chat(messages=[Message(role='user', content='Hello')], stream=False)
# response2 is instant - served from cache
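What cache_dir does can be approximated with a small in-memory cache keyed by the serialized request (an illustrative sketch only; the real feature persists entries to disk via diskcache, and cached_chat/fake_model are stand-in names):

```python
import json

_cache = {}

def cached_chat(call_model, messages, **cfg):
    """Serve identical (messages, cfg) requests from a local cache."""
    key = json.dumps({'messages': messages, 'cfg': cfg}, sort_keys=True)
    if key not in _cache:
        _cache[key] = call_model(messages, **cfg)  # only invoked on a miss
    return _cache[key]

calls = {'n': 0}
def fake_model(messages, **cfg):
    calls['n'] += 1
    return f"reply #{calls['n']}"

r1 = cached_chat(fake_model, [{'role': 'user', 'content': 'Hello'}])
r2 = cached_chat(fake_model, [{'role': 'user', 'content': 'Hello'}])
print(r1 == r2, calls['n'])  # → True 1
```

Note that any change to the generation config produces a different cache key, so cached responses are only reused for truly identical requests.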

Raw API Mode

Bypass Qwen-Agent preprocessing for direct model access:

import os
os.environ['QWEN_AGENT_USE_RAW_API'] = 'true'

llm = get_chat_model('qwen-plus')

# Or configure per-model
llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'use_raw_api': True
    }
})
Note: Raw API mode only supports full streaming (stream=True, delta_stream=False).
Source Reference: qwen_agent/llm/base.py:89-223

Error Handling

from qwen_agent.llm import ModelServiceError

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'max_retries': 10  # Retry up to 10 times with exponential backoff
    }
})

try:
    responses = llm.chat(messages=[...], stream=False)
except ModelServiceError as e:
    print(f"Error code: {e.code}")
    print(f"Error message: {e.message}")

Message Schema

Message Format

from qwen_agent.llm.schema import Message, ContentItem, FunctionCall

# Text message
msg = Message(role='user', content='Hello')

# Multimodal message
msg = Message(
    role='user',
    content=[
        ContentItem(text='What is in this image?'),
        ContentItem(image='https://example.com/image.jpg')
    ]
)

# Assistant message with function call
msg = Message(
    role='assistant',
    content='',
    function_call=FunctionCall(
        name='get_weather',
        arguments='{"location": "Beijing"}'
    )
)

# Function result message
msg = Message(
    role='function',
    name='get_weather',
    content='Temperature: 20°C, Sunny'
)
Source Reference: qwen_agent/llm/schema.py:132-164
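On the wire, these messages reduce to plain dicts with None fields dropped. The shape can be sketched with simplified stand-in dataclasses (not the actual schema classes, which live in qwen_agent.llm.schema):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FunctionCall:
    name: str
    arguments: str  # JSON-encoded argument string

@dataclass
class Message:
    role: str
    content: str
    name: Optional[str] = None
    function_call: Optional[FunctionCall] = None

def to_wire(msg: Message) -> dict:
    """Serialize a message, dropping None fields from the payload."""
    return {k: v for k, v in asdict(msg).items() if v is not None}

msg = Message(role='assistant', content='',
              function_call=FunctionCall('get_weather', '{"location": "Beijing"}'))
print(to_wire(msg))
```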

Content Types

Messages can contain multiple content types:

text (str)
  Plain text content
image (str)
  Image URL or base64-encoded image
file (str)
  File URL or path
audio (Union[str, dict])
  Audio URL or audio configuration
video (Union[str, list])
  Video URL or frame list
Source Reference: qwen_agent/llm/schema.py:80-129

Best Practices

Model Selection

  • Use qwen-max for complex reasoning tasks
  • Use qwen-plus for balanced performance
  • Use qwen-turbo for speed-critical applications
  • Use vision models only when processing images

Context Management

  • Set max_input_tokens to prevent context overflow
  • The framework auto-truncates old messages when needed
  • Keep system messages concise
  • Consider RAG for large document contexts
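The auto-truncation the framework performs can be sketched as dropping the oldest non-system messages until the estimated token count fits the budget (an illustration with a crude word-count tokenizer, not Qwen-Agent's actual truncation logic):

```python
def truncate_messages(messages, max_input_tokens,
                      count_tokens=lambda m: len(m['content'].split())):
    """Keep system messages; drop the oldest other messages until within budget."""
    system = [m for m in messages if m['role'] == 'system']
    rest = [m for m in messages if m['role'] != 'system']
    while rest and sum(map(count_tokens, system + rest)) > max_input_tokens:
        rest.pop(0)  # drop the oldest non-system turn first
    return system + rest

history = [
    {'role': 'system', 'content': 'You are helpful.'},
    {'role': 'user', 'content': 'first question about something old'},
    {'role': 'user', 'content': 'latest question'},
]
# Budget of 6 "tokens" forces the oldest user turn out
print(truncate_messages(history, max_input_tokens=6))
```

Preserving the system message while evicting old turns keeps the agent's instructions intact even in long conversations.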

Performance

  • Enable caching for repeated queries
  • Use streaming for better UX
  • Configure max_retries for production reliability
  • Use qwen-turbo for latency-sensitive apps

Function Calling

  • Use parallel_function_calls for independent operations
  • Set function_choice='none' to disable functions temporarily
  • Always validate function arguments
  • Handle tool errors gracefully

Environment Variables

DASHSCOPE_API_KEY (string)
  API key for DashScope services
OPENAI_API_KEY (string)
  API key for OpenAI services
QWEN_AGENT_USE_RAW_API (boolean, default: "false")
  Enable raw API mode globally. Set to 'true' to enable.

Next Steps

  • Agents - Learn how to use LLMs within agents
  • Function Calling - Deep dive into function calling
