The Image Generation tool creates images from text descriptions using large language models with image generation capabilities.

Overview

Image Generation provides:
  • Text-to-Image: Convert text descriptions to images
  • LLM Integration: Uses configured LLM for generation
  • Flexible Sizing: Configurable output dimensions
  • Prompt Preservation: Maintains detailed requirements from user input
  • Content Items: Returns images as ContentItem objects

Registration

@register_tool('image_gen', allow_overwrite=True)
class ImageGen(BaseTool):
    ...
Tool Name: image_gen

Parameters

prompt
string
required
Detailed description of the desired image content. Should include:
  • Main subject
  • Style or artistic direction
  • Colors and mood
  • Composition details
  • Any specific requirements or text to include
Important: Keep all specific requirements from the original request intact. Omission is prohibited.

Parameter Schema

{
  "type": "object",
  "properties": {
    "prompt": {
      "description": "Detailed description of the desired content of the generated image. Please keep the specific requirements such as text from the original request fully intact. Omission is prohibited.",
      "type": "string"
    }
  },
  "required": ["prompt"]
}
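A caller may want to check arguments against this schema before invoking the tool. The sketch below does that with the standard library only. `validate_params` is a hypothetical helper, not part of Qwen-Agent, and the non-empty check is an extra assumption beyond what the schema states.

```python
import json


def validate_params(params: str) -> dict:
    """Parse a JSON params string and enforce the schema above.

    Hypothetical helper for illustration; not part of Qwen-Agent.
    """
    data = json.loads(params)
    if not isinstance(data, dict):
        raise ValueError("params must be a JSON object")
    prompt = data.get("prompt")
    # The schema only requires a string; rejecting blank prompts is an added assumption.
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' is required and must be a non-empty string")
    return data
```

Calling `validate_params(json.dumps({'prompt': '...'}))` before `image_gen.call(...)` surfaces a missing prompt early instead of after a slow generation request.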

Configuration

llm_cfg
dict
required
Configuration for the image generation LLM. Must include model settings. Example:
{
    'model': 'qwen-vl-plus',
    'api_key': 'your-api-key',
    'model_server': 'dashscope'
}
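Rather than hard-coding the key, the same dictionary can be built from an environment variable. This is a minimal sketch: `build_llm_cfg` is a hypothetical helper, and the variable name `DASHSCOPE_API_KEY` is an assumption about your environment.

```python
import os


def build_llm_cfg(model: str = "qwen-vl-plus",
                  env_var: str = "DASHSCOPE_API_KEY") -> dict:
    """Build an llm_cfg dict with the API key read from the environment.

    Hypothetical helper; the env var name is an assumption.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise ValueError(f"Set the {env_var} environment variable first")
    return {"model": model, "api_key": api_key, "model_server": "dashscope"}
```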
size
string
default:"1024*1024"
Output image dimensions. Format: "width*height". Common sizes:
  • "1024*1024" - Square
  • "1024*768" - Landscape
  • "768*1024" - Portrait
  • "1920*1080" - HD Landscape
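A malformed size string (for example "1024x768") is easy to write by mistake. The sketch below checks the "width*height" format before it reaches the tool. `parse_size` is a hypothetical helper and is not part of Qwen-Agent.

```python
from typing import Tuple


def parse_size(size: str) -> Tuple[int, int]:
    """Split a "width*height" string into two positive integers.

    Hypothetical helper for illustration only.
    """
    parts = size.split("*")
    if len(parts) != 2:
        raise ValueError(f"size must look like 'width*height', got {size!r}")
    try:
        width, height = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError(f"size must contain two integers, got {size!r}") from None
    if width <= 0 or height <= 0:
        raise ValueError(f"width and height must be positive, got {size!r}")
    return width, height
```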

Usage

Basic Image Generation

from qwen_agent.tools import ImageGen
import json

# Initialize with LLM config
image_gen = ImageGen(cfg={
    'llm_cfg': {
        'model': 'qwen-vl-plus',
        'api_key': 'your-api-key',
        'model_server': 'dashscope'
    },
    'size': '1024*1024'
})

# Generate an image
result = image_gen.call(
    params=json.dumps({
        'prompt': 'A serene landscape with mountains and a lake at sunset, photorealistic style'
    })
)

print(result)
# Returns: List of ContentItem objects with image data

Custom Image Size

image_gen = ImageGen(cfg={
    'llm_cfg': {
        'model': 'qwen-vl-plus',
        'api_key': 'your-api-key'
    },
    'size': '1920*1080'  # HD landscape
})

result = image_gen.call(
    params=json.dumps({
        'prompt': 'A futuristic city skyline at night, neon lights, cyberpunk aesthetic'
    })
)

Using with Agents

from qwen_agent.agents import Assistant
from qwen_agent.tools import ImageGen

# Configure the image_gen tool first, then hand the instance to the agent
image_tool = ImageGen(cfg={
    'llm_cfg': {
        'model': 'qwen-vl-plus',
        'api_key': 'your-api-key'
    }
})

bot = Assistant(
    llm={'model': 'qwen-max'},
    # function_list accepts tool names, config dicts, or tool instances
    function_list=[image_tool],
)

messages = [
    {
        'role': 'user',
        'content': 'Create an image of a cute dog playing in a park'
    }
]

for response in bot.run(messages=messages):
    print(response)

Return Format

The tool returns a list of ContentItem objects:
from qwen_agent.llm.schema import ContentItem

# Example return value
[
    ContentItem(
        image='https://example.com/generated-image.png',  # or base64 data
        # Additional metadata may be included
    )
]
Each ContentItem contains:
  • image: URL or base64-encoded image data
  • May include additional metadata depending on the LLM
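Because the `image` field may hold either a URL or base64 data, downstream code usually needs to branch on which one it received. The sketch below works on the plain string value. `load_image_ref` is a hypothetical helper, and the handling of a `data:` URI prefix is an assumption about how base64 payloads might be delivered.

```python
import base64
from typing import Tuple, Union


def load_image_ref(image: str) -> Tuple[str, Union[str, bytes]]:
    """Return ("url", link) for http(s) links or ("bytes", data) for base64 input.

    Hypothetical helper; operates on the string stored in ContentItem.image.
    """
    if image.startswith(("http://", "https://")):
        return "url", image
    # Drop an optional data-URI header such as "data:image/png;base64,"
    if image.startswith("data:"):
        _, _, image = image.partition(",")
    # validate=True rejects characters outside the base64 alphabet
    return "bytes", base64.b64decode(image, validate=True)
```

A caller could download the link in the `"url"` case and write the decoded bytes straight to a file in the `"bytes"` case.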

Example: Custom Image Generation Tool

import json
import urllib.parse

import json5
from qwen_agent.agents import Assistant
from qwen_agent.tools.base import BaseTool, register_tool

@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    description = 'AI painting service that generates images from text descriptions'
    parameters = [{
        'name': 'prompt',
        'type': 'string',
        'description': 'Detailed description of the desired image content, in English',
        'required': True,
    }]

    def call(self, params: str, **kwargs) -> str:
        prompt = json5.loads(params)['prompt']
        # URL encode for external service
        prompt = urllib.parse.quote(prompt)
        return json.dumps(
            {'image_url': f'https://image.pollinations.ai/prompt/{prompt}'},
            ensure_ascii=False,
        )

# Use the custom tool
bot = Assistant(
    llm={'model': 'qwen-max'},
    function_list=['my_image_gen']
)
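The custom tool above calls `urllib.parse.quote` with its default `safe='/'`, so a slash inside a prompt is left unescaped and becomes part of the URL path. The sketch below is one possible variation that escapes it as well. `build_image_url` is a hypothetical helper name.

```python
import urllib.parse

BASE_URL = "https://image.pollinations.ai/prompt/"


def build_image_url(prompt: str) -> str:
    """Percent-encode the whole prompt, including "/" characters.

    Hypothetical helper; safe="" overrides quote()'s default of leaving "/" as-is.
    """
    return BASE_URL + urllib.parse.quote(prompt, safe="")
```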

Example: Image Generation Agent

from qwen_agent.agents import Assistant
from qwen_agent.tools import ImageGen
from qwen_agent.gui import WebUI

def create_image_generator():
    """Create an agent specialized in image generation."""
    
    # Configure image generation tool
    image_tool = ImageGen(cfg={
        'llm_cfg': {
            'model': 'qwen-vl-plus',
            'api_key': 'your-api-key',
            'model_server': 'dashscope'
        }
    })
    
    bot = Assistant(
        llm={'model': 'qwen-max'},
        name='AI Artist',
        description='AI image generation service',
        system_message=(
            'You are an AI artist that creates images based on user descriptions. '
            'When users request images, use the image_gen tool to create them. '
            'Ask clarifying questions if the description is too vague.'
        ),
        # Pass the configured instance so the llm_cfg above is actually used
        function_list=[image_tool]
    )
    
    return bot

# Create and use the agent
bot = create_image_generator()

# Text-based interaction
messages = []
while True:
    user_input = input('Describe the image you want (or "quit"): ')
    if user_input.lower() in ['quit', 'exit']:
        break
    
    messages.append({'role': 'user', 'content': user_input})
    
    response = []
    for response in bot.run(messages=messages):
        print('Generating...', end='\r')
    
    messages.extend(response)
    print(f"Image generated: {response[-1]['content']}")

# Or launch with GUI
# WebUI(bot).run()

Prompt Engineering Tips

Include specific details:
prompt = (
    "A golden retriever puppy sitting in a flower garden, "
    "surrounded by pink roses and white daisies, "
    "soft morning sunlight, shallow depth of field, "
    "photorealistic, 8k quality"
)
Specify style:
prompt = "A medieval castle, oil painting style, dramatic lighting, Rembrandt-inspired"
Too vague ❌:
  • “a dog”
  • “something nice”
  • “a picture”
Better ✅:
  • “A corgi puppy wearing a red bandana, sitting on green grass”
  • “A peaceful zen garden with raked sand and stone arrangements”
  • “A portrait of an elderly man with kind eyes, soft studio lighting”
Add descriptive keywords:
  • Angle: “aerial view”, “close-up”, “wide angle”
  • Lighting: “golden hour”, “dramatic lighting”, “soft diffused light”
  • Style: “photorealistic”, “watercolor”, “digital art”, “sketch”
  • Mood: “serene”, “energetic”, “mysterious”, “cheerful”
  • Quality: “4k”, “8k”, “high detail”, “cinematic”
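The keyword categories above (angle, lighting, style, mood, quality) can be combined programmatically so that every prompt follows the same structure. This is a minimal sketch. `build_prompt` is a hypothetical helper, and the comma-joined ordering is only one reasonable choice.

```python
from typing import Optional


def build_prompt(subject: str, *,
                 angle: Optional[str] = None,
                 lighting: Optional[str] = None,
                 style: Optional[str] = None,
                 mood: Optional[str] = None,
                 quality: Optional[str] = None) -> str:
    """Join a subject with optional descriptive keywords, skipping unset ones.

    Hypothetical helper for illustration only.
    """
    parts = [subject.strip()]
    for modifier in (angle, lighting, style, mood, quality):
        if modifier:
            parts.append(modifier.strip())
    return ", ".join(parts)
```

For example, `build_prompt("A corgi puppy wearing a red bandana", lighting="golden hour", style="watercolor")` yields a single comma-separated prompt string with the subject first.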
Some models support prompts in multiple languages:
# English
prompt = "A cute cat wearing a wizard hat"

# Chinese
prompt = "一只可爱的猫咪戴着巫师帽"

# Works with both

Supported Models

Supported models depend on your LLM configuration. Common options:

Qwen Models (via DashScope)

llm_cfg = {
    'model': 'qwen-vl-plus',
    'api_key': 'your-dashscope-key',
    'model_server': 'dashscope'
}

OpenAI DALL-E

llm_cfg = {
    'model': 'dall-e-3',
    'api_key': 'your-openai-key',
    'model_server': 'https://api.openai.com/v1'
}

Other Compatible Models

Any model that:
  • Accepts text prompts via chat interface
  • Returns images as ContentItem objects
  • Is supported by Qwen-Agent’s LLM interface

Advanced Usage

Batch Generation

image_gen = ImageGen(cfg={
    'llm_cfg': {'model': 'qwen-vl-plus', 'api_key': 'key'},
    'size': '1024*1024'
})

prompts = [
    'A red apple on a wooden table',
    'A blue ocean wave crashing',
    'A green forest in spring'
]

generated_images = []
for prompt in prompts:
    result = image_gen.call(params=json.dumps({'prompt': prompt}))
    generated_images.append(result)
    print(f"Generated: {prompt}")
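In the loop above, one failing prompt aborts the whole batch. The sketch below records failures per prompt and keeps going. `generate_batch` is a hypothetical helper that takes any callable, so it makes no assumption about the tool's own API.

```python
from typing import Callable, Dict, List, Tuple


def generate_batch(generate: Callable[[str], object],
                   prompts: List[str]) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Run generate() for each prompt, collecting results and error messages.

    Hypothetical helper; `generate` is whatever function produces one image.
    """
    results: Dict[str, object] = {}
    failures: Dict[str, str] = {}
    for prompt in prompts:
        try:
            results[prompt] = generate(prompt)
        except Exception as exc:  # keep going so one bad prompt does not stop the batch
            failures[prompt] = str(exc)
    return results, failures
```

With the `image_gen` instance above, `generate` could be `lambda p: image_gen.call(params=json.dumps({'prompt': p}))`.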

With Code Interpreter

from qwen_agent.agents import Assistant

bot = Assistant(
    llm={'model': 'qwen-max'},
    function_list=['image_gen', 'code_interpreter'],
    system_message=(
        'You can generate images and then process them with Python code. '
        'First generate the image, then use code_interpreter to analyze or modify it.'
    )
)

messages = [{
    'role': 'user',
    'content': 'Generate an image of a data chart and then analyze its colors'
}]

for response in bot.run(messages=messages):
    print(response)

Error Handling

from qwen_agent.tools import ImageGen
import json

image_gen = ImageGen(cfg={
    'llm_cfg': {'model': 'qwen-vl-plus', 'api_key': 'key'}
})

try:
    result = image_gen.call(
        params=json.dumps({'prompt': 'A beautiful landscape'})
    )
    print("Image generated successfully")
except ValueError as e:
    print(f"Configuration error: {e}")
except Exception as e:
    print(f"Generation failed: {e}")

Best Practices

Prompting:
  • Be specific and detailed
  • Include style/mood descriptors
  • Specify quality/resolution keywords
  • Test different phrasings for best results
Performance and cost:
  • Image generation can be slow (10-60 seconds)
  • Consider caching generated images
  • Implement timeouts for agent workflows
  • Monitor API usage and costs
Content safety:
  • Most models have built-in content filters
  • Respect usage policies
  • Don’t attempt to bypass safety features
  • Review generated content before sharing
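The caching suggestion above can be sketched as a small in-memory wrapper keyed on the prompt and size. `PromptCache` is a hypothetical class, not part of Qwen-Agent. It assumes that an identical prompt and size should reuse the earlier result, which may not be what you want if you expect a fresh image on every call.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """In-memory cache in front of a generate(prompt, size) callable.

    Hypothetical helper for illustration only.
    """

    def __init__(self, generate: Callable[[str, str], object]):
        self._generate = generate
        self._store: Dict[str, object] = {}

    @staticmethod
    def key(prompt: str, size: str) -> str:
        # Hash size and prompt together so different sizes never collide
        return hashlib.sha256(f"{size}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, prompt: str, size: str = "1024*1024") -> object:
        cache_key = self.key(prompt, size)
        if cache_key not in self._store:
            self._store[cache_key] = self._generate(prompt, size)
        return self._store[cache_key]
```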

Limitations

  • Requires LLM with image generation capability
  • Generation can take 10-60 seconds
  • API costs may apply per image
  • Content policies restrict certain prompts
  • Quality varies by model
  • Some prompts may fail or be filtered
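Given that a single generation can take tens of seconds, it can help to cap how long a caller waits. The sketch below wraps any callable with a deadline using a worker thread. `call_with_timeout` is a hypothetical helper. Note that the underlying call is not cancelled and keeps running in the background after the timeout.

```python
import concurrent.futures


def call_with_timeout(fn, timeout_s: float, *args, **kwargs):
    """Run fn(*args, **kwargs) and raise TimeoutError if it exceeds timeout_s.

    Hypothetical helper; the worker thread is abandoned, not killed, on timeout.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call exceeded {timeout_s} seconds") from None
    finally:
        # Do not block on the worker; a timed-out call finishes on its own.
        executor.shutdown(wait=False)
```

This pairs naturally with a retry loop that catches `TimeoutError`.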

Troubleshooting

If the tool raises a configuration error, ensure you provide an LLM configuration:
image_gen = ImageGen(cfg={
    'llm_cfg': {
        'model': 'qwen-vl-plus',
        'api_key': 'your-key'
    }
})
If generation times out, increase the timeout in the agent configuration or implement retry logic:
import time
max_retries = 3
for attempt in range(max_retries):
    try:
        result = image_gen.call(params=...)
        break
    except TimeoutError:
        if attempt < max_retries - 1:
            time.sleep(5)
            continue
        raise
If image quality is poor:
  • Add quality keywords to prompt (“4k”, “high detail”)
  • Try different models
  • Be more specific in descriptions
  • Experiment with style keywords

Code Interpreter

Process generated images with Python

Image Zoom (Qwen3VL)

Zoom into specific regions of images

Assistant Agent

Agent that can generate images
