Overview
Image Generation provides:
- Text-to-Image: Convert text descriptions to images
- LLM Integration: Uses configured LLM for generation
- Flexible Sizing: Configurable output dimensions
- Prompt Preservation: Maintains detailed requirements from user input
- Content Items: Returns images as ContentItem objects
Registration
The tool is registered under the name image_gen.
Parameters
Detailed description of the desired image content. Should include:
- Main subject
- Style or artistic direction
- Colors and mood
- Composition details
- Any specific requirements or text to include
Parameter Schema
Configuration
Configuration for the image generation LLM. Must include model settings.
Output image dimensions. Format: "width*height"
Common sizes:
- "1024*1024": Square
- "1024*768": Landscape
- "768*1024": Portrait
- "1920*1080": HD Landscape
Usage
Basic Image Generation
Custom Image Size
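Because sizes are passed as "width*height" strings, it can help to validate them before sending a request. The following is a minimal sketch, not part of the tool itself: `parse_size` and `orientation` are hypothetical helper names introduced here for illustration.

```python
from typing import Tuple


def parse_size(size: str) -> Tuple[int, int]:
    """Parse a "width*height" string into (width, height) integers."""
    parts = size.split("*")
    if len(parts) != 2:
        raise ValueError(f"expected 'width*height', got {size!r}")
    try:
        width, height = (int(part.strip()) for part in parts)
    except ValueError:
        raise ValueError(f"non-integer dimension in {size!r}") from None
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {size!r}")
    return width, height


def orientation(size: str) -> str:
    """Classify a size string as square, landscape, or portrait."""
    width, height = parse_size(size)
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"


print(parse_size("1024*768"))    # (1024, 768)
print(orientation("768*1024"))   # portrait
```

Validating up front turns a malformed size into an immediate, readable error rather than a failed generation call.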
Using with Agents
Return Format
The tool returns a list of ContentItem objects.
ContentItem contains:
- image: URL or base64-encoded image data
- May include additional metadata depending on the LLM
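Since the image field may hold either a URL or base64-encoded data, calling code usually needs to tell the two apart. The sketch below shows one way to do that. `StubContentItem` is a stand-in defined here so the example runs on its own (it is not the real ContentItem class), and `classify_image` is a hypothetical helper. The handling of a `data:` URI prefix is an assumption about how encoded images might arrive.

```python
import base64
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StubContentItem:
    """Stand-in for ContentItem; only the `image` field is modeled here."""
    image: Optional[str] = None


def classify_image(value: str) -> Tuple[str, Optional[bytes]]:
    """Return ("url", None) for links, or ("base64", raw_bytes) for encoded data."""
    if value.startswith(("http://", "https://")):
        return "url", None
    # Strip an optional data-URI prefix such as "data:image/png;base64,".
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    # validate=True rejects characters outside the base64 alphabet.
    return "base64", base64.b64decode(value, validate=True)


item = StubContentItem(image="https://example.com/cat.png")
kind, payload = classify_image(item.image)
print(kind)  # url
```

For the base64 case, the returned bytes can be written straight to a file opened in binary mode.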
Example: Custom Image Generation Tool
Example: Image Generation Agent
Prompt Engineering Tips
Good Prompts
Include specific details and specify the style.
Avoid Vague Prompts
Too vague ❌:
- “a dog”
- “something nice”
- “a picture”
Better ✅:
- “A corgi puppy wearing a red bandana, sitting on green grass”
- “A peaceful zen garden with raked sand and stone arrangements”
- “A portrait of an elderly man with kind eyes, soft studio lighting”
Composition Keywords
- Angle: “aerial view”, “close-up”, “wide angle”
- Lighting: “golden hour”, “dramatic lighting”, “soft diffused light”
- Style: “photorealistic”, “watercolor”, “digital art”, “sketch”
- Mood: “serene”, “energetic”, “mysterious”, “cheerful”
- Quality: “4k”, “8k”, “high detail”, “cinematic”
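The keyword categories above can be combined mechanically. Here is a small sketch of a prompt builder. `build_prompt` is a hypothetical helper, and joining the parts with commas is just one plausible convention, not something the tool requires.

```python
from typing import Optional


def build_prompt(
    subject: str,
    *,
    angle: Optional[str] = None,
    lighting: Optional[str] = None,
    style: Optional[str] = None,
    mood: Optional[str] = None,
    quality: Optional[str] = None,
) -> str:
    """Join a subject with any supplied composition keywords, in a fixed order."""
    parts = [subject.strip()]
    for keyword in (angle, lighting, style, mood, quality):
        if keyword:  # skip categories that were not provided
            parts.append(keyword.strip())
    return ", ".join(parts)


print(build_prompt(
    "A corgi puppy wearing a red bandana, sitting on green grass",
    angle="close-up",
    lighting="golden hour",
    style="photorealistic",
))
```

Keeping the categories as named arguments makes it easy to vary one dimension (say, style) while holding the rest of the prompt constant.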
Multi-Language Support
Some models support prompts in multiple languages.
Supported Models
Depends on your LLM configuration. Common options:
- Qwen Models (via DashScope)
- OpenAI DALL-E
Other Compatible Models
Any model that:
- Accepts text prompts via chat interface
- Returns images as ContentItem objects
- Is supported by Qwen-Agent’s LLM interface
Advanced Usage
Batch Generation
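One way to generate several images is to run the prompts through a small worker pool and collect failures separately, so one bad prompt does not abort the whole batch. This is a sketch under assumptions: `generate_batch` is a hypothetical helper, and `fake_generate` is a stub standing in for whatever function actually invokes the tool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def generate_batch(
    prompts: List[str],
    generate: Callable[[str], str],
    max_workers: int = 2,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `generate` on each prompt; return (results, errors), both keyed by prompt.

    Duplicate prompts share one key, so only one submission per distinct prompt is kept.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {prompt: pool.submit(generate, prompt) for prompt in prompts}
        for prompt, future in futures.items():
            try:
                results[prompt] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[prompt] = str(exc)
    return results, errors


def fake_generate(prompt: str) -> str:
    """Stub generator (hypothetical): rejects empty prompts, echoes the rest."""
    if not prompt.strip():
        raise ValueError("empty prompt")
    return f"generated:{prompt}"


results, errors = generate_batch(["a red fox", "a zen garden", ""], fake_generate)
print(sorted(results))
print(errors)
```

A low `max_workers` value is a deliberate choice here, since image backends are often rate-limited.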
With Code Interpreter
Error Handling
Best Practices
Prompt Quality
- Be specific and detailed
- Include style/mood descriptors
- Specify quality/resolution keywords
- Test different phrasings for best results
Resource Management
- Image generation can be slow (10-60 seconds)
- Consider caching generated images
- Implement timeouts for agent workflows
- Monitor API usage and costs
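The caching suggestion above can be sketched as an in-memory cache keyed by a hash of the prompt and size. `ImageCache` and `fake_generate` are hypothetical names used for illustration. A real setup would likely persist results to disk or a shared store.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


class ImageCache:
    """Memoize generation results by (prompt, size) so repeats skip the slow call."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the underlying generator was invoked

    @staticmethod
    def key(prompt: str, size: str) -> str:
        return hashlib.sha256(f"{size}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, prompt: str, size: str = "1024*1024") -> str:
        cache_key = self.key(prompt, size)
        if cache_key not in self._store:
            self.misses += 1
            self._store[cache_key] = self._generate(prompt, size)
        return self._store[cache_key]


calls: List[Tuple[str, str]] = []


def fake_generate(prompt: str, size: str) -> str:
    """Stub generator (hypothetical) that records each invocation."""
    calls.append((prompt, size))
    return f"{prompt}@{size}"


cache = ImageCache(fake_generate)
first = cache.get("a red fox")
second = cache.get("a red fox")            # served from the cache
other = cache.get("a red fox", "768*1024")  # different size, new generation
print(len(calls))  # 2
```

Including the size in the key matters: the same prompt at a different resolution is a different image.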
Content Safety
- Most models have built-in content filters
- Respect usage policies
- Don’t attempt to bypass safety features
- Review generated content before sharing
Limitations
Troubleshooting
ValueError: llm_cfg is required
Ensure you provide an LLM configuration (llm_cfg).
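To catch a missing configuration before the tool is constructed, a small guard can be run at startup. This is only a sketch: `require_llm_cfg` is a hypothetical helper, and the check for a `model` key is an assumption based on the requirement that the configuration include model settings.

```python
from typing import Any, Dict, Optional


def require_llm_cfg(cfg: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Fail early if no usable LLM configuration was supplied."""
    if not cfg:
        raise ValueError("llm_cfg is required")
    # Assumption: model settings live under a "model" key.
    if "model" not in cfg:
        raise ValueError("llm_cfg must include a 'model' entry")
    return cfg


# "your-image-model" is a placeholder, not a real model name.
checked = require_llm_cfg({"model": "your-image-model"})
print(checked["model"])
```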
Generation timeout
Increase the timeout in the agent configuration, or implement retry logic.
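Retry logic can be sketched as a wrapper that retries on timeout with exponential backoff. `with_retries` is a hypothetical helper, and it assumes the underlying call signals a timeout by raising Python's built-in `TimeoutError`. Adapt the exception type to whatever your client actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` up to `attempts` times, doubling the wait after each timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # loop always returns or raises


state = {"calls": 0}


def flaky_generate() -> str:
    """Stub (hypothetical): times out twice, then succeeds."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("generation timed out")
    return "ok"


delays = []
result = with_retries(flaky_generate, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting.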
Low quality results
- Add quality keywords to prompt (“4k”, “high detail”)
- Try different models
- Be more specific in descriptions
- Experiment with style keywords
Related
Code Interpreter
Process generated images with Python
Image Zoom (Qwen3VL)
Zoom into specific regions of images
Assistant Agent
Agent that can generate images