Chat-Aligned Models
Qwen-Chat models are fine-tuned versions of the base Qwen models, aligned with human preferences for conversational interactions. These models are optimized for chatbot applications, content generation, and interactive AI assistants.

Overview
Chat models are built on top of base models through supervised fine-tuning (SFT) using the ChatML format:

- Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, Qwen-72B-Chat
- Aligned with human intent through curated instruction data
- Enhanced safety and service-oriented capabilities
- Support for tool usage, code interpretation, and agent behavior
Fine-tuning Process
Training Data
The alignment dataset includes three major categories:

Instruction Data
Covers broad capabilities for practical applications:
- Writing: Content creation, story generation, copywriting
- Question Answering: Factual queries, explanations, knowledge retrieval
- Brainstorming & Planning: Idea generation, task planning
- Content Understanding: Summarization, analysis, interpretation
- Natural Language Processing: Text manipulation, extraction, transformation
- Coding: Code generation, debugging, explanation
Safety Data
Prevents harmful and inappropriate content generation:
- Refusal of harmful requests
- Bias mitigation
- Safety-aligned responses
- Content filtering
Service Data
Enables specific conversation patterns for external system integration:
- Tool invocation protocols
- API calling patterns
- Search integration
- Multi-step reasoning (ReAct)
ChatML Format
Conversations are formatted using ChatML, a meta language for structured dialogue, with three roles:

- system: Sets behavior and context
- user: Human input
- assistant: Model responses
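The ChatML structure can be assembled with a few lines of Python. This is a minimal sketch: the `<|im_start|>`/`<|im_end|>` markers are ChatML's turn delimiters, while the helper name `to_chatml` is illustrative rather than part of any library.

```python
# Sketch of ChatML prompt assembly, assuming messages are
# {"role": ..., "content": ...} dicts. The helper name is illustrative.
def to_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # End with an opening assistant tag so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```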
Training Configuration
- Objective: Causal language modeling (user content tokens excluded from loss)
- Optimizer: AdamW (β₁=0.9, β₂=0.95, ε=10⁻⁶)
- Sequence Length: 2048 tokens
- Batch Size: 128
- Training Steps: 4000
- Learning Rate: Peak 1×10⁻⁵ with 1430-step warm-up
- Regularization: Weight decay 0.1, dropout 0.1, gradient clipping 1.0
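The training objective above excludes user-content tokens from the loss. A common way to do this in PyTorch-style trainers is to set their labels to -100, the ignore index of cross-entropy. A toy sketch (token ids and role tags are illustrative values, not real tokenizer output):

```python
# Loss masking for chat SFT: tokens from system/user turns get label -100
# (the PyTorch cross-entropy ignore index), so only assistant tokens
# contribute to the loss. Ids and roles below are toy values.
IGNORE_INDEX = -100

def build_labels(token_ids, roles):
    """roles[i] marks which turn token i came from."""
    return [tid if role == "assistant" else IGNORE_INDEX
            for tid, role in zip(token_ids, roles)]

ids   = [101, 5, 6, 7, 102, 8, 9]
roles = ["system", "user", "user", "user",
         "assistant", "assistant", "assistant"]
labels = build_labels(ids, roles)
print(labels)  # → [-100, -100, -100, -100, 102, 8, 9]
```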
Benchmark Performance
Chinese Language Understanding
C-Eval (Zero-shot, generative) - Validation set:

| Model | Average Accuracy |
|---|---|
| LLaMA2-7B-Chat | 31.9 |
| LLaMA2-13B-Chat | 40.6 |
| Chinese-Alpaca-Plus-13B | 43.3 |
| Baichuan-13B-Chat | 50.4 |
| ChatGLM2-6B-Chat | 50.7 |
| InternLM-7B-Chat | 53.2 |
| Qwen-7B-Chat | 54.2 |
C-Eval (Test set) - Results by subject category:

| Model | Average | STEM | Social Sciences | Humanities | Others |
|---|---|---|---|---|---|
| ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
| Baichuan-13B-Chat | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
| Qwen-7B-Chat | 54.6 | 47.8 | 67.6 | 59.3 | 50.6 |
English Language Understanding
MMLU (Zero-shot):

| Model | Average Accuracy |
|---|---|
| ChatGLM2-6B-Chat | 45.5 |
| LLaMA2-7B-Chat | 47.0 |
| InternLM-7B-Chat | 50.8 |
| Baichuan-13B-Chat | 52.1 |
| ChatGLM2-12B-Chat | 52.1 |
| Qwen-7B-Chat | 53.9 |
Coding
HumanEval (Zero-shot Pass@1):

| Model | Pass@1 |
|---|---|
| LLaMA2-7B-Chat | 12.2 |
| InternLM-7B-Chat | 14.0 |
| Baichuan-13B-Chat | 16.5 |
| LLaMA2-13B-Chat | 18.9 |
| Qwen-7B-Chat | 24.4 |
Mathematical Reasoning
GSM8K (Math word problems):

| Model | Zero-shot | 4-shot |
|---|---|---|
| ChatGLM2-6B-Chat | - | 28.0 |
| LLaMA2-7B-Chat | 20.4 | 28.2 |
| LLaMA2-13B-Chat | 29.4 | 36.7 |
| InternLM-7B-Chat | 32.6 | 34.5 |
| Baichuan-13B-Chat | - | 36.3 |
| ChatGLM2-12B-Chat | - | 38.1 |
| Qwen-7B-Chat | 41.1 | 43.5 |
Tool Usage
Qwen-Chat excels at tool invocation through ReAct prompting.

Custom Tool Usage Benchmark:

| Model | Tool Selection (Acc.) | Tool Input (Rouge-L) | False Positive Rate |
|---|---|---|---|
| GPT-4 | 95% | 0.90 | 15.0% |
| GPT-3.5 | 85% | 0.88 | 75.0% |
| Qwen-7B-Chat | 99% | 0.89 | 9.7% |
Evaluation plugins do not appear in Qwen’s training data, demonstrating genuine generalization.
HuggingFace Agent Benchmark:

| Model | Tool Selection↑ | Tool Used↑ | Code↑ |
|---|---|---|---|
| GPT-4 | 100.00 | 100.00 | 97.41 |
| GPT-3.5 | 95.37 | 96.30 | 87.04 |
| StarCoder-15.5B | 87.04 | 87.96 | 68.89 |
| Qwen-7B-Chat | 90.74 | 92.59 | 74.07 |
Core Capabilities
Conversational AI
Qwen-Chat models excel at multi-turn conversations with context awareness.

Tool Integration
Qwen-Chat supports tool usage through:

- ReAct Prompting
- HuggingFace Agent
The model can reason about which tools to use and generate appropriate calls:
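A minimal sketch of a ReAct-style prompt for tool selection. The Thought/Action/Action Input/Observation fields follow the ReAct pattern; the tool description and the helper name `build_react_prompt` are illustrative, not a fixed Qwen API:

```python
# Build a ReAct-style prompt listing available tools and the expected
# reasoning format. Tool names/descriptions here are hypothetical.
def build_react_prompt(question, tools):
    tool_lines = "\n".join(f"{t['name']}: {t['description']}" for t in tools)
    names = ", ".join(t["name"] for t in tools)
    return (
        f"Answer the question using the following tools:\n{tool_lines}\n\n"
        "Use this format:\n"
        "Thought: reason about what to do next\n"
        f"Action: one of [{names}]\n"
        "Action Input: the input to the action\n"
        "Observation: the result of the action\n"
        "... (Thought/Action/Observation can repeat)\n"
        "Final Answer: the answer to the question\n\n"
        f"Question: What is the weather in Beijing?\n"
    )

prompt = build_react_prompt(
    "What is the weather in Beijing?",
    [{"name": "search",
      "description": "Search the web for current information."}],
)
```

The model then completes the prompt; the caller parses out the Action and Action Input lines, executes the tool, and appends the result as an Observation before asking the model to continue.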
Code Interpretation
Chat models can generate, explain, and debug code.

System Prompt Enhancement
Qwen-1.8B-Chat and Qwen-72B-Chat have strengthened system prompt capabilities.

Quantized Variants
Chat models are available in quantized formats for efficient deployment.

Performance Comparison
Qwen-7B-Chat:

| Quantization | MMLU | C-Eval (val) | GSM8K | HumanEval |
|---|---|---|---|---|
| BF16 | 55.8 | 59.7 | 50.3 | 37.2 |
| Int8 | 55.4 | 59.4 | 48.3 | 34.8 |
| Int4 | 55.1 | 59.2 | 49.7 | 29.9 |
Qwen-14B-Chat:

| Quantization | MMLU | C-Eval (val) | GSM8K | HumanEval |
|---|---|---|---|---|
| BF16 | 64.6 | 69.8 | 60.1 | 43.9 |
| Int8 | 63.6 | 68.6 | 60.0 | 48.2 |
| Int4 | 63.3 | 69.0 | 59.8 | 45.7 |
Qwen-72B-Chat:

| Quantization | MMLU | C-Eval (val) | GSM8K | HumanEval |
|---|---|---|---|---|
| BF16 | 74.4 | 80.1 | 76.4 | 64.6 |
| Int8 | 73.5 | 80.1 | 73.5 | 62.2 |
| Int4 | 73.4 | 80.1 | 75.3 | 61.6 |
Quantization causes minimal performance degradation while significantly reducing memory requirements.
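The memory savings can be estimated with back-of-envelope arithmetic: weight storage is roughly parameters × bits / 8. This sketch counts only parameter storage; activations, KV cache, and quantization scales add overhead on top, so treat the figures as lower bounds:

```python
# Rough weight-memory estimate per precision: params * bits / 8 bytes.
# Real usage is higher (activations, KV cache, quantization metadata).
def weight_gib(n_params, bits):
    return n_params * bits / 8 / 2**30

for bits, name in [(16, "BF16"), (8, "Int8"), (4, "Int4")]:
    print(f"7B {name}: ~{weight_gib(7e9, bits):.1f} GiB")
```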
Batch Inference
Chat models support batch inference for improved throughput. With Flash Attention enabled, batch inference provides ~40% speedup over sequential processing.
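Batching decoder-only models typically requires left padding, so that the last real token of every sequence is aligned and generation starts from the same position. A minimal sketch (the helper name and pad id are illustrative):

```python
# Left-pad a batch of token-id sequences for causal-LM batch inference.
# Shorter rows are padded on the left; the attention mask zeros out padding.
def left_pad_batch(sequences, pad_id=0):
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        pad = max_len - len(s)
        input_ids.append([pad_id] * pad + list(s))
        attention_mask.append([0] * pad + [1] * len(s))
    return input_ids, attention_mask

ids, mask = left_pad_batch([[11, 12, 13], [21]])
print(ids)   # → [[11, 12, 13], [0, 0, 21]]
print(mask)  # → [[1, 1, 1], [0, 0, 1]]
```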
Streaming Responses
Chat models support streaming for real-time response generation.

Hardware Requirements
Inference Memory (Generating 2048 tokens)
Qwen-1.8B-Chat:

| Precision | GPU Memory | Speed (tokens/s) |
|---|---|---|
| BF16 | 4.23GB | 54.09 |
| Int8 | 3.48GB | 55.56 |
| Int4 | 2.91GB | 71.07 |
Fine-tuning Memory (Q-LoRA, batch_size=1, gradient_accumulation=8)
| Model Size | Min GPU Memory |
|---|---|
| 1.8B | 5.8GB |
| 7B | 11.5GB |
| 14B | 18.7GB |
| 72B | 61.4GB |
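Q-LoRA keeps the quantized base weights frozen and trains only low-rank adapter matrices, which is why the memory figures above stay small. The trainable-parameter count per adapted weight W (d_out × d_in) is r × (d_in + d_out) for adapters A (r × d_in) and B (d_out × r). A rough count, with illustrative hidden size, rank, and layer count (not Qwen's actual configuration):

```python
# LoRA trainable-parameter arithmetic: each adapted d_out x d_in weight
# gains rank * (d_in + d_out) trainable values. Shapes below are illustrative.
def lora_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

hidden, rank, n_layers = 4096, 64, 32
# Adapting the four attention projections (q, k, v, o), all hidden x hidden:
per_layer = 4 * lora_params(hidden, hidden, rank)
total = n_layers * per_layer
print(f"~{total / 1e6:.1f}M trainable parameters")  # → ~67.1M
```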
Model Downloads
Qwen-1.8B-Chat
🤗 HF | 🤖 MS | Int4 | Int8
Qwen-7B-Chat
🤗 HF | 🤖 MS | Int4 | Int8
Qwen-14B-Chat
🤗 HF | 🤖 MS | Int4 | Int8
Qwen-72B-Chat
🤗 HF | 🤖 MS | Int4 | Int8
Safety Considerations
Next Steps
- Model Selection: Choose the right chat model for your needs
- Tool Usage: Learn to integrate external tools
- Fine-tuning Chat: Customize chat models for your domain
- Deployment: Deploy chat models to production