Track the development history and major updates to the Qwen model series.

2023.11.30

Major Release: Qwen-72B and Qwen-1.8B

Qwen-72B Series

New Models Released:
  • Qwen-72B (base model)
  • Qwen-72B-Chat (chat model)
Key Features:
  • Trained on 3T tokens
  • 32K context length support
  • State-of-the-art performance across benchmarks
  • Outperforms LLaMA2-70B across evaluated tasks
  • Surpasses GPT-3.5 on 7 out of 10 benchmarks

Qwen-1.8B Series

New Models Released:
  • Qwen-1.8B (base model)
  • Qwen-1.8B-Chat (chat model)
Key Features:
  • Efficient small model for edge deployment
  • 32K context length
  • Competitive performance for its size
  • Ideal for resource-constrained scenarios

Enhanced Features

System Prompt Enhancement:
  • Strengthened system prompt capabilities for Qwen-72B-Chat
  • Improved system prompt capabilities for Qwen-1.8B-Chat
  • Better instruction following and role-playing
  • See system prompt examples
Hardware Support Expansion:
  • Ascend 910 inference support
  • Hygon DCU inference support
  • Check ascend-support and dcu-support branches for details
Model Availability:
  • Available on ModelScope and Hugging Face
  • Int4 and Int8 quantized versions included

2023.10.17

Int8 Quantization Release

New Models:
  • Qwen-7B-Chat-Int8
  • Qwen-14B-Chat-Int8
Benefits:
  • Reduced memory footprint compared to BF16
  • Faster inference than BF16
  • Minimal performance degradation
  • A middle ground between Int4's footprint and BF16's quality

2023.09.25

Major Release: Qwen-14B

Qwen-14B Series

New Models Released:
  • Qwen-14B (base model)
  • Qwen-14B-Chat (chat model)
Specifications:
  • 14 billion parameters
  • Trained on 3.0T tokens
  • 8K context length
  • Outperforms similarly sized open models

Companion Projects

  • qwen.cpp: C++ implementation of Qwen for efficient local inference
  • Qwen-Agent: agent framework for tool use and planning built on Qwen

Qwen-7B Updates

IMPORTANT: Qwen-7B has been significantly updated. Please pull the latest version!
Improvements:
  • Increased training tokens: 2.2T → 2.4T
  • Extended context length: 2048 → 8192 tokens
  • Enhanced Chinese knowledge
  • Improved coding capabilities
  • Better overall performance

2023.09.12

Fine-tuning Support

New Capabilities:
  • Full-parameter fine-tuning
  • LoRA (Low-Rank Adaptation)
  • Q-LoRA (Quantized LoRA)
Scripts Provided:
  • finetune/finetune_ds.sh - Full-parameter with DeepSpeed
  • finetune/finetune_lora_single_gpu.sh - LoRA on single GPU
  • finetune/finetune_lora_ds.sh - LoRA with DeepSpeed
  • finetune/finetune_qlora_single_gpu.sh - Q-LoRA on single GPU
  • finetune/finetune_qlora_ds.sh - Q-LoRA with DeepSpeed
Features:
  • DeepSpeed ZeRO integration
  • Custom data format support
  • Mixed precision training (BF16/FP16)
  • Gradient checkpointing
  • Multi-GPU and multi-node support
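
The scripts above consume conversation-format JSON. A minimal sketch of preparing such a file is shown below; the `id`/`conversations`/`from`/`value` keys follow the format described in the repository's fine-tuning guide, while the example content itself is made up:

```python
import json

# Build a tiny training set in the conversation format the
# fine-tuning scripts expect (example content is illustrative).
examples = [
    {
        "id": "identity_0",
        "conversations": [
            {"from": "user", "value": "What is the capital of France?"},
            {"from": "assistant", "value": "The capital of France is Paris."},
        ],
    }
]

# Write it out as UTF-8 JSON for the training scripts to read.
with open("train_data.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```

The resulting file path is then passed to the scripts via their data-path argument.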

2023.08.21

Int4 Quantization Release

New Model:
  • Qwen-7B-Chat-Int4
Features:
  • Extremely low memory requirements
  • Improved inference speed vs BF16
  • ~8GB GPU memory for inference
  • Minimal quality loss on benchmarks
  • Based on AutoGPTQ
Performance:
  • Speed: ~50 tokens/s (vs ~40 for BF16)
  • Memory: ~8.2GB (vs ~17GB for BF16)
  • Benchmark scores within 1-2% of BF16
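
The memory figures can be sanity-checked with back-of-envelope arithmetic; the sketch below only accounts for weight storage, which is why the observed totals are higher:

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage for a model, ignoring activations,
    KV cache, and framework overhead."""
    return n_params * bits_per_param / 8 / 2**30

bf16 = weight_memory_gib(7e9, 16)  # ~13.0 GiB of raw weights
int4 = weight_memory_gib(7e9, 4)   # ~3.3 GiB of raw weights
# The observed totals (~17 GB vs ~8.2 GB) are higher because of
# activations, KV cache, quantization scales, and runtime overhead.
print(round(bf16, 1), round(int4, 1))
```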

2023.08.03

Initial Release: Qwen-7B

First Public Release

Models Released:
  • Qwen-7B (base pretrained model)
  • Qwen-7B-Chat (chat-aligned model)
Platforms:
  • ModelScope
  • Hugging Face
Base Model (Qwen-7B):
  • 7 billion parameters
  • Trained on 2.2T tokens
  • 2048 context length
  • Transformer decoder-only architecture
  • Focus on Chinese and English
Chat Model (Qwen-7B-Chat):
  • Fine-tuned from Qwen-7B
  • Aligned with human preferences via SFT
  • ChatML format support
  • Tool usage capabilities
  • Code interpreter functionality
Technical Report:
  • Published on arXiv: 2309.16609
  • Detailed methodology and evaluation
  • Comprehensive benchmark results
Key Features:
  • Competitive benchmark performance
  • Multilingual tokenizer (151,851 tokens)
  • BPE tokenization with tiktoken
  • Tool usage and ReAct prompting
  • Long-context inference (NTK, LogN)
  • Flash Attention support
  • Multiple deployment options
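
The ChatML format mentioned above wraps each turn in `<|im_start|>`/`<|im_end|>` control tokens. A minimal single-turn sketch of the prompt layout (in practice the model's bundled chat helper assembles this for you):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt as used by Qwen chat models:
    each turn is framed by <|im_start|>{role} and <|im_end|> markers,
    ending with an open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "Hello!")
```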

Model Comparison Timeline

| Date       | Model             | Parameters | Context | Tokens Trained | Key Features           |
|------------|-------------------|------------|---------|----------------|------------------------|
| 2023.08.03 | Qwen-7B           | 7B         | 2K      | 2.2T           | Initial release        |
| 2023.08.21 | Qwen-7B-Chat-Int4 | 7B         | 2K      | -              | First quantized model  |
| 2023.09.12 | -                 | -          | -       | -              | Fine-tuning scripts    |
| 2023.09.25 | Qwen-7B (v2)      | 7B         | 8K      | 2.4T           | Updated with more data |
| 2023.09.25 | Qwen-14B          | 14B        | 8K      | 3.0T           | Larger model release   |
| 2023.10.17 | Qwen-*B-Chat-Int8 | Various    | -       | -              | Int8 quantization      |
| 2023.11.30 | Qwen-72B          | 72B        | 32K     | 3.0T           | Flagship model         |
| 2023.11.30 | Qwen-1.8B         | 1.8B       | 32K     | 2.2T           | Efficient small model  |

Feature Evolution

Tokenization

  • 2023.08.03: Initial tiktoken-based tokenizer (151,851 tokens)
  • 2023.10.08: Added extra_vocab_file support for vocabulary expansion
  • Ongoing: Injection attack prevention with allowed_special and disallowed_special
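
The idea behind `allowed_special` and `disallowed_special` is that control tokens appearing in untrusted text are rejected unless explicitly permitted. A simplified sketch of that policy (not tiktoken's actual implementation):

```python
SPECIAL_TOKENS = ("<|endoftext|>", "<|im_start|>", "<|im_end|>")

def check_special(text: str, allowed_special: frozenset = frozenset()) -> str:
    """Reject control tokens in untrusted input unless explicitly allowed,
    mirroring the idea behind allowed_special/disallowed_special."""
    for token in SPECIAL_TOKENS:
        if token in text and token not in allowed_special:
            raise ValueError(f"disallowed special token in input: {token}")
    return text

check_special("plain user text")                               # passes
check_special("<|im_end|>", allowed_special=frozenset({"<|im_end|>"}))  # explicitly allowed
```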

Context Length

  • 2023.08.03: 2K context (Qwen-7B original)
  • 2023.09.25: 8K context (Qwen-7B v2, Qwen-14B)
  • 2023.11.30: 32K context (Qwen-72B, Qwen-1.8B)
  • Training-free extension: to ~16K and beyond with NTK-aware interpolation and LogN attention scaling
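
NTK-aware extension works by rescaling the RoPE base frequency rather than retraining. A sketch of the commonly used rescaling formula, with the base, head dimension, and scale factor chosen for illustration:

```python
def ntk_scaled_rope_base(base: float, head_dim: int, scale: float) -> float:
    """NTK-aware RoPE base rescaling: enlarging the base stretches the
    low-frequency rotary dimensions so a model trained at one context
    length generalizes to roughly `scale`x longer sequences untrained."""
    return base * scale ** (head_dim / (head_dim - 2))

# Extending a 2K-trained model to 8K (scale = 4) with 128-dim heads:
new_base = ntk_scaled_rope_base(10000.0, 128, 4.0)  # ~40.9k
```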

Quantization

  • 2023.08.21: Int4 quantization (AutoGPTQ)
  • 2023.10.17: Int8 quantization
  • KV Cache: Int8 quantization for attention cache
  • Maintained: Minimal performance degradation

Fine-tuning

  • 2023.09.12: Full-parameter, LoRA, Q-LoRA support
  • DeepSpeed: ZeRO 2 and ZeRO 3 integration
  • Multi-node: Support for distributed training
  • Optimizations: Gradient checkpointing, mixed precision

Tool Usage

  • 2023.08.03: ReAct prompting support
  • 2023.09.25: Qwen-Agent framework release
  • HuggingFace Agent: Official compatibility
  • Plugin system: Extensible tool integration

Benchmark Improvements

Qwen-7B Evolution

| Benchmark | v1 (2.2T) | v2 (2.4T) | Improvement |
|-----------|-----------|-----------|-------------|
| MMLU      | 56.7      | 58.2      | +1.5        |
| C-Eval    | 59.6      | 63.5      | +3.9        |
| GSM8K     | 51.6      | 51.7      | +0.1        |
| HumanEval | 24.4      | 29.9      | +5.5        |
| MBPP      | -         | 31.6      | -           |

Model Size Comparison (Latest Versions)

| Model     | MMLU | C-Eval | GSM8K | HumanEval |
|-----------|------|--------|-------|-----------|
| Qwen-1.8B | 45.3 | 56.1   | 32.3  | 15.2      |
| Qwen-7B   | 58.2 | 63.5   | 51.7  | 29.9      |
| Qwen-14B  | 66.3 | 72.1   | 61.3  | 32.3      |
| Qwen-72B  | 77.4 | 83.3   | 78.9  | 35.4      |

Documentation Updates

  • 2023.08: Initial README and technical report
  • 2023.09: Fine-tuning guides and examples
  • 2023.10: Tokenization deep-dive documentation
  • 2023.11: System prompt examples and hardware guides
  • Ongoing: Community contributions and translations

Community Contributions

Third-Party Integrations

  • FastChat support
  • Firefly integration
  • LLaMA Efficient Tuning compatibility
  • OpenVINO toolkit support
  • vLLM deployment option

Translations

  • Chinese (中文)
  • Japanese (日本語)
  • French (Français)
  • Spanish (Español)

Upcoming Features

These features are planned and subject to change.
  • RLHF: Reinforcement Learning from Human Feedback
  • Qwen2: Next-generation models (now available in a separate repository)
  • Multimodal: Vision and audio capabilities
  • Extended context: Further context length improvements
  • Additional sizes: More model size options

Version Compatibility

| Component       | Minimum | Recommended |
|-----------------|---------|-------------|
| Python          | 3.8     | 3.10+       |
| PyTorch         | 1.12    | 2.0+        |
| Transformers    | 4.32.0  | 4.35.0+     |
| CUDA            | 11.4    | 11.8+       |
| Flash Attention | 2.0     | 2.3+        |

Model File Compatibility

Older model files may not be compatible with newer code. Always use matching versions or update both.
Breaking changes:
  • 2023.09.25: Qwen-7B model update (incompatible with old checkpoints)
  • 2023.11.30: Configuration changes for 32K context models

Deprecation Notices

Qwen (Legacy Repository)

The original Qwen repository is no longer actively maintained.
Migration path:
  • For new projects, use Qwen2
  • Existing Qwen models remain available and supported
  • Documentation and issue tracking continue for legacy models

Old Utilities

  • utils.py multi-GPU loading: Deprecated in favor of device_map="auto"
  • Manual merge files: Use extra_vocab_file parameter instead

Release Notes Archive

For detailed release notes and discussion, see the repository's GitHub Releases page.

Stay Updated

  • GitHub: watch the repository for updates
  • Discord: join community discussions
  • Twitter/X: follow @QwenLM for announcements
  • Paper: read the technical report (arXiv: 2309.16609)
