Qwen Model Family
Qwen (通义千问) is a series of large language models developed by Alibaba Cloud. The family includes both base pretrained models and chat-aligned models across multiple parameter scales.

Available Models
The Qwen series consists of four main model sizes, each available in both base and chat variants:

Qwen-1.8B
Compact model for efficient deployment
- 1.8 billion parameters
- 32K context length
- 2.2T training tokens
Qwen-7B
Balanced model for general use
- 7 billion parameters
- 32K context length
- 2.4T training tokens
Qwen-14B
Enhanced performance model
- 14 billion parameters
- 8K context length
- 3.0T training tokens
Qwen-72B
Flagship large-scale model
- 72 billion parameters
- 32K context length
- 3.0T training tokens
Model Variants
Base Models
Base models are pretrained language models suitable for further fine-tuning:

- Qwen-1.8B, Qwen-7B, Qwen-14B, Qwen-72B
- Pretrained on up to 3 trillion tokens of multilingual data
- Focus on Chinese and English languages
- Support for various domains: web documents, code, mathematics, reasoning
Chat Models
Chat models are aligned with human preferences using supervised fine-tuning:

- Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, Qwen-72B-Chat
- Fine-tuned for conversational interactions
- Enhanced safety and service-oriented capabilities
- Support for tool usage and code interpretation
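Qwen chat models converse in a ChatML-style format delimited by the `<|im_start|>` and `<|im_end|>` special tokens. A minimal sketch of assembling such a prompt by hand (in practice the model's built-in chat interface applies this template for you; `build_chatml_prompt` is a hypothetical helper for illustration):

```python
def build_chatml_prompt(system, turns):
    """Assemble a ChatML-style prompt as used by Qwen chat models.

    turns: list of (user_text, assistant_text_or_None) pairs; the final
    assistant slot is left open for the model to complete.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for user, assistant in turns:
        parts.append(f"<|im_start|>user\n{user}<|im_end|>")
        if assistant is not None:
            parts.append(f"<|im_start|>assistant\n{assistant}<|im_end|>")
    # Leave an open assistant header so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt("You are a helpful assistant.",
                             [("Hello!", None)])
```

Generation is stopped when the model emits `<|im_end|>`, which is why each completed turn is closed with that token.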
Quantized Models
Quantized versions reduce memory requirements while maintaining performance. Int4 (4-bit quantization) offers the lowest memory footprint; Int8 variants are also available.

Int4 chat models:
- Qwen-1.8B-Chat-Int4
- Qwen-7B-Chat-Int4
- Qwen-14B-Chat-Int4
- Qwen-72B-Chat-Int4
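As a rule of thumb, weight-only memory scales with parameter count times bits per weight. The sketch below estimates this; it ignores the KV cache, activations, and quantization overhead, so actual GPU requirements are higher than these figures:

```python
def approx_weight_memory_gb(n_params_billion, bits):
    """Rough weight-only memory estimate: params * (bits / 8) bytes,
    converted to GiB. Ignores KV cache, activations, and overhead."""
    bytes_total = n_params_billion * 1e9 * bits / 8
    return bytes_total / 1024**3

for name, size in [("Qwen-1.8B", 1.8), ("Qwen-7B", 7.0),
                   ("Qwen-14B", 14.0), ("Qwen-72B", 72.0)]:
    print(f"{name}: fp16 ~ {approx_weight_memory_gb(size, 16):.1f} GiB, "
          f"int4 ~ {approx_weight_memory_gb(size, 4):.1f} GiB")
```

For example, Qwen-7B in fp16 needs roughly 13 GiB for weights alone, while Int4 cuts that to about 3.3 GiB, which is what makes the larger models deployable on a single consumer GPU.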
Performance Overview
Qwen models demonstrate competitive performance across multiple benchmarks.

Qwen-72B Benchmark Results
| Benchmark | Qwen-72B | LLaMA2-70B | GPT-3.5 |
|---|---|---|---|
| MMLU | 77.4 | 69.8 | 70.0 |
| C-Eval | 83.3 | - | 54.4 |
| GSM8K | 78.9 | 56.8 | 57.1 |
| HumanEval | 35.4 | 29.9 | 48.1 |
| MATH | 35.2 | 13.5 | 34.1 |
| BBH | 67.7 | 51.2 | 70.0 |
Qwen-72B outperforms LLaMA2-70B on every benchmark shown above, and surpasses GPT-3.5 on 7 out of 10 tasks in the full evaluation suite (including four of the six shown here: MMLU, C-Eval, GSM8K, and MATH).
Key Features
Multilingual Support
- Primary Languages: Chinese and English
- Extended Support: Japanese, Korean, Arabic, Thai, Vietnamese, and more
- Vocabulary Size: 151,851 tokens (optimized for multilingual efficiency)
- Tokenizer: Based on tiktoken with efficient number encoding
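The "efficient number encoding" refers to splitting numbers into single digits during pre-tokenization, so every number is composed from the same ten digit tokens. A hypothetical sketch of that idea (`pre_tokenize` is illustrative, not the actual Qwen tokenizer code):

```python
import re

def pre_tokenize(text):
    """Split runs of digits into single-digit pieces while leaving
    non-digit spans intact (digit-by-digit pre-tokenization sketch)."""
    # Alternation tries the single-digit branch first, so each digit
    # becomes its own piece; non-digit runs stay whole.
    return re.findall(r"\d|\D+", text)

pieces = pre_tokenize("year 2024!")
```

Digit splitting keeps the vocabulary small for numerals and tends to help arithmetic-style tasks, since the model never has to memorize multi-digit chunks as opaque tokens.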
Long Context Support
Qwen models support extended context lengths through multiple techniques:

- Dynamic NTK-aware interpolation: Extends context beyond training length
- LogN attention scaling: Improves long-sequence performance
- Local window attention: Reduces memory usage for very long contexts
- Context Extension: From 2048 to 32K tokens (model-dependent)
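NTK-aware interpolation extends context by enlarging the RoPE frequency base rather than linearly compressing positions, so high-frequency (local) components are barely touched while low-frequency ones stretch. A sketch of the commonly used base-rescaling formula, as an illustration of the general technique rather than Qwen's exact implementation:

```python
def ntk_scaled_rope_base(base, head_dim, scale):
    """NTK-aware scaling: enlarge the RoPE base so the lowest-frequency
    component stretches by roughly `scale`, while high-frequency
    components remain nearly unchanged."""
    return base * scale ** (head_dim / (head_dim - 2))

# Extending a 2048-token training length to 32K (scale factor 16),
# assuming the usual RoPE base of 10000 and 128-dim attention heads:
new_base = ntk_scaled_rope_base(10000.0, 128, 16.0)
```

With these assumed values the base grows from 10,000 to roughly 167,000; the "dynamic" variant recomputes the scale factor on the fly from the current sequence length, so short sequences are left at the original base.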
Architecture Highlights
- Base Architecture: Transformer decoder-only (similar to LLaMA)
- Positional Encoding: Rotary Position Embedding (RoPE)
- Activation Function: SwiGLU
- Normalization: RMSNorm
- Attention: Flash Attention 2 support for efficient training
- Embedding: Untied input/output embeddings
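Of these components, RMSNorm is the simplest to illustrate: it normalizes each hidden vector by its root mean square, with no mean subtraction and no bias term, which makes it cheaper than LayerNorm. A minimal NumPy sketch:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: x / sqrt(mean(x^2) + eps) * weight, applied over the
    last (hidden) dimension. No centering, no bias."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

hidden = np.array([[1.0, -2.0, 3.0, -4.0]])
out = rms_norm(hidden, weight=np.ones(4))
```

After normalization the mean square of each row is approximately 1; the learned `weight` vector then rescales each hidden channel.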
Model Availability
- Hugging Face: Access models on the Hugging Face Hub
- ModelScope: Download from ModelScope (optimized for users in China)
Quick Comparison
| Model | Release Date | Context | System Prompt | Training Tokens | Min GPU (Q-LoRA) | Min GPU (Int4) |
|---|---|---|---|---|---|---|
| Qwen-1.8B | 2023-11-30 | 32K | ✅ | 2.2T | 5.8GB | 2.9GB |
| Qwen-7B | 2023-08-03 | 32K | ❌ | 2.4T | 11.5GB | 8.2GB |
| Qwen-14B | 2023-09-25 | 8K | ❌ | 3.0T | 18.7GB | 13.0GB |
| Qwen-72B | 2023-11-30 | 32K | ✅ | 3.0T | 61.4GB | 48.9GB |
System Prompt Enhancement: Qwen-1.8B and Qwen-72B have strengthened system prompt capabilities for better instruction following.
Next Steps
- Base Models: Explore base pretrained models for fine-tuning
- Chat Models: Learn about conversation-aligned models
- Model Selection: Choose the right model for your use case
- Getting Started: Start using Qwen models