Qwen Model Family

Qwen (通义千问) is a series of large language models developed by Alibaba Cloud. The family includes both base pretrained models and chat-aligned models across multiple parameter scales.

Available Models

The Qwen series consists of four main model sizes, each available in both base and chat variants:

Qwen-1.8B

Compact model for efficient deployment
  • 1.8 billion parameters
  • 32K context length
  • 2.2T training tokens

Qwen-7B

Balanced model for general use
  • 7 billion parameters
  • 32K context length
  • 2.4T training tokens

Qwen-14B

Enhanced performance model
  • 14 billion parameters
  • 8K context length
  • 3.0T training tokens

Qwen-72B

Flagship large-scale model
  • 72 billion parameters
  • 32K context length
  • 3.0T training tokens

Model Variants

Base Models

Base models are pretrained language models suitable for further fine-tuning:
  • Qwen-1.8B, Qwen-7B, Qwen-14B, Qwen-72B
  • Pretrained on up to 3 trillion tokens of multilingual data
  • Focus on Chinese and English languages
  • Support for various domains: web documents, code, mathematics, reasoning

Chat Models

Chat models are aligned for conversational use with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF):
  • Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, Qwen-72B-Chat
  • Fine-tuned for conversational interactions
  • Enhanced safety and service-oriented capabilities
  • Support for tool usage and code interpretation
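
Qwen chat models converse in the ChatML format, where each turn is wrapped in `<|im_start|>role` / `<|im_end|>` markers. The sketch below shows how a multi-turn prompt is assembled; `build_chatml_prompt` is an illustrative helper, not part of the Qwen API (in practice the tokenizer and the model's chat interface handle this for you):

```python
def build_chatml_prompt(system, history, query):
    """Assemble a ChatML-style prompt as used by Qwen chat models.

    history is a list of (user_message, assistant_message) pairs.
    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for user_msg, assistant_msg in history:
        parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>")
        parts.append(f"<|im_start|>assistant\n{assistant_msg}<|im_end|>")
    parts.append(f"<|im_start|>user\n{query}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # generation continues from here
    return "\n".join(parts)
```

Leaving the final assistant turn open is what cues the model to generate its reply at that position.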

Quantized Models

Quantized versions reduce memory requirements while largely preserving performance. 4-bit (Int4) quantization offers the lowest memory footprint:
  • Qwen-1.8B-Chat-Int4
  • Qwen-7B-Chat-Int4
  • Qwen-14B-Chat-Int4
  • Qwen-72B-Chat-Int4
Memory usage: roughly 50-70% reduction versus BF16
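
A back-of-envelope way to see where that reduction comes from: weight memory is roughly parameters times bits per parameter. The sketch below ignores activation memory and quantization overhead (scales, zero points), which is why the practical saving lands around 50-70% rather than the theoretical 75%:

```python
def approx_weight_memory_gb(n_params_billion, bits_per_param):
    """Weight-only memory estimate in decimal GB: params x bits / 8.

    Ignores activations, KV cache, and quantization bookkeeping
    (per-group scales and zero points), so real usage is higher.
    """
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

bf16 = approx_weight_memory_gb(72, 16)  # ~144 GB for Qwen-72B weights in BF16
int4 = approx_weight_memory_gb(72, 4)   # ~36 GB in Int4
reduction = 1 - int4 / bf16             # 75% before overhead is added back
```

The same arithmetic explains why Qwen-72B-Chat-Int4 fits on a single 80 GB accelerator while the BF16 weights alone do not.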

Performance Overview

Qwen models demonstrate competitive performance across multiple benchmarks:

Qwen-72B Benchmark Results

| Benchmark | Qwen-72B | LLaMA2-70B | GPT-3.5 |
|-----------|----------|------------|---------|
| MMLU      | 77.4     | 69.8       | 70.0    |
| C-Eval    | 83.3     | -          | 54.4    |
| GSM8K     | 78.9     | 56.8       | 57.1    |
| HumanEval | 35.4     | 29.9       | 48.1    |
| MATH      | 35.2     | 13.5       | 34.1    |
| BBH       | 67.7     | 51.2       | 70.0    |
Qwen-72B outperforms LLaMA2-70B on every benchmark with a reported score, and surpasses GPT-3.5 on four of the six benchmarks shown (all except HumanEval and BBH).

Key Features

Multilingual Support

  • Primary Languages: Chinese and English
  • Extended Support: Japanese, Korean, Arabic, Thai, Vietnamese, and more
  • Vocabulary Size: 151,851 tokens (optimized for multilingual efficiency)
  • Tokenizer: Based on tiktoken with efficient number encoding

Long Context Support

Qwen models support extended context lengths through multiple techniques:
  • Dynamic NTK-aware interpolation: Extends context beyond training length
  • LogN attention scaling: Improves long-sequence performance
  • Local window attention: Reduces memory usage for very long contexts
  • Context Extension: From 2048 to 32K tokens (model-dependent)
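
NTK-aware interpolation extends context by rescaling the rotary (RoPE) base so that the extra length is spread across all frequency bands rather than compressed into the lowest one. A minimal sketch, assuming 128-dimensional attention heads and the commonly used `scale**(d/(d-2))` rescaling (function names here are illustrative, not Qwen's internal API):

```python
def ntk_scaled_rope_base(base, scale, head_dim):
    """NTK-aware RoPE rescaling: enlarge the base by scale**(d/(d-2))
    so low-frequency bands absorb most of the context extension."""
    return base * scale ** (head_dim / (head_dim - 2))

def rope_frequencies(base, head_dim):
    """Per-pair rotation frequencies theta_i = base**(-2i/d),
    from 1.0 (fastest) down toward 1/base (slowest)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Extending a model trained at 2048 tokens to 8192 (scale factor 4):
new_base = ntk_scaled_rope_base(10000.0, 4.0, 128)
```

The "dynamic" variant recomputes the scale factor from the current sequence length at inference time, so short sequences are left untouched.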

Architecture Highlights

  • Base Architecture: Transformer decoder-only (similar to LLaMA)
  • Positional Encoding: Rotary Position Embedding (RoPE)
  • Activation Function: SwiGLU
  • Normalization: RMSNorm
  • Attention: Flash Attention 2 support for efficient training
  • Embedding: Untied input/output embeddings
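
Of these components, RMSNorm is the simplest to illustrate: unlike LayerNorm it skips mean subtraction and the bias term, normalizing only by the root-mean-square before applying a learned gain. A pure-Python sketch for a single vector (Qwen's actual implementation operates on tensors):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of the vector (no mean
    subtraction, no bias), then apply a learned per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

Dropping the mean statistic makes RMSNorm slightly cheaper than LayerNorm while working equally well in practice, which is why it appears in LLaMA-style decoders like Qwen.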

Model Availability

Hugging Face

Access models on Hugging Face Hub

ModelScope

Download from ModelScope (optimized for China)

Quick Comparison

| Model     | Release Date | Context | System Prompt | Training Tokens | Min GPU (Q-LoRA) | Min GPU (Int4) |
|-----------|--------------|---------|---------------|-----------------|------------------|----------------|
| Qwen-1.8B | 2023-11-30   | 32K     | Yes           | 2.2T            | 5.8 GB           | 2.9 GB         |
| Qwen-7B   | 2023-08-03   | 32K     | No            | 2.4T            | 11.5 GB          | 8.2 GB         |
| Qwen-14B  | 2023-09-25   | 8K      | No            | 3.0T            | 18.7 GB          | 13.0 GB        |
| Qwen-72B  | 2023-11-30   | 32K     | Yes           | 3.0T            | 61.4 GB          | 48.9 GB        |
System Prompt Enhancement: Qwen-1.8B and Qwen-72B have strengthened system prompt capabilities for better instruction following.

Next Steps

Base Models

Explore base pretrained models for fine-tuning

Chat Models

Learn about conversation-aligned models

Model Selection

Choose the right model for your use case

Getting Started

Start using Qwen models