Welcome to Qwen

Qwen (通义千问) is a series of open-source large language models developed by Alibaba Cloud. The models range from 1.8B to 72B parameters and include both pretrained base models and chat-aligned variants optimized for conversational AI.

Quickstart

Get started with Qwen in minutes

Model Selection

Choose the right model for your use case

Fine-tuning

Customize models for your domain

API Reference

Explore the complete API

Key Features

Multiple Model Sizes

Choose from 1.8B, 7B, 14B, and 72B parameter models to balance performance and efficiency

Quantization Support

Run models efficiently with GPTQ Int4/Int8 and KV cache quantization
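
To see why Int4 quantization matters, a back-of-envelope calculation of weight-only memory is useful. The sketch below is rough arithmetic (it ignores activations, the KV cache, and quantization overhead such as scales and zero-points), and the 7.2B parameter count is an approximation for the 7B model:

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Rough weight-only memory footprint in GB (ignores activations and KV cache)."""
    return n_params * bits_per_weight / 8 / 1024**3

# Back-of-envelope comparison for a ~7.2e9-parameter model.
fp16 = weight_memory_gb(7.2e9, 16)  # ~13.4 GB
int4 = weight_memory_gb(7.2e9, 4)   # ~3.4 GB
print(f"FP16: {fp16:.1f} GB, Int4: {int4:.1f} GB")
```

Real deployments need headroom beyond the weights, but the 4x reduction is the reason a 7B Int4 model fits on consumer GPUs.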

Long Context

Process up to 32K tokens in a single context window

Function Calling

Enable tool use and agent capabilities with built-in function calling
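
Function calling works by having the model emit a structured tool invocation in its text output, which your code parses and executes. The sketch below parses output in the common ReAct convention (`Action:` / `Action Input:`); the exact field names and prompt format used by your serving stack may differ, so treat this as an illustration rather than the repository's implementation:

```python
import json
import re

def parse_react_tool_call(text: str):
    """Extract a tool call from ReAct-style model output.

    Expects the common 'Action: <name>' / 'Action Input: <json>' convention;
    returns (tool_name, arguments) or None if no call is present.
    """
    m = re.search(r"Action:\s*(\S+)\s*Action Input:\s*(\{.*?\})", text, re.S)
    if not m:
        return None
    return m.group(1), json.loads(m.group(2))

# Example model output containing a tool call (hypothetical tool name).
output = (
    "Thought: I should check the weather.\n"
    "Action: get_weather\n"
    'Action Input: {"city": "Beijing"}'
)
print(parse_react_tool_call(output))  # ('get_weather', {'city': 'Beijing'})
```

Your agent loop would execute the named tool with the parsed arguments, append the result to the conversation, and let the model continue.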

Multi-language

Strong performance on both Chinese and English tasks

OpenAI Compatible

Deploy with OpenAI-compatible API endpoints
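
Because the server speaks the OpenAI wire format, any OpenAI-compatible client can talk to it by pointing at your deployment's base URL. The sketch below only assembles the request payload (the URL and port are deployment-specific assumptions, and no network call is made):

```python
import json

# Hypothetical local deployment; the server exposes the OpenAI-style
# /v1/chat/completions route, so host and port depend on how you launch it.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(messages, model="Qwen-7B-Chat", **params):
    """Assemble an OpenAI-style chat completion payload."""
    return {"model": model, "messages": messages, **params}

payload = build_chat_request(
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
)
print(json.dumps(payload, indent=2))
# POST this JSON to f"{BASE_URL}/chat/completions" with any HTTP client,
# or point an OpenAI SDK's base_url at BASE_URL.
```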

Model Variants

Qwen offers two main types of models:

Base models (Qwen-1.8B, Qwen-7B, Qwen-14B, Qwen-72B): pretrained language models trained on 2.2 to 3.0 trillion tokens. Ideal for:
  • Further fine-tuning on domain-specific data
  • Research and experimentation
  • Building custom chat applications

Chat models (Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, Qwen-72B-Chat): chat-aligned variants of the base models, ready for conversational use out of the box.

Performance Highlights

Qwen-72B achieves state-of-the-art performance among open-source models:
  • MMLU: 77.4% (outperforms LLaMA2-70B and GPT-3.5)
  • C-Eval: 83.3% (leading performance on Chinese benchmarks)
  • GSM8K: 78.9% (strong mathematical reasoning)
  • HumanEval: 35.4% (competitive coding capabilities)
For detailed benchmark results and methodology, see the Benchmarks page and Technical Report.

Getting Started

1. Install Dependencies

Install the required packages:
pip install "transformers>=4.32.0" "torch>=2.0.0"
2. Load a Model

Load and use a Qwen model in just a few lines:
from transformers import AutoModelForCausalLM, AutoTokenizer

# First-generation Qwen checkpoints ship custom modeling code,
# so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# model.chat returns the reply plus the updated conversation history;
# pass the history back in to continue a multi-turn conversation.
response, history = model.chat(tokenizer, "Hello!", history=None)
print(response)
3. Explore Advanced Features

Discover quantization, fine-tuning, and deployment options in the documentation

Next Steps

Installation Guide

Set up your environment and install Qwen

Model Overview

Compare model sizes and capabilities

Quantization

Optimize memory and speed with quantization

Deployment

Deploy Qwen in production environments

Community and Support

GitHub Repository

View source code and contribute

FAQ

Find answers to common questions
Note: This repository (QwenLM/Qwen) focuses on the first generation of Qwen models. For Qwen2, please visit QwenLM/Qwen2.