The CLI demo provides a powerful command-line interface for interacting with Qwen-Chat models. This demo supports multi-turn conversations with history management, streaming responses, and configurable generation parameters.

Overview

The CLI demo (cli_demo.py) offers an interactive chat experience directly in your terminal with features including:
  • Real-time streaming responses
  • Conversation history management
  • Dynamic generation configuration
  • Random seed control for reproducibility
  • CPU-only mode support

Installation

1. Install Dependencies

Make sure you have the required packages installed:
pip install torch transformers transformers_stream_generator
2. Optional: Install Flash Attention

For improved performance (GPU only):
pip install flash-attn --no-build-isolation

Basic Usage

Quick Start

Run the demo with default settings (Qwen-7B-Chat):
python cli_demo.py

Command-Line Options

The CLI demo supports the following arguments:
-c, --checkpoint-path    string     default: "Qwen/Qwen-7B-Chat"
    Model checkpoint name or path from HuggingFace/ModelScope

-s, --seed               integer    default: 1234
    Random seed for reproducible generation

--cpu-only               flag
    Run the demo with CPU only (no GPU required)

Usage Examples

# Use Qwen-14B-Chat
python cli_demo.py -c Qwen/Qwen-14B-Chat

# Use Qwen-1.8B-Chat
python cli_demo.py -c Qwen/Qwen-1_8B-Chat

# Use local model path
python cli_demo.py -c /path/to/your/model

Interactive Commands

Once the demo is running, you can use these special commands:

Help and Information

Command      Aliases    Description
:help        :h         Display all available commands
:history     :his       Show conversation history
:conf        -          Show current generation configuration
:seed        -          Show current random seed

Session Management

Command       Aliases      Description
:clear        :cl          Clear the screen
:clear-his    :clh         Clear conversation history
:exit         :quit, :q    Exit the demo

Configuration

Configuration changes persist for the current session only.
View Configuration:
:conf
Modify Configuration:
:conf <key>=<value>
Reset to Default:
:reset-conf

Common Configuration Parameters

Any attribute of the model's generation config can be changed this way. For example, to make sampling less random, lower the temperature:
:conf temperature=0.8
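As a rough illustration of how a `:conf <key>=<value>` command could be parsed and applied (the real demo sets the value on model.generation_config; `GenConfig` here is a hypothetical stand-in so the example is self-contained):

```python
# Hypothetical sketch of :conf <key>=<value> handling. GenConfig is a
# stand-in for the model's generation config object.
class GenConfig:
    def __init__(self):
        self.temperature = 1.0
        self.top_p = 0.9

def apply_conf(config, command):
    # command looks like "temperature=0.8"
    key, value = command.split("=", 1)
    setattr(config, key.strip(), float(value))
    return config

cfg = apply_conf(GenConfig(), "temperature=0.8")
# cfg.temperature is now 0.8; cfg.top_p is untouched
```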

Random Seed Control

Check Current Seed:
:seed
Set New Seed:
:seed 42
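The point of reseeding is reproducibility: the same seed replays the same sampling decisions. In the demo this would go through torch/transformers seeding; the effect can be illustrated with Python's standard random module:

```python
import random

# Illustration of seed-controlled reproducibility. The demo itself
# seeds the torch RNGs; the principle is the same.
def sample_with_seed(seed, n=5):
    rng = random.Random(seed)  # isolated RNG, no global state touched
    return [rng.randint(0, 99) for _ in range(n)]

run_a = sample_with_seed(1234)
run_b = sample_with_seed(1234)
# run_a == run_b: identical seeds give identical draws
```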

Example Session

Here’s what a typical interaction looks like:
cli_demo.py:19
Welcome to use Qwen-Chat model, type text to start chat, type :h to show command help.
(欢迎使用 Qwen-Chat 模型,输入内容即可进行对话,:h 显示命令帮助。)

Note: This demo is governed by the original license of Qwen.
We strongly advise users not to knowingly generate or allow others to knowingly generate harmful content.

User> Hello! Can you introduce yourself?

Qwen-Chat: Hello! I'm Qwen, a large language model created by Alibaba Cloud. 
I'm designed to understand and generate human-like text, answer questions, 
provide information, and assist with various tasks. How can I help you today?

User> :seed 999
[INFO] Random seed changed to 999

User> :conf temperature=0.7
[INFO] Change config: model.generation_config.temperature = 0.7

User> :history
================ History (1) ================
User[0]: Hello! Can you introduce yourself?
QWen[0]: Hello! I'm Qwen, a large language model...
=============================================

User> :clear-his
[INFO] All 1 history cleared

User> :exit

Features

Streaming Responses

The CLI demo uses the model.chat_stream() method to provide real-time streaming responses:
cli_demo.py:198
for response in model.chat_stream(tokenizer, query, history=history, generation_config=config):
    _clear_screen()
    print(f"\nUser: {query}")
    print(f"\nQwen-Chat: {response}")
This creates a smooth, interactive experience where you see the response being generated token by token.
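The stream yields the cumulative response so far on every iteration, which is why the loop above clears and redraws the screen each time. A toy stand-in for chat_stream makes the contract explicit:

```python
# Toy stand-in for model.chat_stream(): each yield is the cumulative
# response after a new chunk arrives, not just the new chunk.
def fake_chat_stream(chunks):
    partial = ""
    for chunk in chunks:
        partial += chunk
        yield partial

responses = list(fake_chat_stream(["Hel", "lo", "!"]))
# responses == ["Hel", "Hello", "Hello!"]
```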

History Management

Conversations are automatically tracked:
cli_demo.py:206
history.append((query, response))
You can:
  • View all previous exchanges with :history
  • Clear history with :clear-his to start fresh
History is preserved automatically across turns until you clear it.
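A hypothetical helper shows how a list of (query, response) tuples can be rendered in the same shape as the `:history` output in the example session (cli_demo.py formats this inline; the function name here is illustrative):

```python
# Hypothetical renderer for a history of (query, response) tuples,
# mirroring the :history output shown in the example session.
def format_history(history):
    lines = [f"================ History ({len(history)}) ================"]
    for i, (query, response) in enumerate(history):
        lines.append(f"User[{i}]: {query}")
        lines.append(f"QWen[{i}]: {response}")
    lines.append("=============================================")
    return "\n".join(lines)

history = []
history.append(("Hello!", "Hi, I'm Qwen."))  # appended after each turn
rendered = format_history(history)
```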

Keyboard Interrupt Handling

Press Ctrl+C during generation to interrupt:
cli_demo.py:202
try:
    for response in model.chat_stream(...):
        # streaming...
except KeyboardInterrupt:
    print('[WARNING] Generation interrupted')
    continue

Performance Tips

GPU Memory Management: The demo includes automatic garbage collection and CUDA cache clearing when you clear history or screen.

Memory Optimization

The demo automatically manages memory:
cli_demo.py:68
def _gc():
    import gc
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
Memory is cleared when:
  • Clearing the screen (:clear)
  • Clearing history (:clear-his)

Device Selection

# Automatically uses available GPUs
model = AutoModelForCausalLM.from_pretrained(
    args.checkpoint_path,
    device_map="auto",
    trust_remote_code=True
).eval()
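The --cpu-only flag flips this placement: on CPU-only runs the model is loaded onto the CPU instead of letting device_map="auto" spread it across GPUs. A minimal sketch of that decision (the helper name is illustrative, not taken from cli_demo.py):

```python
# Hypothetical helper mirroring what --cpu-only implies for model
# placement. The result would be passed as the device_map argument to
# AutoModelForCausalLM.from_pretrained(...).
def resolve_device_map(cpu_only: bool) -> str:
    return "cpu" if cpu_only else "auto"
```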

Troubleshooting

First-time model loading downloads the model from HuggingFace/ModelScope. This can take time depending on your connection; subsequent runs use the cached model. You can also download the model manually in advance:
from modelscope import snapshot_download
model_dir = snapshot_download('qwen/Qwen-7B-Chat')
If you run into out-of-memory errors, try these solutions:
  1. Use a smaller model (e.g., Qwen-1.8B-Chat instead of Qwen-7B-Chat)
  2. Enable CPU-only mode with --cpu-only
  3. Use quantized models (Int4 or Int8 versions)
  4. Clear history frequently with :clear-his
The demo handles encoding errors automatically:
cli_demo.py:95
except UnicodeDecodeError:
    print('[ERROR] Encoding error in input')
    continue
If issues persist, check your terminal’s encoding settings.
If generated responses are unsatisfactory, try:
  1. Adjusting temperature: :conf temperature=0.7 (lower = more focused)
  2. Changing the random seed: :seed 42
  3. Resetting config: :reset-conf
  4. Clearing history if context is confusing: :clear-his

Source Code Reference

The CLI demo implementation can be found at cli_demo.py:1 in the Qwen repository. Key components:
  • Model loading: cli_demo.py:44
  • Main loop: cli_demo.py:105
  • Command processing: cli_demo.py:128
  • Chat streaming: cli_demo.py:198

Next Steps

Web Demo

Try the Gradio-based web interface

Model API

Integrate Qwen into your applications
