Overview
The CLI demo (cli_demo.py) offers an interactive chat experience directly in your terminal with features including:
- Real-time streaming responses
- Conversation history management
- Dynamic generation configuration
- Random seed control for reproducibility
- CPU-only mode support
Installation
Basic Usage
Quick Start
Run the demo with default settings (Qwen-7B-Chat) by invoking the script directly, e.g. `python cli_demo.py`.
Command-Line Options
The CLI demo supports the following arguments:
- Model checkpoint name or path from HuggingFace/ModelScope
- Random seed for reproducible generation
- Run the demo with CPU only (no GPU required)
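The options above could be defined with `argparse` roughly as follows. This is a hypothetical reconstruction, not the demo's actual code: the flag names (`-c/--checkpoint-path`, `-s/--seed`, `--cpu-only`) and defaults are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the flags described above; names and
    # defaults are assumptions, not copied from the actual script.
    parser = argparse.ArgumentParser(description="Qwen CLI chat demo")
    parser.add_argument("-c", "--checkpoint-path", default="Qwen/Qwen-7B-Chat",
                        help="model checkpoint name or path (HuggingFace/ModelScope)")
    parser.add_argument("-s", "--seed", type=int, default=1234,
                        help="random seed for reproducible generation")
    parser.add_argument("--cpu-only", action="store_true",
                        help="run the demo on CPU only (no GPU required)")
    return parser

args = build_parser().parse_args(["--cpu-only", "--seed", "7"])
print(args.checkpoint_path, args.seed, args.cpu_only)
```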
Usage Examples
Interactive Commands
Once the demo is running, you can use these special commands:
Help and Information
| Command | Aliases | Description |
|---|---|---|
| `:help` | `:h` | Display all available commands |
| `:history` | `:his` | Show conversation history |
| `:conf` | - | Show current generation configuration |
| `:seed` | - | Show current random seed |
Session Management
| Command | Aliases | Description |
|---|---|---|
| `:clear` | `:cl` | Clear the screen |
| `:clear-his` | `:clh` | Clear conversation history |
| `:exit` | `:quit`, `:q` | Exit the demo |
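As a rough illustration, the `:`-prefixed commands and their aliases from the tables above could be parsed like this. The real demo's internals may differ; the dispatch table and function here are a sketch, not the actual implementation.

```python
# Sketch of parsing the ':'-prefixed commands listed above; the real demo's
# command processing may be structured differently.
ALIASES = {"h": "help", "his": "history", "cl": "clear",
           "clh": "clear-his", "quit": "exit", "q": "exit"}

def parse_command(line: str):
    """Return (canonical_name, args) for a ':'-command, or None for chat input."""
    if not line.startswith(":"):
        return None
    name, *args = line[1:].split() or [""]
    return ALIASES.get(name, name), args

print(parse_command(":q"))                      # ('exit', [])
print(parse_command(":conf temperature=0.7"))   # ('conf', ['temperature=0.7'])
print(parse_command("hello"))                   # None
```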
Configuration
Configuration changes persist for the current session only.
Common Configuration Parameters
Generation parameters can be adjusted at runtime with `:conf key=value` (for example, `:conf temperature=0.7`) and restored to defaults with `:reset-conf`.
Random Seed Control
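The effect of a fixed seed can be illustrated with a minimal sketch using Python's stdlib `random`. This is for illustration only; the demo itself presumably seeds the full generation stack (Python, NumPy, PyTorch), not just the stdlib RNG.

```python
import random

def set_demo_seed(seed: int) -> None:
    # Illustration only: the real demo seeds the whole generation stack;
    # here we seed just Python's stdlib RNG.
    random.seed(seed)

set_demo_seed(42)
first = [random.randint(0, 99) for _ in range(3)]
set_demo_seed(42)
second = [random.randint(0, 99) for _ in range(3)]
print(first == second)  # True: same seed, same sequence
```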
Check Current Seed
Run `:seed` with no argument to display the seed currently in use; `:seed 42` sets a new one.
Example Session
Here's what a typical interaction looks like (see cli_demo.py:19).
Features
Streaming Responses
The CLI demo uses the `model.chat_stream()` method to provide real-time streaming responses (cli_demo.py:198).
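The streaming loop can be sketched as follows. A stub generator stands in for `model.chat_stream(tokenizer, query, history=history)`, which yields the partial response generated so far on each iteration; the printing pattern is an illustration, not the demo's exact code.

```python
def fake_chat_stream(query, history=None):
    # Stub standing in for model.chat_stream(tokenizer, query, history=history),
    # which yields the partial response generated so far on each iteration.
    text = "Hello! How can I help you today?"
    for i in range(1, len(text) + 1):
        yield text[:i]

printed = ""
for partial in fake_chat_stream("Hi"):
    # print only the newly generated suffix for a live-typing effect
    print(partial[len(printed):], end="", flush=True)
    printed = partial
print()
```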
History Management
Conversations are automatically tracked (cli_demo.py:206):
- View all previous exchanges with `:history`
- Clear history with `:clear-his` to start fresh
- History is preserved across multiple turns
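Qwen's chat APIs keep history as a list of (query, response) pairs; a minimal sketch of how tracking and clearing behave:

```python
# History as a list of (query, response) pairs, the structure Qwen's
# chat APIs use; this sketch mirrors the demo's tracking behavior.
history = []

history.append(("What is 2+2?", "4"))   # one completed turn
history.append(("And doubled?", "8"))   # context carries across turns

print(len(history))   # 2
history.clear()       # what :clear-his effectively does
print(len(history))   # 0
```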
Keyboard Interrupt Handling
Press `Ctrl+C` during generation to interrupt the current response (cli_demo.py:202).
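The interrupt behavior amounts to catching `KeyboardInterrupt` around the streaming loop, so that cancelling one response does not end the session. A sketch under that assumption, with a stub stream simulating the Ctrl+C:

```python
def stream_response(chunks):
    # Consume a streaming response; Ctrl+C (KeyboardInterrupt) aborts the
    # current generation but leaves the chat session running.
    printed = ""
    try:
        for partial in chunks:
            printed = partial
    except KeyboardInterrupt:
        print("\n[Generation interrupted]")
    return printed

def interrupted_stream():
    yield "Hel"
    yield "Hello"
    raise KeyboardInterrupt  # simulates the user pressing Ctrl+C

print(stream_response(interrupted_stream()))  # Hello
```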
Performance Tips
Memory Optimization
The demo automatically manages memory (cli_demo.py:68). Memory is reclaimed when:
- Clearing the screen (`:clear`)
- Clearing history (`:clear-his`)
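A typical cleanup helper for this kind of reclamation looks like the following. This is a generic sketch, not the demo's actual function: it runs Python garbage collection and, if PyTorch with CUDA happens to be available, releases cached GPU memory.

```python
import gc

def free_memory():
    # Drop unreachable Python objects, then release cached GPU memory
    # if PyTorch with CUDA happens to be available.
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass

free_memory()  # safe to call even on a CPU-only machine
```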
Device Selection
By default the demo runs on the GPU when one is available; pass `--cpu-only` to force CPU execution.
Troubleshooting
Model loading is slow
First-time model loading downloads the model from HuggingFace/ModelScope, which can take time depending on your connection; subsequent runs will use the cached model. You can also download the model manually ahead of time and point the demo at the local path.
Out of memory errors
Try these solutions:
- Use a smaller model (e.g., Qwen-1.8B-Chat instead of Qwen-7B-Chat)
- Enable CPU-only mode with `--cpu-only`
- Use quantized models (Int4 or Int8 versions)
- Clear history frequently with `:clear-his`
Unicode decode errors
The demo handles encoding errors automatically (cli_demo.py:95). If issues persist, check your terminal's encoding settings.
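The general pattern for tolerating badly encoded input is to replace undecodable bytes rather than raising `UnicodeDecodeError`; the demo's exact handling is in the source referenced above, and this sketch just illustrates the idiom:

```python
# Replace undecodable bytes instead of raising UnicodeDecodeError.
data = b"caf\xe9 latte"                       # Latin-1 bytes, invalid as UTF-8
text = data.decode("utf-8", errors="replace")
print(text)   # the bad byte becomes U+FFFD instead of crashing
```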
Generation produces unexpected results
Try:
- Adjusting the temperature: `:conf temperature=0.7` (lower = more focused)
- Changing the random seed: `:seed 42`
- Resetting the configuration: `:reset-conf`
- Clearing history if the context is confusing: `:clear-his`
Source Code Reference
The CLI demo implementation can be found at cli_demo.py:1 in the Qwen repository.
Key components:
- Model loading: cli_demo.py:44
- Main loop: cli_demo.py:105
- Command processing: cli_demo.py:128
- Chat streaming: cli_demo.py:198
Next Steps
Web Demo
Try the Gradio-based web interface
Model API
Integrate Qwen into your applications