ModelScope is an open-source platform for Model-as-a-Service (MaaS), providing flexible and cost-effective model services to AI developers. Qwen models are fully integrated with ModelScope, offering an alternative to Hugging Face for model hosting and inference.
Download models from ModelScope and use them with Transformers:
1. Download Model

Use snapshot_download to fetch the model:
from modelscope import snapshot_download

# Download to local directory
model_dir = snapshot_download('qwen/Qwen-7B-Chat')
print(f"Model downloaded to: {model_dir}")
Available models:
qwen/Qwen-1_8B-Chat
qwen/Qwen-7B-Chat
qwen/Qwen-14B-Chat
qwen/Qwen-72B-Chat
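Since the chat variants above differ only in parameter count, a small helper can map a size label to its ModelScope ID before downloading. This is a sketch: `pick_qwen_chat` is a hypothetical convenience, not part of the modelscope API; the IDs come from the list above.

```python
# Hypothetical helper: map a parameter-count label to its ModelScope model ID.
QWEN_CHAT_MODELS = {
    "1.8B": "qwen/Qwen-1_8B-Chat",
    "7B": "qwen/Qwen-7B-Chat",
    "14B": "qwen/Qwen-14B-Chat",
    "72B": "qwen/Qwen-72B-Chat",
}

def pick_qwen_chat(size: str) -> str:
    """Return the ModelScope ID for a Qwen chat model of the given size."""
    try:
        return QWEN_CHAT_MODELS[size]
    except KeyError:
        raise ValueError(
            f"No Qwen chat model of size {size!r}; "
            f"choose from {sorted(QWEN_CHAT_MODELS)}"
        ) from None

# Usage (requires network and the modelscope package installed):
# from modelscope import snapshot_download
# model_dir = snapshot_download(pick_qwen_chat("7B"))
print(pick_qwen_chat("7B"))  # qwen/Qwen-7B-Chat
```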
2. Load with Transformers

Load the downloaded model using Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load from local directory
tokenizer = AutoTokenizer.from_pretrained(
    model_dir,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()
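The `device_map="auto"` setting above shards the model across available accelerators. If you want to adjust loading for your hardware, the keyword arguments can be assembled conditionally; `device_map` and `torch_dtype` are real Transformers `from_pretrained` options, but the selection logic below is a hypothetical sketch, not Qwen's prescribed configuration.

```python
# Hypothetical sketch: build from_pretrained kwargs based on the hardware
# you have. The kwargs themselves are standard Transformers options; the
# selection policy is an assumption for illustration.

def load_kwargs(has_gpu: bool, bf16_supported: bool) -> dict:
    kwargs = {"trust_remote_code": True}
    if has_gpu:
        kwargs["device_map"] = "auto"
        # Prefer bfloat16 where the GPU supports it, else fall back to float16
        kwargs["torch_dtype"] = "bfloat16" if bf16_supported else "float16"
    return kwargs

# Usage (sketch):
# model = AutoModelForCausalLM.from_pretrained(
#     model_dir, **load_kwargs(has_gpu=True, bf16_supported=True)
# ).eval()
print(load_kwargs(has_gpu=False, bf16_supported=False))
```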
3. Run Inference

Use the model as normal:
response, history = model.chat(tokenizer, "你好", history=None)  # "你好" = "Hello"
print(response)
Here’s a complete workflow that downloads from ModelScope and runs inference with Transformers:
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Step 1: Download model checkpoint to local directory
model_dir = snapshot_download('qwen/Qwen-14B-Chat')

# Step 2: Load model and tokenizer from local directory
# trust_remote_code is still required for custom model code
tokenizer = AutoTokenizer.from_pretrained(
    model_dir,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()

# Step 3: Run inference
# "请介绍一下中国的历史" = "Please give an introduction to Chinese history"
response, history = model.chat(
    tokenizer,
    "请介绍一下中国的历史",
    history=None
)
print(response)

# Follow-up turn reuses the returned history
# "能详细说说唐朝吗?" = "Can you say more about the Tang dynasty?"
response, history = model.chat(
    tokenizer,
    "能详细说说唐朝吗?",
    history=history
)
print(response)
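Each `model.chat` call above returns `history` as a list of (query, response) pairs, so long conversations grow the prompt without bound. One mitigation is to keep only the most recent turns; `trim_history` below is a hypothetical helper, not part of the Qwen or Transformers API.

```python
# Hypothetical helper: bound prompt growth by keeping only the most recent
# (query, response) pairs from Qwen's chat history.

def trim_history(history, max_turns=5):
    """Keep only the last `max_turns` (query, response) pairs."""
    if history is None:
        return None
    return history[-max_turns:]

# Usage in a chat loop (sketch):
# response, history = model.chat(tokenizer, query, history=trim_history(history))

# Offline illustration with fake turns:
fake = [(f"q{i}", f"a{i}") for i in range(8)]
print(len(trim_history(fake, max_turns=5)))  # 5
```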
Download from ModelScope but load config from Hugging Face:
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Download model from ModelScope
model_dir = snapshot_download('qwen/Qwen-7B-Chat')

# Load model and tokenizer locally
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()

# But load generation config from Hugging Face (if you prefer)
model.generation_config = GenerationConfig.from_pretrained(
    "Qwen/Qwen-7B-Chat",  # Note: capital Q in the Hugging Face org name
    trust_remote_code=True
)

response, history = model.chat(tokenizer, "你好", history=None)  # "你好" = "Hello"
print(response)
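The same checkpoint is published under the lowercase `qwen` namespace on ModelScope and the capitalized `Qwen` organization on Hugging Face, which is easy to trip over when mixing the two hubs as above. A small helper can normalize the casing; `hub_model_id` is a hypothetical convenience for illustration.

```python
# Hypothetical helper: convert a ModelScope ID like 'qwen/Qwen-7B-Chat'
# to its Hugging Face counterpart, which uses a capitalized 'Qwen' org.

def hub_model_id(modelscope_id: str) -> str:
    """Map a ModelScope model ID to the Hugging Face repo ID casing."""
    org, _, name = modelscope_id.partition("/")
    if org == "qwen":
        org = "Qwen"
    return f"{org}/{name}"

# Usage (sketch):
# model.generation_config = GenerationConfig.from_pretrained(
#     hub_model_id(model_id), trust_remote_code=True
# )
print(hub_model_id("qwen/Qwen-7B-Chat"))  # Qwen/Qwen-7B-Chat
```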