Overview
The Qwen model loading API provides methods to load pre-trained models and tokenizers from the Hugging Face Hub or a local path. All models use the standard Transformers library interfaces.

Load Model
Load a Qwen model for causal language modeling:

Parameters
`pretrained_model_name_or_path`: Model checkpoint name (e.g., "Qwen/Qwen-7B-Chat") or local path to the model directory
`device_map`: Device allocation strategy:
- "auto": automatically distribute the model across available devices
- "cpu": load the model on CPU only
- "cuda": load the model on GPU
- A dictionary mapping layers to specific devices
`trust_remote_code`: Allow execution of custom modeling code from the model repository. Required for Qwen models.
`resume_download`: Resume incomplete downloads from the Hugging Face Hub
`torch_dtype`: Data type for model weights (e.g., torch.float16, torch.bfloat16)
`low_cpu_mem_usage`: Reduce CPU memory usage during model loading (useful for large models)
`quantization_config`: Configuration for model quantization (4-bit or 8-bit)
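In practice, the quantization configuration is usually built with Transformers' BitsAndBytesConfig. The sketch below is illustrative, not part of the Qwen API: the helper `four_bit_kwargs` is hypothetical, the nf4/bfloat16 settings are example choices, and the guarded call assumes the bitsandbytes package is installed.

```python
# Sketch of 4-bit quantized loading. four_bit_kwargs is a hypothetical
# helper; the nf4 / bfloat16 choices are illustrative defaults.

def four_bit_kwargs():
    """Illustrative 4-bit settings for BitsAndBytesConfig."""
    return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16, **four_bit_kwargs()
    )
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-7B-Chat",
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,
    )
```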
Returns
Loaded Qwen model ready for inference or fine-tuning
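The parameters above combine into a typical loading call. The sketch below assumes the transformers package is installed; the helper name `qwen_load_kwargs` and the chosen dtype are illustrative, while the keyword names mirror the Parameters list.

```python
# Minimal sketch of loading a Qwen model with AutoModelForCausalLM.
# qwen_load_kwargs is a hypothetical helper; only the keyword names
# come from the Parameters list above.

def qwen_load_kwargs(device_map="auto", dtype="bfloat16"):
    """Build keyword arguments for AutoModelForCausalLM.from_pretrained."""
    return {
        "device_map": device_map,   # "auto", "cpu", "cuda", or a layer->device dict
        "trust_remote_code": True,  # required for Qwen's custom modeling code
        "torch_dtype": dtype,       # e.g. torch.bfloat16; a dtype name as a string also works
        "low_cpu_mem_usage": True,  # lower peak CPU memory while loading large models
    }

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-7B-Chat", **qwen_load_kwargs()
    )
```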
Load Tokenizer
Load the tokenizer associated with a Qwen model:

Parameters
`pretrained_model_name_or_path`: Model checkpoint name or path. Should match the model being loaded.
`trust_remote_code`: Allow execution of custom tokenizer code. Required for Qwen models.
`resume_download`: Resume incomplete downloads from the Hugging Face Hub
`padding_side`: Side on which padding tokens are added:
- "right": pad on the right (recommended for training)
- "left": pad on the left (recommended for generation)
`use_fast`: Use the fast tokenizer implementation if available
`model_max_length`: Maximum sequence length for tokenization
Returns
Loaded tokenizer with special tokens configured for Qwen
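A sketch of the corresponding tokenizer load, assuming the transformers package is installed. The helper `qwen_tokenizer_kwargs` and the 2048-token limit are illustrative; `padding_side="left"` reflects the generation recommendation above.

```python
# Sketch of loading the Qwen tokenizer with AutoTokenizer.
# qwen_tokenizer_kwargs is a hypothetical helper; the max length
# shown is an illustrative value, not a Qwen requirement.

def qwen_tokenizer_kwargs(padding_side="left", max_length=2048):
    """Build keyword arguments for AutoTokenizer.from_pretrained."""
    return {
        "trust_remote_code": True,     # required for Qwen's custom tokenizer code
        "padding_side": padding_side,  # "left" for generation, "right" for training
        "use_fast": True,              # prefer the fast implementation if available
        "model_max_length": max_length,
    }

if __name__ == "__main__":
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "Qwen/Qwen-7B-Chat", **qwen_tokenizer_kwargs()
    )
```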
Load Generation Config
Load generation configuration from a checkpoint:

Parameters
`pretrained_model_name_or_path`: Model checkpoint name or path containing generation_config.json
`trust_remote_code`: Allow execution of custom configuration code
`resume_download`: Resume incomplete downloads
Returns
Generation configuration with model-specific defaults
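A sketch of loading the checkpoint defaults and then layering overrides on top, assuming the transformers package is installed. The helper `generation_overrides` and its sampling values are illustrative, not Qwen defaults.

```python
# Sketch: load generation defaults with GenerationConfig, then override
# a few sampling settings. generation_overrides is a hypothetical helper
# with illustrative values.

def generation_overrides(temperature=0.7, top_p=0.9, max_new_tokens=256):
    """Illustrative sampling settings to layer over checkpoint defaults."""
    return {
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }

if __name__ == "__main__":
    from transformers import GenerationConfig

    gen_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat")
    for key, value in generation_overrides().items():
        setattr(gen_config, key, value)
```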