Overview
The Qwen embedding functions provide dense and sparse text embeddings through Alibaba Cloud’s DashScope service. Both classes, QwenDenseEmbedding and QwenSparseEmbedding, support a configurable output dimension and cache results automatically.
Location: python/zvec/extension/qwen_embedding_function.py
Installation
QwenDenseEmbedding
Dense text embedding function backed by the Qwen (DashScope) API.
Constructor
Parameters
Desired output embedding dimension. Common values:
- 512: Balanced performance and accuracy
- 1024: Higher accuracy, larger storage
- 1536: Maximum accuracy for supported models
DashScope embedding model identifier. Options:
- text-embedding-v4 (recommended)
- text-embedding-v3
- text-embedding-v2
- text-embedding-v1
DashScope API authentication key. If None, the key is read from the DASHSCOPE_API_KEY environment variable. Obtain a key from: https://dashscope.console.aliyun.com/
Additional DashScope API parameters:
- text_type (str): Specifies the text role: "query" for search queries or "document" for indexed content. Optimizes embeddings for asymmetric search.
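The key lookup order described above (an explicit key first, then the DASHSCOPE_API_KEY environment variable) can be sketched as follows. resolve_api_key is a hypothetical helper, not part of zvec, and raising ValueError when no key is found is an assumption about how a missing key might be reported.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to the
    DASHSCOPE_API_KEY environment variable."""
    key = api_key if api_key is not None else os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        # Assumption: a missing key is reported as ValueError; the real class may differ.
        raise ValueError("No DashScope API key: pass api_key or set DASHSCOPE_API_KEY")
    return key


os.environ["DASHSCOPE_API_KEY"] = "sk-from-env"  # placeholder value for the demo
print(resolve_api_key())               # falls back to the environment variable
print(resolve_api_key("sk-explicit"))  # an explicit key takes precedence
```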
Methods
embed()
input (str): Input text string to embed. Maximum length depends on the model (typically 2048-8192 tokens).
Returns DenseVectorType: a list of floats representing the embedding vector. Its length equals self.dimension.
Raises:
- TypeError: If input is not a string
- ValueError: If input is empty or the API returns an error
- RuntimeError: If a network or DashScope service error occurs
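The documented TypeError and ValueError cases amount to a type check and an emptiness check on the input. The sketch below is a hypothetical validate_text helper, not the class's own code; treating whitespace-only strings as empty is an assumption.

```python
def validate_text(text: object) -> str:
    """Hypothetical check mirroring the TypeError/ValueError cases documented for embed()."""
    if not isinstance(text, str):
        raise TypeError(f"input must be a str, got {type(text).__name__}")
    if not text.strip():
        # Assumption: whitespace-only input is treated the same as empty input.
        raise ValueError("input must be a non-empty string")
    return text


print(validate_text("What is a vector database?"))
```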
Usage Examples
Basic Usage
Specific Model
Asymmetric Retrieval
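For asymmetric retrieval, queries and documents are embedded with different text_type settings and then compared. A minimal sketch of the comparison step, assuming cosine similarity as the metric and using toy 3-dimensional vectors in place of real embed() output. cosine_similarity and rank_documents are illustrative helpers, not zvec APIs.

```python
import math
from typing import Dict, List, Tuple

DenseVector = List[float]  # stand-in for DenseVectorType


def cosine_similarity(a: DenseVector, b: DenseVector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: DenseVector, doc_vecs: Dict[str, DenseVector]) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending similarity."""
    scores = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy vectors: real ones would come from an embedder configured with
# text_type="query" (for the query) or text_type="document" (for the corpus).
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
}
print(rank_documents(query, docs))
```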
QwenSparseEmbedding
Sparse text embedding function backed by the Qwen (DashScope) API. Generates sparse, keyword-weighted vectors suited to lexical matching and BM25-style retrieval.
Constructor
Parameters
Desired output embedding dimension. Common values:
- 512: Balanced performance
- 1024: Higher accuracy
- 1536: Maximum accuracy
DashScope embedding model identifier.
DashScope API key, or None to read it from the DASHSCOPE_API_KEY environment variable.
Additional DashScope API parameters:
- encoding_type (Literal["query", "document"]): Encoding type.
  - "query": Optimize for search queries (default)
  - "document": Optimize for indexed documents
Methods
embed()
input (str): Input text string to embed.
Returns SparseVectorType: a dictionary mapping dimension index to weight. Only non-zero dimensions are included, and entries are sorted by index for consistency.
Raises:
- TypeError: If input is not a string
- ValueError: If input is empty or the API returns an error
- RuntimeError: If a network or service error occurs
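The documented return shape (non-zero entries only, keyed by ascending index) can be illustrated with a hypothetical converter. The input format, a list of (index, weight) pairs, is an assumption for illustration; the raw DashScope response may be shaped differently.

```python
from typing import Dict, Iterable, Tuple


def to_sparse_vector(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Hypothetical converter: drop zero weights and insert entries in
    ascending index order (dicts preserve insertion order in Python 3.7+)."""
    return {index: weight for index, weight in sorted(pairs) if weight != 0.0}


raw = [(2048, 0.18), (12, 0.05), (305, 0.0), (9001, 0.33)]  # toy (index, weight) pairs
print(to_sparse_vector(raw))  # {12: 0.05, 2048: 0.18, 9001: 0.33}
```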
Usage Examples
Basic Usage
Document Embedding
Asymmetric Retrieval
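Sparse vectors stored as index-to-weight dictionaries are commonly compared with a dot product over the indices they share. A minimal sketch, assuming dot-product scoring and using toy vectors in place of embed() output produced with encoding_type="query" and encoding_type="document". sparse_dot and rank_sparse are illustrative helpers, not zvec APIs.

```python
from typing import Dict, List, Tuple

SparseVector = Dict[int, float]  # stand-in for SparseVectorType


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product over shared indices; iterates over the smaller dict."""
    if len(a) > len(b):
        a, b = b, a
    return sum((weight * b[index] for index, weight in a.items() if index in b), 0.0)


def rank_sparse(query_vec: SparseVector, doc_vecs: Dict[str, SparseVector]) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scores = [(doc_id, sparse_dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for embed() output.
query = {17: 0.6, 1024: 0.3}
docs = {
    "doc_a": {17: 0.5, 88: 0.2},    # shares index 17 with the query
    "doc_b": {88: 0.7, 4096: 0.4},  # no indices in common with the query
}
print(rank_sparse(query, docs))
```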
Inspecting Sparse Dimensions
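Because a sparse vector is a plain dictionary, its number of non-zero entries and its strongest dimensions can be read off directly. A sketch with toy values; top_dimensions is a hypothetical helper, and mapping indices back to tokens is not shown because this page does not document how to do it.

```python
from typing import Dict, List, Tuple


def top_dimensions(vec: Dict[int, float], k: int = 3) -> List[Tuple[int, float]]:
    """Return the k (index, weight) pairs with the largest weights."""
    return sorted(vec.items(), key=lambda item: item[1], reverse=True)[:k]


sparse = {12: 0.05, 305: 0.41, 2048: 0.18, 9001: 0.33}  # toy values, not real output
print("non-zero dimensions:", len(sparse))                   # 4
print("strongest dimensions:", top_dimensions(sparse, k=2))  # [(305, 0.41), (9001, 0.33)]
```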
Hybrid Retrieval
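One common way to merge dense and sparse results is reciprocal rank fusion (RRF), which combines rankings rather than raw scores and so sidesteps the different scales of cosine similarity and sparse dot products. This is a generic sketch of RRF, not necessarily the fusion method zvec uses; the document IDs are placeholders.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs: each list adds 1 / (k + rank) per document.
    k=60 is a commonly used default."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["doc_a", "doc_c", "doc_b"]   # e.g. ordered by dense cosine similarity
sparse_ranking = ["doc_b", "doc_a", "doc_d"]  # e.g. ordered by sparse dot product
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```

Documents that rank well in both lists (doc_a here) rise to the top, while a document found by only one retriever still appears in the fused list.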
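Dense and sparse embeddings are complementary: dense vectors capture semantic similarity, while sparse vectors capture exact keyword overlap. Combining them in a hybrid search can improve retrieval quality.
Best Practices
Asymmetric Search: Use text_type="query" for queries and text_type="document" for documents to improve retrieval accuracy (QwenSparseEmbedding documents the equivalent option as encoding_type).
Comparison: Dense vs Sparse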
| Feature | QwenDenseEmbedding | QwenSparseEmbedding |
|---|---|---|
| Output Format | List of floats | Dictionary (sparse) |
| Typical Size | 512-1536 dimensions (every dimension stored) | ~150-200 non-zero dimensions |
| Best For | Semantic similarity | Keyword matching |
| Memory | Fixed size | Variable, efficient |
| Interpretability | Low | High (terms visible) |
| Use Case | General retrieval | Lexical search, hybrid |
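As a rough illustration of the memory row, assume dense values are stored as 4-byte float32 and each sparse entry as a 4-byte index plus a 4-byte weight. Actual storage in zvec or elsewhere may differ, and Python dictionary overhead is ignored.

```python
def dense_bytes(dimension: int, bytes_per_value: int = 4) -> int:
    """Dense storage: every dimension holds one value."""
    return dimension * bytes_per_value


def sparse_bytes(non_zero: int, bytes_per_index: int = 4, bytes_per_weight: int = 4) -> int:
    """Sparse storage: only non-zero entries, each an (index, weight) pair."""
    return non_zero * (bytes_per_index + bytes_per_weight)


for dim in (512, 1024, 1536):
    print(f"dense, {dim} dims: {dense_bytes(dim)} bytes")
for nnz in (150, 200):
    print(f"sparse, {nnz} non-zero: {sparse_bytes(nnz)} bytes")
```

Under these assumptions a dense vector takes 2048 to 6144 bytes, while a sparse vector with 150 to 200 entries takes 1200 to 1600 bytes.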
Error Handling
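The exceptions listed for embed() suggest a natural split: TypeError and ValueError signal problems that retrying is unlikely to fix, while RuntimeError covers network or service failures that may be transient. A hedged sketch of that policy: embed_with_retry is a hypothetical wrapper, flaky_embed stands in for a real embed() call, and the retry count and backoff are arbitrary choices.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def embed_with_retry(embed: Callable[[str], T], text: str,
                     retries: int = 3, backoff_s: float = 0.5) -> Optional[T]:
    """Hypothetical wrapper: retry RuntimeError with exponential backoff,
    return None (without retrying) on TypeError or ValueError."""
    for attempt in range(retries):
        try:
            return embed(text)
        except (TypeError, ValueError) as exc:
            # Bad input or an API-reported error: retrying is assumed not to help.
            print(f"not retrying: {exc}")
            return None
        except RuntimeError:
            # Network or service failure: possibly transient, so back off and retry.
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))
    return None


calls = 0


def flaky_embed(text: str) -> List[float]:
    """Stand-in for embed(): rejects empty input and fails on its first call."""
    global calls
    if not text:
        raise ValueError("input must be non-empty")
    calls += 1
    if calls == 1:
        raise RuntimeError("temporary network error")
    return [0.1, 0.2, 0.3]


print(embed_with_retry(flaky_embed, "hello", backoff_s=0.01))  # succeeds on the second attempt
print(embed_with_retry(flaky_embed, "", backoff_s=0.01))       # prints a message, then None
```

Treating every ValueError as non-retryable is a conservative assumption, since the docs also use ValueError for API-reported errors.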
Notes
- Requires Python 3.10, 3.11, or 3.12
- Requires the dashscope package: pip install dashscope
- Results are cached (LRU cache, maxsize=10)
- Network connectivity required
- API costs may apply
- Sparse vectors are sorted by indices for consistency
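The caching note can be illustrated with functools.lru_cache(maxsize=10). This assumes the cache is keyed on the input string and behaves like a standard LRU cache; this page does not say which implementation the classes use. cached_embed is a stand-in that counts simulated API calls.

```python
from functools import lru_cache
from typing import Tuple

api_calls = 0


@lru_cache(maxsize=10)
def cached_embed(text: str) -> Tuple[float, ...]:
    """Stand-in for a cached embed(): counts how often the 'API' is hit."""
    global api_calls
    api_calls += 1
    return (float(len(text)), 0.0)  # placeholder vector


cached_embed("hello")
cached_embed("hello")  # served from the cache, so no second API call
print(api_calls)  # 1

for i in range(10):    # ten new inputs fill the cache and evict "hello"
    cached_embed(f"text {i}")
cached_embed("hello")  # evicted earlier, so this is recomputed
print(api_calls)  # 12
print(cached_embed.cache_info())
```

Returning a tuple keeps cached results immutable; if a cached function returned a list, a caller mutating it would also change what later cache hits return.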