Overview
Watercooler supports:
- Embedding models - GGUF files for llama.cpp (bge-m3, nomic-embed-text, e5-mistral-7b)
- LLM models - GGUF files for llama-server with response field configuration (Qwen3, Llama 3.2, SmolLM2, Qwen2.5, Phi-3)
Embedding Models
resolve_embedding_model
Resolve a friendly model name to its full specification.

Args:
- name (str): Model name (e.g., "bge-m3", "nomic-embed-text:latest")

Returns: EmbeddingModelSpec with hf_repo, hf_file, dim, context

Raises: ModelNotFoundError if the model name is not in the registry
Example:
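The original example is missing, so here is a minimal, self-contained sketch of how the registry lookup might behave. The spec fields and the bge-m3 dimension/context values come from this document; the repo and file names, the `:latest` handling, and the registry layout are illustrative assumptions.

```python
from dataclasses import dataclass

class ModelNotFoundError(KeyError):
    """Raised when a model name is not in the registry."""

@dataclass(frozen=True)
class EmbeddingModelSpec:
    hf_repo: str   # Hugging Face repository
    hf_file: str   # GGUF file within the repo
    dim: int       # embedding dimension
    context: int   # context window in tokens

# dim and context for bge-m3 match the registry table below; the
# repo/file names are illustrative assumptions.
_EMBEDDING_REGISTRY = {
    "bge-m3": EmbeddingModelSpec("BAAI/bge-m3-GGUF", "bge-m3.gguf", 1024, 8192),
}

def resolve_embedding_model(name: str) -> EmbeddingModelSpec:
    # A ":latest" suffix resolves to the bare name (assumed behavior).
    base = name.removesuffix(":latest")
    try:
        return _EMBEDDING_REGISTRY[base]
    except KeyError:
        raise ModelNotFoundError(name) from None

spec = resolve_embedding_model("bge-m3")
print(spec.dim, spec.context)  # 1024 8192
```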
ensure_model_available
Ensure a model is downloaded and return its path.

Args:
- name (str): Model name
- verbose (bool): Print progress messages

Returns: Path to the downloaded model file

Raises:
- ModelNotFoundError: If the model name is unknown
- ModelDownloadError: If the download fails
- InsufficientDiskSpaceError: If there is not enough disk space
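The check-cache / check-disk / download flow described above can be sketched as follows. The cache layout, registry entries, and the injectable `download` callable are assumptions for illustration; only the exception names come from the reference.

```python
import shutil
import tempfile
from pathlib import Path

class ModelNotFoundError(KeyError): pass
class ModelDownloadError(RuntimeError): pass
class InsufficientDiskSpaceError(OSError): pass

# name -> (filename, expected size in bytes); both values are
# illustrative stand-ins, not real registry entries.
_REGISTRY = {"bge-m3": ("bge-m3.gguf", 1_200)}

def ensure_model_available(name: str, cache_dir: Path,
                           verbose: bool = False, download=None) -> Path:
    """Return the cached model path, downloading it first if needed."""
    if name not in _REGISTRY:
        raise ModelNotFoundError(name)
    filename, size = _REGISTRY[name]
    path = cache_dir / filename
    if path.exists():
        return path  # already cached: nothing to download
    if shutil.disk_usage(cache_dir).free < size:
        raise InsufficientDiskSpaceError(f"need {size} bytes for {name}")
    if verbose:
        print(f"Downloading {name} -> {path}")
    try:
        download(name, path)  # injected fetcher (e.g. a huggingface_hub call)
    except Exception as exc:
        raise ModelDownloadError(name) from exc
    return path

# Demo with a stub downloader and a throwaway cache directory:
cache = Path(tempfile.mkdtemp())
path = ensure_model_available("bge-m3", cache,
                              download=lambda n, p: p.write_bytes(b"GGUF"))
```

A second call with the same name returns the cached path without invoking the downloader.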
get_model_dimension
Get the embedding dimension for a model.

Args:
- name (str): Model name

Raises: ModelNotFoundError if the model is not known
Example:
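A tiny self-contained sketch of the dimension lookup, using the dimensions listed in the registry table at the end of this document (the dict-based registry is an assumption):

```python
class ModelNotFoundError(KeyError): pass

# Dimensions taken from the registry table in this document.
_DIMS = {"bge-m3": 1024, "nomic-embed-text": 768, "e5-mistral-7b": 4096}

def get_model_dimension(name: str) -> int:
    try:
        # ":latest" suffix handling is an assumed convenience.
        return _DIMS[name.removesuffix(":latest")]
    except KeyError:
        raise ModelNotFoundError(name) from None

print(get_model_dimension("nomic-embed-text"))  # 768
```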
get_model_path
Get the cached path for a model, if it exists.

Args:
- name (str): Model name
LLM GGUF Models
resolve_llm_gguf_model
Resolve an LLM model name to its GGUF specification.

Args:
- name (str): Model name (e.g., "qwen3:30b", "llama3.2:3b")

Returns: LLMGGUFModelSpec with hf_repo, hf_file, context

Raises: ModelNotFoundError if the model name is not in the registry
Example:
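With the original example missing, here is a sketch of the lookup and of how `hf_repo`/`hf_file` map onto a Hugging Face download URL. The context value matches the registry table below; the repo and file names are illustrative assumptions, not the real registry entries.

```python
from dataclasses import dataclass

class ModelNotFoundError(KeyError): pass

@dataclass(frozen=True)
class LLMGGUFModelSpec:
    hf_repo: str   # Hugging Face repository
    hf_file: str   # GGUF file inside the repo
    context: int   # context window in tokens

_LLM_GGUF_REGISTRY = {
    "qwen3:0.6b": LLMGGUFModelSpec("Qwen/Qwen3-0.6B-GGUF",
                                   "Qwen3-0.6B-Q8_0.gguf", 40960),
}

def resolve_llm_gguf_model(name: str) -> LLMGGUFModelSpec:
    try:
        return _LLM_GGUF_REGISTRY[name]
    except KeyError:
        raise ModelNotFoundError(name) from None

# hf_repo and hf_file compose directly into a Hugging Face resolve URL:
spec = resolve_llm_gguf_model("qwen3:0.6b")
url = f"https://huggingface.co/{spec.hf_repo}/resolve/main/{spec.hf_file}"
```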
ensure_llm_model_available
Ensure an LLM GGUF model is downloaded and return its path.

Args:
- name (str): Model name (e.g., "qwen3:30b", "llama3.2:3b")
- verbose (bool): Print progress messages

Returns: Path to the downloaded model file

Raises:
- ModelNotFoundError: If the model name is unknown
- ModelDownloadError: If the download fails
get_llm_model_path
Get the cached path for an LLM GGUF model.

Args:
- name (str): Model name
LLM Response Configuration
resolve_llm_model
Resolve an LLM model name to its specification.

Args:
- name (str): Model name (e.g., "qwen3:30b", "llama3.2")

Returns: LLMModelSpec with response_field and other configuration
Example:
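A sketch of the spec lookup and of what the response field is for: selecting which key of a llama-server JSON payload holds the generated text. The spec fields `response_field` and `supports_thinking` appear in this reference; the concrete field values and token budgets here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMModelSpec:
    response_field: str            # JSON key holding the generated text
    supports_thinking: bool = False
    min_max_tokens: int = 512      # illustrative default

# Field values are assumptions for illustration only.
_LLM_REGISTRY = {
    "qwen3:30b": LLMModelSpec("content", supports_thinking=True,
                              min_max_tokens=4096),
    "llama3.2": LLMModelSpec("content"),
}

def resolve_llm_model(name: str) -> LLMModelSpec:
    return _LLM_REGISTRY[name]

def extract_response(name: str, payload: dict) -> str:
    # Pull the generated text out of a llama-server JSON payload
    # using the model's configured response field.
    return payload[resolve_llm_model(name).response_field]

print(extract_response("llama3.2", {"content": "hello"}))  # hello
```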
get_response_field
Get the response field for an LLM model.

Args:
- model_name (str): Model name (e.g., "qwen3:30b")
supports_thinking
Check if a model supports thinking/reasoning mode.

Args:
- model_name (str): Model name
get_min_max_tokens
Get the minimum max_tokens needed for a model.

Args:
- model_name (str): Model name
- default (int): Default value for models not in the registry
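The default-fallback behavior can be sketched in a few lines. The per-model values here are illustrative assumptions (thinking-capable models generally need a larger budget because they emit reasoning tokens before the final answer):

```python
# Minimum max_tokens per model; values are illustrative assumptions.
_MIN_MAX_TOKENS = {"qwen3:30b": 4096, "llama3.2:3b": 1024}

def get_min_max_tokens(model_name: str, default: int = 512) -> int:
    # Models missing from the registry fall back to the caller's default.
    return _MIN_MAX_TOKENS.get(model_name, default)

print(get_min_max_tokens("qwen3:30b"))      # 4096
print(get_min_max_tokens("unknown", 2048))  # 2048
```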
Model Families
get_model_family
Detect the model family from a model name.

Args:
- model_name (str): Model name (e.g., "qwen3:1.7b", "qwen2.5:3b")
get_model_prompt_defaults
Get prompt configuration defaults for a model.

Args:
- model_name (str): Model name
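A sketch of how family detection might feed prompt defaults: splitting the name on the size tag, then looking up per-family settings. The parsing rule and the defaults dict are assumptions, not Watercooler's real configuration.

```python
def get_model_family(model_name: str) -> str:
    # "qwen3:1.7b" -> "qwen3": the family is the part before the
    # size tag (an assumed parsing rule).
    return model_name.split(":", 1)[0]

# Per-family prompt defaults; keys and values are illustrative
# assumptions.
_PROMPT_DEFAULTS = {
    "qwen3": {"temperature": 0.6},
    "qwen2.5": {"temperature": 0.7},
}

def get_model_prompt_defaults(model_name: str) -> dict:
    # Unknown families get an empty defaults dict.
    return _PROMPT_DEFAULTS.get(get_model_family(model_name), {})

print(get_model_family("qwen2.5:3b"))  # qwen2.5
```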
Registry Models
Available Embedding Models
- bge-m3 - 1024 dims, 8192 context (~1.2 GB)
- nomic-embed-text - 768 dims, 8192 context (~150 MB)
- e5-mistral-7b - 4096 dims, 4096 context (~4.4 GB)
Available LLM Models
Qwen3 Series:
- qwen3:30b - 40960 context (~18 GB)
- qwen3:8b - 40960 context (~5 GB)
- qwen3:4b - 40960 context (~2.7 GB)
- qwen3:1.7b - 40960 context (~1.1 GB)
- qwen3:0.6b - 40960 context (~400 MB)
Llama 3.2 Series:
- llama3.2:3b - 8192 context (~3.4 GB)
- llama3.2:1b - 8192 context (~1.3 GB)
Qwen2.5 Series:
- qwen2.5:3b - 32768 context (~2 GB)
- qwen2.5:1.5b - 32768 context (~1.1 GB)
SmolLM2 Series:
- smollm2:1.7b - 8192 context (~1 GB)
Phi-3 Series:
- phi3:3.8b - 4096 context (~2.3 GB)