The Models API provides model resolution and management for embedding and LLM models used in Watercooler.

Overview

Watercooler supports:
  • Embedding models - GGUF files for llama.cpp (bge-m3, nomic-embed-text, e5-mistral-7b)
  • LLM models - GGUF files for llama-server with response field configuration (Qwen3, Llama 3.2, SmolLM2, Qwen2.5, Phi-3)

Embedding Models

resolve_embedding_model

Resolve a friendly model name to its full specification.
def resolve_embedding_model(name: str) -> EmbeddingModelSpec
Parameters:
  • name (str): Model name (e.g., "bge-m3", "nomic-embed-text:latest")
Returns: EmbeddingModelSpec with hf_repo, hf_file, dim, context
Raises: ModelNotFoundError if the model name is not in the registry
Example:
from watercooler.models import resolve_embedding_model

spec = resolve_embedding_model("bge-m3")
print(spec["dim"])  # 1024
print(spec["hf_repo"])  # KimChen/bge-m3-GGUF

ensure_model_available

Ensure a model is downloaded and return its path.
def ensure_model_available(
    name: str,
    verbose: bool = True,
) -> Path
Parameters:
  • name (str): Model name
  • verbose (bool): Print progress messages
Returns: Path to the model file
Raises:
  • ModelNotFoundError: If model name is unknown
  • ModelDownloadError: If download fails
  • InsufficientDiskSpaceError: If not enough disk space
Example:
from watercooler.models import ensure_model_available

model_path = ensure_model_available("bge-m3")
print(f"Model available at: {model_path}")

get_model_dimension

Get the embedding dimension for a model.
def get_model_dimension(name: str) -> int
Parameters:
  • name (str): Model name
Returns: Embedding dimension (e.g., 1024 for bge-m3)
Raises: ModelNotFoundError if the model is not known
Example:
from watercooler.models import get_model_dimension

dim = get_model_dimension("bge-m3")
print(f"Embedding dimension: {dim}")  # 1024

get_model_path

Get the cached path for a model, if it exists.
def get_model_path(name: str) -> Optional[Path]
Parameters:
  • name (str): Model name
Returns: Path to the cached model file, or None if not downloaded
Example:
from watercooler.models import get_model_path

path = get_model_path("bge-m3")
if path:
    print(f"Model cached at: {path}")
else:
    print("Model not downloaded yet")
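The lookup that get_model_path performs can be sketched as a simple cache check. The cache directory and filename scheme below are illustrative assumptions for the sketch, not the library's actual layout:

```python
from pathlib import Path
from typing import Optional
import tempfile


def cached_model_path(cache_dir: Path, filename: str) -> Optional[Path]:
    # Return the cached file's path if present, else None (mirrors
    # the documented "None if not downloaded" behavior).
    candidate = cache_dir / filename
    return candidate if candidate.is_file() else None


# Demonstrate against a throwaway cache directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    print(cached_model_path(cache, "bge-m3.gguf"))  # None: nothing cached yet
    (cache / "bge-m3.gguf").write_bytes(b"")        # simulate a completed download
    print(cached_model_path(cache, "bge-m3.gguf") is not None)  # True
```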

LLM GGUF Models

resolve_llm_gguf_model

Resolve an LLM model name to its GGUF specification.
def resolve_llm_gguf_model(name: str) -> LLMGGUFModelSpec
Parameters:
  • name (str): Model name (e.g., "qwen3:30b", "llama3.2:3b")
Returns: LLMGGUFModelSpec with hf_repo, hf_file, context
Raises: ModelNotFoundError if the model name is not in the registry
Example:
from watercooler.models import resolve_llm_gguf_model

spec = resolve_llm_gguf_model("qwen3:30b")
print(spec["context"])  # 40960
print(spec["hf_file"])  # Qwen3-30B-A3B-Q4_K_M.gguf

ensure_llm_model_available

Ensure an LLM GGUF model is downloaded and return its path.
def ensure_llm_model_available(
    name: str,
    verbose: bool = True,
) -> Path
Parameters:
  • name (str): Model name (e.g., "qwen3:30b", "llama3.2:3b")
  • verbose (bool): Print progress messages
Returns: Path to the model file
Raises:
  • ModelNotFoundError: If model name is unknown
  • ModelDownloadError: If download fails
Example:
from watercooler.models import ensure_llm_model_available

model_path = ensure_llm_model_available("qwen3:1.7b")
print(f"LLM model available at: {model_path}")

get_llm_model_path

Get the cached path for an LLM GGUF model.
def get_llm_model_path(name: str) -> Optional[Path]
Parameters:
  • name (str): Model name
Returns: Path to the cached model file, or None if not downloaded
Example:
from watercooler.models import get_llm_model_path

path = get_llm_model_path("qwen3:1.7b")
if path:
    print(f"LLM model cached at: {path}")

LLM Response Configuration

resolve_llm_model

Resolve an LLM model name to its specification.
def resolve_llm_model(name: str) -> LLMModelSpec
Parameters:
  • name (str): Model name (e.g., "qwen3:30b", "llama3.2")
Returns: LLMModelSpec with response_field and other config
Example:
from watercooler.models import resolve_llm_model

spec = resolve_llm_model("qwen3:30b")
print(spec["response_field"])  # "content"
print(spec["supports_thinking"])  # True

get_response_field

Get the response field for an LLM model.
def get_response_field(model_name: str) -> str
Parameters:
  • model_name (str): Model name (e.g., "qwen3:30b")
Returns: Field name to extract the response from: "content" or "reasoning"
Example:
from watercooler.models import get_response_field

field = get_response_field("qwen3:30b")
print(field)  # "content"

supports_thinking

Check if a model supports thinking/reasoning mode.
def supports_thinking(model_name: str) -> bool
Parameters:
  • model_name (str): Model name
Returns: True if the model uses thinking mode
Example:
from watercooler.models import supports_thinking

if supports_thinking("qwen3:30b"):
    print("Model supports thinking mode")

get_min_max_tokens

Get the minimum max_tokens needed for a model.
def get_min_max_tokens(model_name: str, default: int = 256) -> int
Parameters:
  • model_name (str): Model name
  • default (int): Default value for models not in the registry
Returns: Minimum max_tokens value
Example:
from watercooler.models import get_min_max_tokens

min_tokens = get_min_max_tokens("qwen3:30b")
print(f"Minimum tokens needed: {min_tokens}")  # 512
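Together, these response-configuration helpers describe how a caller might shape a llama-server request. The build_request_params helper below is hypothetical glue; the qwen3:30b values are taken from the examples above:

```python
# Values for qwen3:30b as documented above; the dict shape is illustrative.
QWEN3_30B = {
    "response_field": "content",
    "supports_thinking": True,
    "min_max_tokens": 512,
}


def build_request_params(spec: dict, requested_max_tokens: int) -> dict:
    # Never request fewer tokens than the model's documented minimum.
    return {
        "max_tokens": max(requested_max_tokens, spec["min_max_tokens"]),
        "response_field": spec["response_field"],
    }


params = build_request_params(QWEN3_30B, requested_max_tokens=256)
print(params["max_tokens"])  # 512: raised to the model's minimum
```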

Model Families

get_model_family

Detect model family from model name.
def get_model_family(model_name: str) -> str
Parameters:
  • model_name (str): Model name (e.g., "qwen3:1.7b", "qwen2.5:3b")
Returns: Model family identifier (e.g., "qwen3", "qwen2.5", "default")
Example:
from watercooler.models import get_model_family

family = get_model_family("qwen3:1.7b")
print(family)  # "qwen3"

get_model_prompt_defaults

Get prompt configuration defaults for a model.
def get_model_prompt_defaults(model_name: str) -> dict[str, str]
Parameters:
  • model_name (str): Model name
Returns: Dict with "system_prompt" and "prompt_prefix" keys
Example:
from watercooler.models import get_model_prompt_defaults

defaults = get_model_prompt_defaults("qwen3:1.7b")
print(defaults["prompt_prefix"])  # "/no_think "
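A plausible way to consume these defaults is to prepend the family's prefix to each user prompt. The helper is hypothetical; the "/no_think " prefix for qwen3 comes from the example above, and the empty system_prompt is an assumption:

```python
def apply_prompt_defaults(defaults: dict[str, str], user_prompt: str) -> str:
    # Prepend the family's prompt prefix to the user's prompt;
    # families without a prefix pass the prompt through unchanged.
    return defaults.get("prompt_prefix", "") + user_prompt


# qwen3 values: the prefix is documented above, the system_prompt is assumed.
qwen3_defaults = {"system_prompt": "", "prompt_prefix": "/no_think "}
print(apply_prompt_defaults(qwen3_defaults, "Summarize this thread."))
# /no_think Summarize this thread.
```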

Registry Models

Available Embedding Models

  • bge-m3 - 1024 dims, 8192 context (~1.2 GB)
  • nomic-embed-text - 768 dims, 8192 context (~150 MB)
  • e5-mistral-7b - 4096 dims, 4096 context (~4.4 GB)

Available LLM Models

Qwen3 Series:
  • qwen3:30b - 40960 context (~18 GB)
  • qwen3:8b - 40960 context (~5 GB)
  • qwen3:4b - 40960 context (~2.7 GB)
  • qwen3:1.7b - 40960 context (~1.1 GB)
  • qwen3:0.6b - 40960 context (~400 MB)
Llama 3.2 Series:
  • llama3.2:3b - 8192 context (~3.4 GB)
  • llama3.2:1b - 8192 context (~1.3 GB)
Qwen2.5 Series:
  • qwen2.5:3b - 32768 context (~2 GB)
  • qwen2.5:1.5b - 32768 context (~1.1 GB)
SmolLM2:
  • smollm2:1.7b - 8192 context (~1 GB)
Phi-3:
  • phi3:3.8b - 4096 context (~2.3 GB)

Auto-Provisioning

Model downloads are controlled by configuration:
from watercooler.models import is_model_auto_provision_enabled

if is_model_auto_provision_enabled():
    print("Auto-download enabled")
Set via environment variable:
export WATERCOOLER_AUTO_PROVISION_MODELS=true
Or in config.toml:
[mcp.service_provision]
models = true
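A minimal sketch of how such a boolean flag might be read from the environment; which string values count as truthy here is an assumption, not the library's documented parsing:

```python
import os


def auto_provision_enabled() -> bool:
    # Hypothetical parse of the documented environment variable;
    # the set of truthy spellings is an assumption.
    raw = os.environ.get("WATERCOOLER_AUTO_PROVISION_MODELS", "")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


os.environ["WATERCOOLER_AUTO_PROVISION_MODELS"] = "true"
print(auto_provision_enabled())  # True
```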

Type Definitions

EmbeddingModelSpec

class EmbeddingModelSpec(TypedDict, total=False):
    hf_repo: str      # HuggingFace repository ID
    hf_file: str      # Filename within the repo
    dim: int          # Embedding dimension
    context: int      # Context window size
    size_mb: int      # Approximate file size in MB

LLMGGUFModelSpec

class LLMGGUFModelSpec(TypedDict, total=False):
    hf_repo: str      # HuggingFace repository ID
    hf_file: str      # Filename within the repo
    context: int      # Context window size
    size_mb: int      # Approximate file size in MB

LLMModelSpec

class LLMModelSpec(TypedDict, total=False):
    response_field: str           # Field containing response
    supports_thinking: bool       # Whether model uses thinking mode
    min_max_tokens: int          # Minimum max_tokens needed
    default_temperature: float   # Suggested temperature
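Because these specs use TypedDict with total=False, every key is optional and a spec is a plain dict at runtime. A quick sketch (the class reproduces EmbeddingModelSpec from above; the bge-m3 values match this page):

```python
from typing import TypedDict


class EmbeddingModelSpec(TypedDict, total=False):
    hf_repo: str      # HuggingFace repository ID
    hf_file: str      # Filename within the repo
    dim: int          # Embedding dimension
    context: int      # Context window size
    size_mb: int      # Approximate file size in MB


# total=False means keys may be omitted; this literal still conforms.
bge_m3: EmbeddingModelSpec = {
    "hf_repo": "KimChen/bge-m3-GGUF",
    "dim": 1024,
    "context": 8192,
}

# At runtime a TypedDict is an ordinary dict, so .get() works as usual.
print(bge_m3.get("size_mb", "not recorded"))  # not recorded
```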

Exceptions

class ModelNotFoundError(Exception):
    """Raised when a model name cannot be resolved."""
    pass

class ModelDownloadError(Exception):
    """Raised when a model cannot be downloaded."""
    pass

class InsufficientDiskSpaceError(ModelDownloadError):
    """Raised when there isn't enough disk space for download."""
    pass
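Since InsufficientDiskSpaceError subclasses ModelDownloadError, one except clause can handle both failure modes. The failing download below is a stand-in to exercise the handler (the exception classes are reproduced from above):

```python
class ModelNotFoundError(Exception):
    """Raised when a model name cannot be resolved."""


class ModelDownloadError(Exception):
    """Raised when a model cannot be downloaded."""


class InsufficientDiskSpaceError(ModelDownloadError):
    """Raised when there isn't enough disk space for download."""


def failing_download() -> None:
    # Stand-in for a download that runs out of disk space.
    raise InsufficientDiskSpaceError("need ~1.2 GB free")


try:
    failing_download()
except ModelNotFoundError:
    outcome = "unknown model"
except ModelDownloadError as exc:  # also catches InsufficientDiskSpaceError
    outcome = f"download failed: {exc}"

print(outcome)  # download failed: need ~1.2 GB free
```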
