Models API

The models module provides access to the embedded model database and metadata operations.

Core Types

LlmModel

Represents a single LLM model with all metadata:
pub struct LlmModel {
    pub name: String,
    pub provider: String,
    pub parameter_count: String,
    pub parameters_raw: Option<u64>,
    pub min_ram_gb: f64,
    pub recommended_ram_gb: f64,
    pub min_vram_gb: Option<f64>,
    pub quantization: String,
    pub context_length: u32,
    pub use_case: String,
    pub is_moe: bool,
    pub num_experts: Option<u32>,
    pub active_experts: Option<u32>,
    pub active_parameters: Option<u64>,
    pub release_date: Option<String>,
    pub gguf_sources: Vec<GgufSource>,
}
Key Fields:
  • name - Model identifier (e.g., “llama-3.1-8b-instruct”)
  • provider - Original provider (“Meta”, “Qwen”, etc.)
  • parameter_count - Human-readable size (“7B”, “8x7B”)
  • parameters_raw - Exact parameter count
  • min_ram_gb - Minimum system RAM for CPU inference
  • recommended_ram_gb - Recommended RAM for best performance
  • min_vram_gb - Minimum VRAM for GPU inference
  • quantization - Default quantization level (“Q4_K_M”, “mlx-4bit”)
  • context_length - Maximum context window
  • use_case - Primary use case category
  • is_moe - Whether this is a Mixture-of-Experts model
  • num_experts / active_experts - MoE expert configuration
  • active_parameters - Active parameter count for MoE models
  • release_date - Release date string (ISO 8601)
  • gguf_sources - Known GGUF download sources

GgufSource

A known GGUF download source:
pub struct GgufSource {
    pub repo: String,      // e.g., "bartowski/Llama-3.1-8B-Instruct-GGUF"
    pub provider: String,  // e.g., "bartowski", "unsloth"
}
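
A `GgufSource` identifies a Hugging Face repository. As a hypothetical illustration (the URL scheme and the `hf_repo_url` helper are assumptions, not part of the crate API), the `repo` field can be turned into a repository page URL:

```rust
// Hypothetical helper: the struct is reproduced here for a
// self-contained example; hf_repo_url is not a crate function.
struct GgufSource {
    repo: String,
    provider: String,
}

// Build the Hugging Face repository page URL from the repo field.
fn hf_repo_url(src: &GgufSource) -> String {
    format!("https://huggingface.co/{}", src.repo)
}
```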

ModelDatabase

Container for the embedded model database:
pub struct ModelDatabase {
    // Private fields
}
Methods:
impl ModelDatabase {
    pub fn new() -> Self;
    pub fn get_all_models(&self) -> &Vec<LlmModel>;
    pub fn find_model(&self, query: &str) -> Vec<&LlmModel>;
    pub fn models_fitting_system(
        &self,
        available_ram_gb: f64,
        has_gpu: bool,
        vram_gb: Option<f64>,
    ) -> Vec<&LlmModel>;
}

UseCase

Model use-case categories:
pub enum UseCase {
    General,
    Coding,
    Reasoning,
    Chat,
    Multimodal,
    Embedding,
}
Methods:
impl UseCase {
    pub fn label(&self) -> &'static str;
    pub fn from_model(model: &LlmModel) -> Self;
}

Functions

ModelDatabase::new()

Loads the embedded model database:
pub fn new() -> Self
Returns: ModelDatabase with all models loaded
Example:
use llmfit_core::ModelDatabase;

let db = ModelDatabase::new();
println!("Loaded {} models", db.get_all_models().len());
The database is embedded at compile time from data/hf_models.json. No runtime file I/O occurs.

ModelDatabase::get_all_models()

Returns all models in the database:
pub fn get_all_models(&self) -> &Vec<LlmModel>
Returns: Reference to all models
Example:
let db = ModelDatabase::new();

for model in db.get_all_models() {
    println!("{}: {} params, {} ctx",
        model.name,
        model.parameter_count,
        model.context_length
    );
}

ModelDatabase::find_model()

Searches models by name, provider, or parameter count:
pub fn find_model(&self, query: &str) -> Vec<&LlmModel>
Parameters:
  • query - Search term (case-insensitive substring match)
Returns: Matching models
Example:
let db = ModelDatabase::new();

// Find all Llama models
let llama_models = db.find_model("llama");

// Find 7B models
let seven_b = db.find_model("7B");

// Find Qwen models
let qwen = db.find_model("qwen");

for model in qwen {
    println!("Found: {}", model.name);
}

ModelDatabase::models_fitting_system()

Filters models that fit on specific hardware:
pub fn models_fitting_system(
    &self,
    available_ram_gb: f64,
    has_gpu: bool,
    vram_gb: Option<f64>,
) -> Vec<&LlmModel>
Parameters:
  • available_ram_gb - Available system RAM in GB
  • has_gpu - Whether a GPU is present
  • vram_gb - GPU VRAM in GB, if available
Returns: Models that meet the hardware requirements
Example:
use llmfit_core::{SystemSpecs, ModelDatabase};

let specs = SystemSpecs::detect();
let db = ModelDatabase::new();

let fitting = db.models_fitting_system(
    specs.available_ram_gb,
    specs.has_gpu,
    specs.gpu_vram_gb,
);

println!("Models that fit: {}", fitting.len());
for model in fitting.iter().take(5) {
    println!("  - {}", model.name);
}

LlmModel Methods

is_mlx_model()

Checks if model is MLX-specific:
pub fn is_mlx_model(&self) -> bool
Returns: true if the model name contains the “-MLX-” marker
Example:
let model_name = "Qwen3-8B-MLX-4bit";
let is_mlx = model_name.contains("-MLX-");

if is_mlx {
    println!("Apple Silicon only");
}

params_b()

Parameter count in billions:
pub fn params_b(&self) -> f64
Returns: Parameter count in billions
Example:
let model = /* ... */;
let params = model.params_b();

if params < 10.0 {
    println!("Small model: {:.1}B parameters", params);
} else if params < 100.0 {
    println!("Medium model: {:.1}B parameters", params);
} else {
    println!("Large model: {:.1}B parameters", params);
}
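
Human-readable sizes like those in parameter_count can be converted to billions with a small parser. This is only an illustrative sketch (the real params_b() may instead derive its value from parameters_raw); parse_params_b is a hypothetical helper, not a crate function:

```rust
// Parse a human-readable size such as "7B" or "8x7B" into billions
// of parameters. MoE notation "8x7B" means 8 experts of 7B each.
fn parse_params_b(count: &str) -> Option<f64> {
    let s = count.trim().trim_end_matches(|c| c == 'B' || c == 'b');
    if let Some((experts, per)) = s.split_once('x') {
        // "8x7" -> 8.0 * 7.0 = 56.0
        Some(experts.parse::<f64>().ok()? * per.parse::<f64>().ok()?)
    } else {
        s.parse::<f64>().ok()
    }
}
```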

estimate_memory_gb()

Estimates memory required for specific quantization and context:
pub fn estimate_memory_gb(&self, quant: &str, ctx: u32) -> f64
Parameters:
  • quant - Quantization level (“Q4_K_M”, “Q8_0”, etc.)
  • ctx - Context length in tokens
Returns: Estimated memory in GB
Example:
let model = /* ... */;

// 4K context with Q4 quantization
let mem_q4_4k = model.estimate_memory_gb("Q4_K_M", 4096);

// 32K context with Q8 quantization
let mem_q8_32k = model.estimate_memory_gb("Q8_0", 32768);

println!("Q4_K_M @ 4K: {:.2} GB", mem_q4_4k);
println!("Q8_0 @ 32K: {:.2} GB", mem_q8_32k);

best_quant_for_budget()

Selects best quantization that fits in memory:
pub fn best_quant_for_budget(
    &self,
    budget_gb: f64,
    ctx: u32,
) -> Option<(&'static str, f64)>
Parameters:
  • budget_gb - Available memory budget in GB
  • ctx - Target context length in tokens
Returns: (quantization, estimated_memory) or None if nothing fits
Example:
let model = /* ... */;
let budget = 16.0; // 16 GB available

if let Some((quant, mem)) = model.best_quant_for_budget(budget, 4096) {
    println!("Best quantization: {} ({:.2} GB)", quant, mem);
} else {
    println!("Model too large for available memory");
}
The function tries quantization levels in quality order:
  1. Q8_0 (best quality)
  2. Q6_K
  3. Q5_K_M
  4. Q4_K_M
  5. Q3_K_M
  6. Q2_K (smallest)
If nothing fits, it tries halving the context length once.
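The selection logic above can be sketched roughly as follows. This is illustrative only: the bytes-per-parameter values for Q6_K, Q5_K_M, Q3_K_M, and Q2_K and the KV-cache term are guesses, not the crate's actual constants (only the Q8_0 and Q4_K_M values appear in the quant_bpp() docs below):

```rust
// Quantization hierarchy with assumed bytes-per-parameter values.
const QUANTS: &[(&str, f64)] = &[
    ("Q8_0", 1.05), ("Q6_K", 0.85), ("Q5_K_M", 0.72),
    ("Q4_K_M", 0.58), ("Q3_K_M", 0.45), ("Q2_K", 0.35),
];

// Rough estimate: weights plus a hypothetical KV-cache term
// (~0.5 GB per 4K tokens of context).
fn estimate_gb(params_b: f64, bpp: f64, ctx: u32) -> f64 {
    params_b * bpp + (ctx as f64 / 4096.0) * 0.5
}

// Walk the hierarchy best-to-worst; if nothing fits, retry once
// with the context halved, as the docs describe.
fn best_quant(params_b: f64, budget_gb: f64, ctx: u32) -> Option<(&'static str, f64)> {
    for try_ctx in [ctx, ctx / 2] {
        for &(q, bpp) in QUANTS {
            let mem = estimate_gb(params_b, bpp, try_ctx);
            if mem <= budget_gb {
                return Some((q, mem));
            }
        }
    }
    None
}
```

Because the loop goes from highest to lowest quality, the first quantization that fits is also the best one that fits.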

MoE-Specific Methods

For Mixture-of-Experts models:
// Active expert VRAM (GPU)
pub fn moe_active_vram_gb(&self) -> Option<f64>;

// Inactive expert RAM (offloaded to system RAM)
pub fn moe_offloaded_ram_gb(&self) -> Option<f64>;
Example:
let model = /* MoE model like Mixtral 8x7B */;

if model.is_moe {
    if let Some(active_vram) = model.moe_active_vram_gb() {
        println!("Active experts: {:.2} GB VRAM", active_vram);
    }
    if let Some(offloaded) = model.moe_offloaded_ram_gb() {
        println!("Inactive experts: {:.2} GB RAM", offloaded);
    }
}
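
The idea behind these two methods can be sketched as a simple split of model weights between GPU and system RAM. The formula below is an assumption for illustration (the crate's actual computation may include additional overhead); moe_memory_split is a hypothetical helper:

```rust
// Split a MoE model's weight memory between VRAM (active experts)
// and system RAM (inactive, offloaded experts). Sizes are in
// billions of parameters; bpp is bytes per parameter for the
// chosen quantization.
fn moe_memory_split(total_b: f64, active_b: f64, bpp: f64) -> (f64, f64) {
    let active_vram_gb = active_b * bpp;
    let offloaded_ram_gb = (total_b - active_b) * bpp;
    (active_vram_gb, offloaded_ram_gb)
}
```

For a Mixtral-8x7B-like model (~46.7B total, ~12.9B active) at Q4_K_M, only the active-expert slice needs to sit in VRAM, which is why MoE models can run on GPUs far smaller than their total parameter count suggests.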

Quantization Functions

quant_bpp()

Bytes per parameter for quantization level:
pub fn quant_bpp(quant: &str) -> f64
Example:
use llmfit_core::models::quant_bpp;

assert_eq!(quant_bpp("F16"), 2.0);
assert_eq!(quant_bpp("Q8_0"), 1.05);
assert_eq!(quant_bpp("Q4_K_M"), 0.58);
assert_eq!(quant_bpp("mlx-4bit"), 0.55);

quant_speed_multiplier()

Speed impact of quantization:
pub fn quant_speed_multiplier(quant: &str) -> f64
Higher values = faster inference (lower precision = faster math).

quant_quality_penalty()

Quality penalty for quantization:
pub fn quant_quality_penalty(quant: &str) -> f64
Negative values indicate quality loss relative to F16.

Quantization Hierarchies

Predefined quantization hierarchies (best to worst quality):
// Standard GGUF hierarchy
pub const QUANT_HIERARCHY: &[&str] = &[
    "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K"
];

// MLX-native hierarchy
pub const MLX_QUANT_HIERARCHY: &[&str] = &[
    "mlx-8bit", "mlx-4bit"
];
Example:
use llmfit_core::models::{QUANT_HIERARCHY, MLX_QUANT_HIERARCHY};

let model = /* ... */;
let budget = 12.0;

// Try GGUF quantizations
if let Some((q, mem)) = model.best_quant_for_budget_with(
    budget, 4096, QUANT_HIERARCHY
) {
    println!("GGUF: {} ({:.2} GB)", q, mem);
}

// Try MLX quantizations
if let Some((q, mem)) = model.best_quant_for_budget_with(
    budget, 4096, MLX_QUANT_HIERARCHY
) {
    println!("MLX: {} ({:.2} GB)", q, mem);
}

Use Case Inference

use llmfit_core::{UseCase, ModelDatabase};

let db = ModelDatabase::new();

for model in db.get_all_models() {
    let use_case = UseCase::from_model(model);
    println!("{}: {}", model.name, use_case.label());
}
Use cases are inferred from model name and metadata:
  • Embedding: “embed” or “bge” in name
  • Coding: “code” in name or use_case
  • Multimodal: “vision” in use_case
  • Reasoning: “reason” or “deepseek-r1” in name
  • Chat: “chat” or “instruction” in use_case
  • General: Default fallback
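
The rules above can be sketched as a priority-ordered chain of substring checks. This standalone function mirrors the listed rules for illustration; the crate's UseCase::from_model may differ in details:

```rust
// Infer a use-case label from the model name and use_case metadata,
// checking categories in the priority order listed above.
fn infer_use_case(name: &str, use_case: &str) -> &'static str {
    let name = name.to_lowercase();
    let uc = use_case.to_lowercase();
    if name.contains("embed") || name.contains("bge") {
        "Embedding"
    } else if name.contains("code") || uc.contains("code") {
        "Coding"
    } else if uc.contains("vision") {
        "Multimodal"
    } else if name.contains("reason") || name.contains("deepseek-r1") {
        "Reasoning"
    } else if uc.contains("chat") || uc.contains("instruction") {
        "Chat"
    } else {
        "General"
    }
}
```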