The triage system uses machine learning to automatically classify support tickets into categories and priority levels. This enables intelligent routing and context-aware answer generation.

Architecture

Two separate logistic regression models handle classification:
  • Category Model - Predicts support domain (Billing, Authentication, etc.)
  • Priority Model - Predicts urgency level (Low, Medium, High)
Both models use TF-IDF features extracted from ticket subject and body text.
The models are trained offline and loaded once at service initialization for fast inference.
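The `build_feature_union` helper referenced in the training pipeline below is defined elsewhere in the codebase; a minimal sketch, assuming one TF-IDF vectorizer per text column (names and parameters here are illustrative, not the actual implementation):

```python
import pandas as pd
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import FunctionTransformer

def build_feature_union() -> FeatureUnion:
    """Combine per-column TF-IDF features from subject and body (illustrative)."""
    return FeatureUnion([
        ("subject_tfidf", Pipeline([
            ("select", FunctionTransformer(lambda df: df["subject"], validate=False)),
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ])),
        ("body_tfidf", Pipeline([
            ("select", FunctionTransformer(lambda df: df["body"], validate=False)),
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ])),
    ])

df = pd.DataFrame({"subject": ["login fails", "billing issue"],
                   "body": ["cannot sign in", "charged twice"]})
X = build_feature_union().fit_transform(df)
print(X.shape)  # one row of combined TF-IDF features per ticket
```

Keeping subject and body as separate vectorizers lets each field contribute its own vocabulary weighting instead of pooling them into one bag of words.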

Model Training

Training happens in src/ml/train.py using a supervised learning pipeline.

Pipeline Construction

def build_pipeline() -> Pipeline:
    """
    Construct a text classification pipeline.
    """
    return Pipeline(
        [
            ("features", build_feature_union()),
            ("clf", LogisticRegression(max_iter=500, class_weight="balanced")),
        ]
    )
The class_weight="balanced" parameter reweights each class inversely to its frequency, compensating for class imbalance in the training data.
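Under the hood, "balanced" assigns each class a weight of n_samples / (n_classes * class_count). A quick illustration with scikit-learn's helper and toy labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced toy labels: 8 Billing tickets, 2 Authentication tickets
y = np.array(["Billing"] * 8 + ["Authentication"] * 2)

weights = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
# 10 samples / (2 classes * count): Authentication -> 2.5, Billing -> 0.625
print(dict(zip(np.unique(y), weights)))
```

The rare class is upweighted (2.5) and the common class downweighted (0.625), so misclassifying a minority-class ticket costs the model proportionally more during fitting.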

Data Preprocessing

Text is normalized before feature extraction:
def load_dataset(
    train_csv: str,
    df_override: Optional[pd.DataFrame] = None,
) -> pd.DataFrame:
    """
    Load the training dataset from disk or use an injected DataFrame.
    """
    df = df_override.copy() if df_override is not None else pd.read_csv(train_csv)

    # Text normalization
    df["subject"] = df["subject"].apply(preprocess_text)
    df["body"] = df["body"].apply(preprocess_text)

    return df
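`preprocess_text` itself is defined elsewhere in the codebase; a representative sketch (lowercasing, stripping punctuation, collapsing whitespace — the actual rules may differ):

```python
import re

def preprocess_text(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace (illustrative rules)."""
    text = str(text).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # replace punctuation/symbols with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

print(preprocess_text("Can't LOGIN!!"))  # can t login
```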

Train/Validation Split

Stratified splitting preserves class proportions between the training and validation sets:
def split_train_val(
    X: pd.DataFrame,
    y_category: pd.Series,
    y_priority: pd.Series,
    test_size: float = 0.2,
    random_state: int = 42,
) -> Tuple:
    """
    Perform a train/validation split with optional stratification.
    """
    stratify = y_category if y_category.value_counts().min() >= 2 else None

    if stratify is None:
        warnings.warn(
            "Dataset too small for stratified split; using non-stratified split."
        )

    return train_test_split(
        X,
        y_category,
        y_priority,
        test_size=test_size,
        random_state=random_state,
        stratify=stratify,
    )
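Passing both label series to train_test_split yields six aligned splits, in input order. A usage sketch with toy data (calling train_test_split directly to keep the example self-contained):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

X = pd.DataFrame({"subject": ["a"] * 10, "body": ["b"] * 10})
y_cat = pd.Series(["Billing"] * 5 + ["Authentication"] * 5)
y_pri = pd.Series(["Low"] * 5 + ["High"] * 5)

# Three input arrays -> six outputs: (train, val) pairs in input order
(X_train, X_val,
 y_cat_train, y_cat_val,
 y_pri_train, y_pri_val) = train_test_split(
    X, y_cat, y_pri, test_size=0.2, random_state=42, stratify=y_cat
)
print(len(X_train), len(X_val))  # 8 2
```

Splitting all three arrays in one call keeps rows, category labels, and priority labels aligned across the train and validation sets.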

Edge Case Handling

The training pipeline handles single-class datasets gracefully:
class ConstantPredictor:
    """
    Fallback predictor used when the training data contains only one class.
    """

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label] * len(X)

    def predict_proba(self, X):
        return np.ones((len(X), 1))


def train_or_fallback(pipeline: Pipeline, X, y):
    """
    Train a pipeline or fall back to a constant predictor if only one class exists.
    """
    if y.nunique() >= 2:
        pipeline.fit(X, y)
        return pipeline

    return ConstantPredictor(y.iloc[0])
Constant predictors are used automatically when training data lacks class diversity.
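For instance, a single-class label series takes the fallback path rather than fitting; a self-contained demonstration (repeating a trimmed copy of ConstantPredictor from above):

```python
import pandas as pd

class ConstantPredictor:
    """Fallback used when the training data contains only one class."""
    def __init__(self, label):
        self.label = label
    def predict(self, X):
        return [self.label] * len(X)

y = pd.Series(["Billing", "Billing", "Billing"])  # single-class labels
assert y.nunique() < 2  # LogisticRegression cannot fit a one-class problem

model = ConstantPredictor(y.iloc[0])
print(model.predict(["ticket 1", "ticket 2"]))  # ['Billing', 'Billing']
```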

Inference

The TriageModel class handles runtime predictions:
class TriageModel:
    """
    ML model for triaging support tickets:
      - predicts category and priority
      - returns confidence scores
    """

    def __init__(
        self,
        category_model_path: str = "artifacts/category_model.joblib",
        priority_model_path: str = "artifacts/priority_model.joblib",
    ):
        """
        Load pre-trained ML models from disk.
        """
        self.category_model_path = Path(category_model_path)
        self.priority_model_path = Path(priority_model_path)

        # Load models once during initialization
        self.category_model = self._load_model(self.category_model_path)
        self.priority_model = self._load_model(self.priority_model_path)

    @staticmethod
    def _load_model(path: Path):
        if not path.exists():
            raise FileNotFoundError(f"ML model not found: {path}")
        return joblib.load(path)

Prediction with Confidence

def predict(self, subject: str, body: str) -> Dict:
    """
    Predict category and priority from ticket subject and body.

    Returns:
        Dict with:
            - category: predicted category
            - priority: predicted priority
            - confidence: dict with category & priority probabilities
    """
    # Preprocess inputs
    subject_clean = preprocess_text(subject)
    body_clean = preprocess_text(body)

    X = pd.DataFrame([{"subject": subject_clean, "body": body_clean}])

    # Predict labels
    category = self.category_model.predict(X)[0]
    priority = self.priority_model.predict(X)[0]

    # Predict confidence scores
    cat_conf = max(self.category_model.predict_proba(X)[0])
    pri_conf = max(self.priority_model.predict_proba(X)[0])

    return {
        "category": category,
        "priority": priority,
        "confidence": {"category": float(cat_conf), "priority": float(pri_conf)},
    }
Confidence scores are derived from the maximum probability across all classes.
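Concretely, the confidence is just the top entry of one predict_proba row; pairing it with the estimator's class ordering (illustrative values):

```python
import numpy as np

classes = np.array(["Authentication", "Billing", "Technical"])  # model.classes_
proba = np.array([0.15, 0.70, 0.15])  # one row of predict_proba (illustrative)

label = classes[np.argmax(proba)]
confidence = float(proba.max())
print(label, confidence)  # Billing 0.7
```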

Evaluation Metrics

The training script computes standard classification metrics:
def compute_metrics(
    y_true_cat,
    y_pred_cat,
    y_true_pri,
    y_pred_pri,
) -> Dict[str, float]:
    """
    Compute validation metrics for both tasks.
    """
    return {
        "category_macro_f1": float(f1_score(y_true_cat, y_pred_cat, average="macro")),
        "priority_f1": float(f1_score(y_true_pri, y_pred_pri, average="weighted")),
        "priority_recall": float(
            recall_score(y_true_pri, y_pred_pri, average="weighted")
        ),
    }
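A quick sanity check of the two F1 averaging modes with toy labels: macro averages per-class F1 equally, while weighted scales each class's F1 by its support.

```python
from sklearn.metrics import f1_score

# Toy validation labels: three Billing tickets, one Auth ticket (illustrative)
y_true = ["Billing", "Auth", "Billing", "Billing"]
y_pred = ["Billing", "Auth", "Auth", "Billing"]

macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean of class F1s
weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by class support
print(round(macro, 3), round(weighted, 3))  # 0.733 0.767
```

The weighted score is higher here because the majority class (Billing) is classified more accurately, which is exactly why macro F1 is the stricter choice for the imbalanced category task.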

Visualization

Confusion matrices are automatically generated for both models:
def plot_confusion_matrix(
    y_true,
    y_pred,
    labels,
    title: str,
    cmap: str,
    save_path: str,
):
    """
    Plot and save a confusion matrix.
    """
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    plt.figure(figsize=(8, 6))
    sns.heatmap(
        cm,
        annot=True,
        fmt="d",
        cmap=cmap,
        xticklabels=labels,
        yticklabels=labels,
    )
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.title(title)
    plt.savefig(save_path, bbox_inches="tight")
    plt.close()

Model Artifacts

Trained models are saved to artifacts/ as .joblib files for fast loading.
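Persistence presumably uses joblib's dump/load round trip; a minimal sketch with a stand-in estimator (the real script saves the fitted pipelines):

```python
import joblib
from pathlib import Path
from sklearn.linear_model import LogisticRegression

artifacts = Path("artifacts")
artifacts.mkdir(exist_ok=True)

model = LogisticRegression()  # stand-in for a fitted pipeline
path = artifacts / "category_model.joblib"
joblib.dump(model, path)

restored = joblib.load(path)  # round-trips the estimator from disk
```

joblib handles the large numpy arrays inside fitted vectorizers more efficiently than plain pickle, which is why it is the conventional format for scikit-learn artifacts.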

Confidence Thresholds

The RAG pipeline uses confidence scores to flag uncertain predictions:
CATEGORY_CONF_THRESHOLD = 0.5
PRIORITY_CONF_THRESHOLD = 0.5

needs_human_review = (
    confidence.get("category", 0) < CATEGORY_CONF_THRESHOLD
    or confidence.get("priority", 0) < PRIORITY_CONF_THRESHOLD
)
Tickets with low confidence scores are automatically flagged for human review.

RAG Pipeline

See how triage predictions guide retrieval

Structured Outputs

Understand review flags and next steps
