Prerequisites
- Trained models in the artifacts/ directory (see Training Models)
- Test data: tickets_test.csv in the project root
- Required columns: subject, body
Prediction Command
Prepare test data
Create or verify tickets_test.csv with at minimum these columns:
- subject: Ticket subject line
- body: Ticket description or body text
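Before running predictions, it can help to confirm that the header really contains the required columns. The following is a minimal sketch using only the standard library; the helper name missing_columns is hypothetical and is not part of the project's prediction script.

```python
import csv
import io

REQUIRED_COLUMNS = ("subject", "body")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Demo on an in-memory CSV. For a real file, pass
# open("tickets_test.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO("id,subject,body\n1,Refund request,I was charged twice\n")
print(missing_columns(sample))  # prints []
```

An empty list means both columns are present; any names returned are the ones to add before predicting.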
Run predictions
- Load trained models from artifacts/
- Preprocess test data
- Run category and priority predictions
- Compute confidence scores
- Save results to CSV
Output Format
The prediction script generates a CSV file with the original test data plus four additional columns:

| Column | Description |
|---|---|
| predicted_category | Predicted support category (e.g., “billing”, “technical”) |
| category_confidence | Confidence score (0.0 to 1.0) for category prediction |
| predicted_priority | Predicted priority level (e.g., “low”, “medium”, “high”) |
| priority_confidence | Confidence score (0.0 to 1.0) for priority prediction |
Example Output
How Predictions Work
The prediction pipeline executes the following steps (see src/ml/predict.py:125):
- Load Models: Loads trained pipelines from the artifacts/ directory
- Read Test Data: Loads tickets_test.csv into a DataFrame
- Preprocess: Normalizes text (lowercase, strip whitespace, fill NaN values)
- Feature Extraction: Applies TF-IDF transformations via trained pipelines
- Predict: Generates category and priority predictions
- Confidence Scores: Computes max probability from predict_proba
- Export: Saves augmented DataFrame to reports/predictions.csv
Confidence Scores
Confidence scores indicate model certainty (see src/ml/predict.py:73):
- High confidence (> 0.8): Model is very certain
- Medium confidence (0.5 - 0.8): Moderate certainty
- Low confidence (< 0.5): Uncertain prediction, may require manual review
With ConstantPredictor (single-class fallback), confidence is always 1.0.
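The confidence bands can be expressed as a small helper. This is an illustrative sketch, not the code in src/ml/predict.py: the function names are hypothetical, the probability rows are made up, and a score of exactly 0.8 is assumed to fall in the medium band.

```python
def confidence_from_proba(proba_rows):
    """Return the maximum class probability for each row of predict_proba output."""
    return [max(row) for row in proba_rows]


def confidence_band(score):
    """Map a confidence score to the high / medium / low bands."""
    if score > 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


# Hypothetical predict_proba output: one row per ticket, one column per class.
proba = [[0.05, 0.90, 0.05], [0.30, 0.60, 0.10], [0.40, 0.35, 0.25]]
scores = confidence_from_proba(proba)
print([confidence_band(s) for s in scores])  # prints ['high', 'medium', 'low']
```

Tickets that land in the low band are the natural candidates for manual review.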
Custom Prediction Paths
You can customize input/output paths programmatically:
Using Predictions in API
The trained models are automatically loaded by the /triage API endpoint:
Preprocessing Behavior
The prediction script applies lightweight preprocessing (see src/ml/predict.py:46):
- Convert text to lowercase
- Strip leading/trailing whitespace
- Fill NaN values with empty strings
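The three preprocessing steps amount to a single normalization per text field. The sketch below is dependency-free and the helper name normalize_text is hypothetical; a pandas-based script would more typically chain fillna(""), .str.lower() and .str.strip() on each column, and the actual code in src/ml/predict.py may differ.

```python
import math


def normalize_text(value):
    """Lowercase and strip a text field, treating None and NaN as an empty string."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value).lower().strip()


print(normalize_text("  Payment FAILED  "))  # prints payment failed
print(repr(normalize_text(float("nan"))))    # prints ''
```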
Troubleshooting
FileNotFoundError: Model file not found
Train models first using:
KeyError: 'subject' or 'body' column missing
Ensure your test CSV has both subject and body columns. Other columns are optional.
Low confidence scores across all predictions
This may indicate:
- Test data distribution differs from training data
- Models need retraining with more diverse examples
- Feature vocabulary mismatch
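To judge whether low confidence is really widespread, one option is to summarize a confidence column from the predictions file. The sketch below assumes the output columns documented above; the helper name summarize_confidence and the sample values are hypothetical.

```python
import csv
import io
from statistics import mean


def summarize_confidence(csv_file, column="category_confidence", low=0.5):
    """Return (mean confidence, share of rows below `low`), or None if there are no rows."""
    scores = [float(row[column]) for row in csv.DictReader(csv_file)]
    if not scores:
        return None
    share_low = sum(score < low for score in scores) / len(scores)
    return mean(scores), share_low


# Demo with made-up values. For real output, pass
# open("reports/predictions.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "subject,category_confidence\n"
    "a,0.25\n"
    "b,0.5\n"
    "c,0.75\n"
    "d,0.25\n"
)
print(summarize_confidence(sample))  # prints (0.4375, 0.5)
```

A large share of rows below 0.5 on both the category and priority columns is a hint that the test data differs from what the models saw during training.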
All predictions return the same value
This occurs when models were trained on single-class data and fell back to ConstantPredictor.
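To make the fallback behavior concrete, here is a hypothetical re-creation of what a single-class fallback could look like. The class ConstantPredictorSketch and the helper used_constant_fallback are illustrative names only; the project's real ConstantPredictor may be implemented differently.

```python
class ConstantPredictorSketch:
    """Hypothetical stand-in: always predicts the one label seen during training."""

    def __init__(self, label):
        self.label = label
        self.classes_ = [label]

    def predict(self, texts):
        # Every input gets the same label, regardless of its content.
        return [self.label for _ in texts]

    def predict_proba(self, texts):
        # Only one class exists, so it receives all of the probability mass.
        return [[1.0] for _ in texts]


def used_constant_fallback(labels):
    """Return True when the training labels contain fewer than two distinct classes."""
    return len(set(labels)) < 2


model = ConstantPredictorSketch("billing")
print(model.predict(["refund please", "app crashes"]))  # prints ['billing', 'billing']
print(used_constant_fallback(["billing", "billing"]))   # prints True
```

Checking the distinct labels in the training data, as used_constant_fallback does, is a quick way to confirm this cause before retraining with examples from more than one class.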