Overview
The inference system loads a trained model and preprocessor to make credit risk predictions. It follows a singleton pattern for efficient resource usage.
Predictor Class
The Predictor class (inference/inference.py:20-146) is a singleton that manages model loading and inference.
Architecture
inference/inference.py:20-27
The singleton pattern ensures the model is loaded only once, improving performance and memory efficiency.
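As a minimal sketch of the pattern (class and attribute names here are illustrative, not the exact code at inference/inference.py:20-27):

```python
class Predictor:
    """Singleton: the model is loaded once, on first instantiation."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._loaded = False
        return cls._instance

    def __init__(self, config_path="config.yaml"):
        if self._loaded:
            # Repeated construction skips all loading work.
            return
        self.config_path = config_path
        # ... load YAML config, preprocessor, and model weights here ...
        self._loaded = True
```

Because `__new__` always hands back the same object, constructing `Predictor()` a second time is effectively free.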
Initialization
The Predictor requires three components: a YAML configuration file, a fitted preprocessor (joblib), and trained model weights.
Default Predictor Instance
A global predictor instance is created automatically with default paths:
inference/inference.py:161-165
Making Predictions Programmatically
Using the Predictor Class Directly
inference/inference.py:174-193
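A hedged sketch of direct use (the actual example lives at inference/inference.py:174-193; the `predictor` instance name is an assumption, while the field names follow the schema table in this document):

```python
# Example input record; fields and example values follow the API schema table.
sample = {
    "Age": 35,
    "Sex": "male",
    "Job": "skilled",
    "Housing": "own",
    "Saving accounts": "NA",
    "Checking account": "little",
    "Credit amount": 9055.0,
    "Duration": 36,
    "Purpose": "education",
}

# With the project on your PYTHONPATH, the call would look like this
# (instance name is an assumption; see inference/inference.py:174-193):
# from inference.inference import predictor
# result = predictor.inference(sample)
# print(result)  # {"Good Credit Risk": ..., "Bad Credit Risk": ...}
```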
Understanding the Response
The inference method returns a dictionary with two fields:
- Good Credit Risk
- Bad Credit Risk
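For example (the values below are made up, assuming the two fields are class probabilities):

```python
# Illustrative response shape; keys are the two fields above, values invented.
result = {
    "Good Credit Risk": 0.82,
    "Bad Credit Risk": 0.18,
}

# Pick the more likely class:
predicted = max(result, key=result.get)
print(predicted)  # → Good Credit Risk
```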
Inference Process Flow
The inference method follows these steps:
1. Convert Input to Dictionary (inference/inference.py:123)
2. Create DataFrame (inference/inference.py:126)
3. Apply Preprocessing (inference/inference.py:129)
4. Convert to Tensor (inference/inference.py:133)
5. Model Prediction (inference/inference.py:136-139)
REST API Inference
Starting the API Server
Launch the FastAPI server for HTTP-based inference. Once running, it serves requests at http://localhost:8000.
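A typical launch command, assuming the FastAPI application object is exposed as `app` in server/api.py (this is an assumption; check your project's entry point):

```shell
# Serve the API on http://localhost:8000 (assumes `app` lives in server/api.py)
uvicorn server.api:app --host 0.0.0.0 --port 8000
```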
API Endpoint
Endpoint: POST /credit_score_prediction
Reference: server/api.py:58-84
Request Format
Response Format
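A minimal client sketch using only the standard library, assuming the server from the previous section is running locally (the endpoint path is from the API Endpoint section; field names follow the schema table in the next section):

```python
import json
from urllib import request

# Example request body; fields and example values follow the schema table.
payload = {
    "Age": 35,
    "Sex": "male",
    "Job": "skilled",
    "Housing": "own",
    "Saving accounts": "NA",
    "Checking account": "little",
    "Credit amount": 9055.0,
    "Duration": 36,
    "Purpose": "education",
}

req = request.Request(
    "http://localhost:8000/credit_score_prediction",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))  # {"Good Credit Risk": ..., "Bad Credit Risk": ...}
```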
Input Schema Validation
The API uses Pydantic for strict input validation. All fields are required and type-checked:
Field Specifications
| Field | Type | Allowed Values | Example |
|---|---|---|---|
| Age | Integer | > 0 | 35 |
| Sex | String | male, female | "male" |
| Job | String | unskilled and non-resident, unskilled and resident, skilled, highly skilled | "skilled" |
| Housing | String | own, rent, free | "own" |
| Saving accounts | String | NA, little, moderate, quite rich, rich | "NA" |
| Checking account | String | NA, little, moderate, rich | "little" |
| Credit amount | Float | > 0 | 9055.0 |
| Duration | Integer | > 0 | 36 |
| Purpose | String | car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others | "education" |
server/schemas.py:51-96
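The real validation lives in the Pydantic models at server/schemas.py:51-96. As a plain-Python illustration of the same constraints (the function name and error format here are ours, not the API's):

```python
# Allowed categorical values, transcribed from the field table above.
ALLOWED = {
    "Sex": {"male", "female"},
    "Job": {"unskilled and non-resident", "unskilled and resident",
            "skilled", "highly skilled"},
    "Housing": {"own", "rent", "free"},
    "Saving accounts": {"NA", "little", "moderate", "quite rich", "rich"},
    "Checking account": {"NA", "little", "moderate", "rich"},
    "Purpose": {"car", "furniture/equipment", "radio/TV",
                "domestic appliances", "repairs", "education",
                "business", "vacation/others"},
}

def check_record(rec: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for field, allowed in ALLOWED.items():
        if rec.get(field) not in allowed:
            errors.append(f"{field}: must be one of {sorted(allowed)}")
    for field in ("Age", "Duration"):
        if not isinstance(rec.get(field), int) or rec[field] <= 0:
            errors.append(f"{field}: must be a positive integer")
    if not isinstance(rec.get("Credit amount"), (int, float)) or rec["Credit amount"] <= 0:
        errors.append("Credit amount: must be a positive number")
    return errors
```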
Example Validation Error
Request with Invalid Age
Response (422)
Testing Inference
Built-in Test Script
The inference module includes a test script that will:
- Initialize the predictor
- Run inference on sample data
- Verify singleton behavior
- Display results
inference/inference.py:167-213
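Assuming the test block at inference/inference.py:167-213 is guarded by `if __name__ == "__main__":` (an assumption; check the module), it can be run directly:

```shell
# Runs the built-in test script from the project root
python -m inference.inference
```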
Expected Output
Custom Predictor Instances
You can create custom predictor instances with different models. However, due to the singleton pattern, only the first instantiation actually loads the model; subsequent constructor calls return the existing instance.
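The practical consequence, sketched with a hypothetical stand-in class (the real Predictor is in inference/inference.py, and the second weights path below is invented):

```python
# Stand-in demonstrating the caveat: later constructor arguments are ignored.
class _SingletonDemo:
    _instance = None

    def __new__(cls, weights_path=None):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.weights_path = weights_path  # only the first path wins
        return cls._instance

first = _SingletonDemo("model/model_weights_001.pth")
second = _SingletonDemo("model/other_weights.pth")  # hypothetical path, ignored
print(first is second)      # → True
print(second.weights_path)  # → model/model_weights_001.pth
```

If you genuinely need two different models in one process, the singleton has to be reset or bypassed; by design it will not load a second set of weights.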
Performance Considerations
Model Loading Time
Initial Load
The first prediction includes model loading overhead:
- Loading YAML configuration
- Loading preprocessor (joblib)
- Initializing PyTorch model
- Loading trained weights
Subsequent Predictions
After initialization, predictions are fast:
- No reloading required (singleton)
- Pure inference time
Batch Predictions
For multiple predictions, the API handles them sequentially. For high-throughput scenarios, consider batching requests or running multiple server workers.
Troubleshooting
FileNotFoundError: Model weights not found
Error:
FileNotFoundError: Model weights file not found: model/model_weights_001.pth
Solution:
- Ensure you've trained a model first
- Check the file path matches your trained model
- Verify the model was saved successfully during training
Shape mismatch error
Error:
RuntimeError: size mismatch
Solution:
- Ensure the configuration file matches the trained model
- Verify the preprocessor was created during the same training run
- Check that input features match training data
API returns 500 Internal Server Error
Possible causes:
- Model files are missing or corrupted
- Preprocessor is incompatible
- Invalid configuration
Solutions:
- Check server logs for detailed error messages
- Verify all three required files exist
- Retrain the model if files are corrupted
Predictions seem incorrect
Debugging steps:
- Check which model weights are loaded
- Verify the model’s training performance in MLflow
- Validate input data format and ranges
- Test with known examples from training set
Best Practices
Next Steps
Deployment
Deploy the inference API to production with Docker
API Reference
Explore complete API documentation
Training Models
Train new models with different configurations
MLflow Tracking
Monitor model performance and experiments
