This guide covers how to use trained models for making predictions, both programmatically through the Predictor class and via the REST API.

Overview

The inference system loads a trained model and preprocessor to make credit risk predictions. It follows a singleton pattern for efficient resource usage.

Predictor Class

The Predictor class (inference/inference.py:20-146) is a singleton that manages model loading and inference.

Architecture

class Predictor:
    """
    Singleton class for model inference.
    Loads the model weights, configuration, and preprocessor.
    """
    _instance = None
    _initialized = False
Reference: inference/inference.py:20-27
The singleton pattern ensures the model is loaded only once, improving performance and memory efficiency.
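
The pattern can be sketched in isolation as follows. This is a minimal illustration, not the project's exact code: the real class also loads the model, configuration, and preprocessor in __init__.

```python
class Predictor:
    """Minimal singleton sketch: __new__ hands back one shared instance."""
    _instance = None
    _initialized = False

    def __new__(cls, *args, **kwargs):
        # Create the instance only once; every later call reuses it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, model_path=None):
        # Skip (re)initialization if the singleton is already set up
        if self._initialized:
            return
        self.model_path = model_path  # the real class would load weights here
        Predictor._initialized = True


a = Predictor(model_path="model/model_weights_001.pth")
b = Predictor(model_path="model/model_weights_002.pth")
print(a is b)        # True: same object
print(a.model_path)  # still model_weights_001.pth; the second path was ignored
```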

Initialization

The Predictor requires three components:
1. Model Weights (.pth)

PyTorch state dictionary containing the trained weights:
model_path = "model/model_weights_001.pth"

2. Configuration File (.yaml)

YAML file defining the model architecture:
config_path = "config/models-configs/model_config_001.yaml"

3. Preprocessor (.joblib)

Scikit-learn pipeline for data transformation:
preprocessor_path = "processing/preprocessor.joblib"

Default Predictor Instance

A global predictor instance is created automatically with default paths:
predictor = Predictor(
    model_path=DEFAULT_MODEL_PATH,
    config_path=DEFAULT_CONFIG_PATH,
    preprocessor_path=DEFAULT_PREPROCESSOR_PATH,
)
Reference: inference/inference.py:161-165

Making Predictions Programmatically

Using the Predictor Class Directly

from inference.inference import predictor
from server.schemas import CreditRiskInput

# Create input data
sample_input = CreditRiskInput(
    Age=35,
    Sex="male",
    Job="skilled",
    Housing="own",
    Saving_accounts="NA",
    Checking_account="little",
    Credit_amount=9055.0,
    Duration=36,
    Purpose="education",
)

# Run inference
result = predictor.inference(sample_input)
print(result)
# Output: {'prediction': 'good', 'probability': 0.8234}
Reference: inference/inference.py:174-193

Understanding the Response

The inference method returns a dictionary with two fields:
{
    "prediction": "good",    # Binary classification: "good" or "bad"
    "probability": 0.8234   # Probability of "good" credit risk (0-1)
}
High probability (> 0.5) indicates the customer is likely to repay the credit.
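
The default decision rule compares the "good" probability against 0.5. If your application needs a stricter cutoff, the label can be re-derived client-side; label_from_probability below is an illustrative helper, not part of the project's API:

```python
def label_from_probability(good_prob: float, threshold: float = 0.5) -> str:
    """Map the probability of 'good' credit risk to a binary label."""
    return "good" if good_prob > threshold else "bad"


result = {"prediction": "good", "probability": 0.8234}
print(label_from_probability(result["probability"]))       # good
print(label_from_probability(result["probability"], 0.9))  # bad (stricter cutoff)
```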

Inference Process Flow

The inference method follows these steps:
1. Convert Input to Dictionary

input_dict = data.model_dump(by_alias=True)
Converts the Pydantic model to a dictionary using field aliases. Reference: inference/inference.py:123

2. Create DataFrame

df = pd.DataFrame([input_dict])
Wraps the input in a pandas DataFrame for preprocessing. Reference: inference/inference.py:126

3. Apply Preprocessing

processed_data = self.preprocessor.transform(df)
Applies the same transformations used during training. Reference: inference/inference.py:129

4. Convert to Tensor

tensor_data = torch.FloatTensor(processed_data).to(device)
Converts to a PyTorch tensor on the appropriate device (CPU/GPU). Reference: inference/inference.py:133

5. Model Prediction

with torch.no_grad():
    probs = self.model.predict_probability(tensor_data)
    good_prob = probs[0][1].item()
Runs the forward pass without gradient computation. Reference: inference/inference.py:136-139

6. Binary Classification

prediction_idx = torch.argmax(probs, dim=1).item()
prediction = "good" if prediction_idx == 1 else "bad"
Converts probabilities to a binary prediction via argmax. Reference: inference/inference.py:142-143
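
Put together, the steps above amount to a method along these lines. This is a sketch for reading, assembled from the snippets in this section; it assumes self.preprocessor, self.model, and self.device are set up during initialization:

```python
import pandas as pd
import torch


def inference(self, data):
    # 1. Pydantic model -> dict, using field aliases ("Saving accounts", ...)
    input_dict = data.model_dump(by_alias=True)
    # 2. Single-row DataFrame so the preprocessor sees named columns
    df = pd.DataFrame([input_dict])
    # 3. Same transformations as during training
    processed_data = self.preprocessor.transform(df)
    # 4. Float tensor on the model's device
    tensor_data = torch.FloatTensor(processed_data).to(self.device)
    # 5. Forward pass without gradient tracking
    with torch.no_grad():
        probs = self.model.predict_probability(tensor_data)
        good_prob = probs[0][1].item()
    # 6. Argmax over the two classes -> label
    prediction_idx = torch.argmax(probs, dim=1).item()
    prediction = "good" if prediction_idx == 1 else "bad"
    return {"prediction": prediction, "probability": round(good_prob, 4)}
```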

REST API Inference

Starting the API Server

Launch the FastAPI server for HTTP-based inference:
uv run uvicorn server.api:app --reload --port 8000
The API will be available at http://localhost:8000
Access interactive API documentation at http://localhost:8000/docs

API Endpoint

Endpoint: POST /credit_score_prediction
Reference: server/api.py:58-84

Request Format

{
  "Age": 35,
  "Sex": "male",
  "Job": "skilled",
  "Housing": "own",
  "Saving accounts": "NA",
  "Checking account": "little",
  "Credit amount": 9055.0,
  "Duration": 36,
  "Purpose": "education"
}

Response Format

{
  "prediction": "good",
  "probability": 0.8234
}
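
With the server running locally, the endpoint can be exercised from the command line as a quick smoke test (adjust host and port to your deployment):

```shell
curl -X POST http://localhost:8000/credit_score_prediction \
  -H "Content-Type: application/json" \
  -d '{
        "Age": 35,
        "Sex": "male",
        "Job": "skilled",
        "Housing": "own",
        "Saving accounts": "NA",
        "Checking account": "little",
        "Credit amount": 9055.0,
        "Duration": 36,
        "Purpose": "education"
      }'
```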

Input Schema Validation

The API uses Pydantic for strict input validation. All fields are required and type-checked:

Field Specifications

  • Age (Integer): > 0. Example: 35
  • Sex (String): male, female. Example: "male"
  • Job (String): unskilled and non-resident, unskilled and resident, skilled, highly skilled. Example: "skilled"
  • Housing (String): own, rent, free. Example: "own"
  • Saving accounts (String): NA, little, moderate, quite rich, rich. Example: "NA"
  • Checking account (String): NA, little, moderate, rich. Example: "little"
  • Credit amount (Float): > 0. Example: 9055.0
  • Duration (Integer): > 0. Example: 36
  • Purpose (String): car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others. Example: "education"
Reference: server/schemas.py:51-96
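
A schema enforcing these constraints might look roughly as follows. This is a Pydantic v2 sketch of the idea, not a copy of server/schemas.py; populate_by_name is assumed so that both field names and the spaced aliases are accepted, matching the programmatic example earlier in this guide:

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class CreditRiskInput(BaseModel):
    # Accept both field names (Saving_accounts) and aliases ("Saving accounts")
    model_config = ConfigDict(populate_by_name=True)

    Age: int = Field(gt=0)
    Sex: Literal["male", "female"]
    Job: Literal["unskilled and non-resident", "unskilled and resident",
                 "skilled", "highly skilled"]
    Housing: Literal["own", "rent", "free"]
    Saving_accounts: Literal["NA", "little", "moderate", "quite rich",
                             "rich"] = Field(alias="Saving accounts")
    Checking_account: Literal["NA", "little", "moderate",
                              "rich"] = Field(alias="Checking account")
    Credit_amount: float = Field(gt=0, alias="Credit amount")
    Duration: int = Field(gt=0)
    Purpose: Literal["car", "furniture/equipment", "radio/TV",
                     "domestic appliances", "repairs", "education",
                     "business", "vacation/others"]
```

An out-of-range Age or a misspelled category raises pydantic.ValidationError, which FastAPI translates into a 422 response.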
Invalid values will result in a 422 Unprocessable Entity error with detailed validation messages.

Example Validation Error

Request with Invalid Age
{
  "Age": -5,  // Invalid: must be > 0
  "Sex": "male",
  ...
}
Response (422)
{
  "detail": [
    {
      "type": "greater_than",
      "loc": ["body", "Age"],
      "msg": "Input should be greater than 0",
      "input": -5
    }
  ]
}

Testing Inference

Built-in Test Script

The inference module includes a test script:
uv run python -m inference.inference
This will:
  1. Initialize the predictor
  2. Run inference on sample data
  3. Verify singleton behavior
  4. Display results
Reference: inference/inference.py:167-213

Expected Output

==================================================
TESTING CREDIT SCORE PREDICTOR
==================================================

[1] Input data sample:
{
    "Age": 35,
    "Sex": "male",
    "Job": "skilled",
    ...
}

[2] Inference result:
{'prediction': 'good', 'probability': 0.8234}

[3] Singleton Verification: True

SUCCESS: Predictor class and inference function are working correctly!

Custom Predictor Instances

You can create custom predictor instances with different models:
from inference.inference import Predictor

# Load a different model
custom_predictor = Predictor(
    model_path="model/model_weights_002.pth",
    config_path="config/models-configs/model_config_002.yaml",
    preprocessor_path="processing/preprocessor.joblib",
)

result = custom_predictor.inference(input_data)
Due to the singleton pattern, only the first instantiation in a process actually loads a model; subsequent instantiations return the existing instance. The custom paths above therefore take effect only if no Predictor has been created earlier in the process (note that the module-level default predictor is created at import time).

Performance Considerations

Model Loading Time

The first prediction includes model loading overhead:
  • Loading YAML configuration
  • Loading preprocessor (joblib)
  • Initializing PyTorch model
  • Loading trained weights
Typical time: 1-3 seconds
After initialization, predictions are fast:
  • No reloading required (singleton)
  • Pure inference time
Typical time: 5-20 milliseconds

Batch Predictions

For multiple predictions, the API handles them sequentially. For high-throughput scenarios, consider:
# Process multiple inputs
inputs = [input1, input2, input3]
results = [predictor.inference(inp) for inp in inputs]
For production deployments, use multiple workers with Gunicorn or deploy behind a load balancer.
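
For example, a multi-worker deployment might look like this (a sketch; tune --workers to your CPU count). Each worker process loads its own copy of the model, since the singleton is per-process:

```shell
gunicorn server.api:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```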

Troubleshooting

Error: FileNotFoundError: Model weights file not found: model/model_weights_001.pth
Solution:
  • Ensure you’ve trained a model first
  • Check that the file path matches your trained model
  • Verify the model was saved successfully during training

Error: RuntimeError: size mismatch
Solution:
  • Ensure the configuration file matches the trained model
  • Verify the preprocessor was created during the same training run
  • Check that input features match the training data

If the API fails with an internal server error, possible causes are:
  • Model files are missing or corrupted
  • The preprocessor is incompatible
  • Invalid configuration
Solution:
  • Check server logs for detailed error messages
  • Verify all three required files exist
  • Retrain the model if files are corrupted

If predictions look wrong, debugging steps:
  1. Check which model weights are loaded
  2. Verify the model’s training performance in MLflow
  3. Validate input data format and ranges
  4. Test with known examples from the training set
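
As a first debugging step, a quick pre-flight check can confirm that the required artifacts exist before starting the server. A small illustrative helper (not part of the project), using the default paths from this guide:

```python
from pathlib import Path

# Default artifact paths from this guide; adjust for custom models
REQUIRED_FILES = [
    "model/model_weights_001.pth",
    "config/models-configs/model_config_001.yaml",
    "processing/preprocessor.joblib",
]


def preflight_check(paths=REQUIRED_FILES):
    """Return the list of missing artifact paths (empty means all present)."""
    return [p for p in paths if not Path(p).exists()]


missing = preflight_check()
if missing:
    print("Missing inference artifacts:", missing)
```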

Best Practices

1. Validate Inputs

Always use the Pydantic schemas to ensure data quality:
from server.schemas import CreditRiskInput

# This will raise validation errors for invalid data
validated_input = CreditRiskInput(**raw_data)
2. Handle Errors Gracefully

Wrap inference calls in try-except blocks:
try:
    result = predictor.inference(input_data)
except Exception as e:
    logger.error(f"Inference failed: {e}")
    # Handle error appropriately
3. Monitor Performance

Track inference latency and throughput in production:
import time

start = time.time()
result = predictor.inference(input_data)
latency = time.time() - start

logger.info(f"Inference completed in {latency:.3f}s")
4. Version Your Models

Keep track of which model version is deployed:
# Use environment variables or configuration
MODEL_VERSION = os.getenv("MODEL_VERSION", "001")
model_path = f"model/model_weights_{MODEL_VERSION}.pth"

Next Steps

Deployment

Deploy the inference API to production with Docker

API Reference

Explore complete API documentation

Training Models

Train new models with different configurations

MLflow Tracking

Monitor model performance and experiments
