This guide covers how to use trained models for making predictions, both programmatically through the Predictor class and via the REST API.

Overview

The inference system loads a trained model and preprocessor to make credit risk predictions. It follows a singleton pattern for efficient resource usage.

Predictor Class

The Predictor class (inference/inference.py:20-146) is a singleton that manages model loading and inference.

Architecture

class Predictor:
    """
    Singleton class for model inference.
    Loads the model weights, configuration, and preprocessor.
    """
    _instance = None
    _initialized = False
Reference: inference/inference.py:20-27
The singleton pattern ensures the model is loaded only once, improving performance and memory efficiency.
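
The pattern can be sketched in isolation as follows. This is a minimal illustration, not the project's exact code: the real class also loads the model, configuration, and preprocessor in __init__.

```python
class Predictor:
    """Minimal singleton sketch: __new__ hands back one shared instance."""
    _instance = None
    _initialized = False

    def __new__(cls, *args, **kwargs):
        # Create the instance only once; every later call reuses it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, model_path=None):
        # Skip (re)initialization if the singleton is already set up
        if self._initialized:
            return
        self.model_path = model_path  # the real class would load weights here
        Predictor._initialized = True


a = Predictor(model_path="model/model_weights_001.pth")
b = Predictor(model_path="model/model_weights_002.pth")
print(a is b)        # True: same object
print(a.model_path)  # still model_weights_001.pth; the second path was ignored
```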

Initialization

The Predictor requires three components:
1. Model Weights (.pth)

PyTorch state dictionary containing the trained weights:
model_path = "model/model_weights_001.pth"

2. Configuration File (.yaml)

YAML file defining the model architecture:
config_path = "config/models-configs/model_config_001.yaml"

3. Preprocessor (.joblib)

Scikit-learn pipeline for data transformation:
preprocessor_path = "processing/preprocessor.joblib"

Default Predictor Instance

A global predictor instance is created automatically with default paths:
predictor = Predictor(
    model_path=DEFAULT_MODEL_PATH,
    config_path=DEFAULT_CONFIG_PATH,
    preprocessor_path=DEFAULT_PREPROCESSOR_PATH,
)
Reference: inference/inference.py:161-165

Making Predictions Programmatically

Using the Predictor Class Directly

from inference.inference import predictor
from server.schemas import CreditRiskInput

# Create input data
sample_input = CreditRiskInput(
    Age=35,
    Sex="male",
    Job="skilled",
    Housing="own",
    Saving_accounts="NA",
    Checking_account="little",
    Credit_amount=9055.0,
    Duration=36,
    Purpose="education",
)

# Run inference
result = predictor.inference(sample_input)
print(result)
# Output: {'prediction': 'good', 'probability': 0.8234}
Reference: inference/inference.py:174-193

Understanding the Response

The inference method returns a dictionary with two fields:
{
    "prediction": "good",    # Binary classification: "good" or "bad"
    "probability": 0.8234   # Probability of "good" credit risk (0-1)
}
High probability (> 0.5) indicates the customer is likely to repay the credit.
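
The default decision rule compares the "good" probability against 0.5. If your application needs a stricter cutoff, the label can be re-derived client-side; label_from_probability below is an illustrative helper, not part of the project's API:

```python
def label_from_probability(good_prob: float, threshold: float = 0.5) -> str:
    """Map the probability of 'good' credit risk to a binary label."""
    return "good" if good_prob > threshold else "bad"


result = {"prediction": "good", "probability": 0.8234}
print(label_from_probability(result["probability"]))       # good
print(label_from_probability(result["probability"], 0.9))  # bad (stricter cutoff)
```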

Inference Process Flow

The inference method follows these steps:
1. Convert Input to Dictionary

input_dict = data.model_dump(by_alias=True)
Converts the Pydantic model to a dictionary using field aliases. Reference: inference/inference.py:123

2. Create DataFrame

df = pd.DataFrame([input_dict])
Wraps the input in a pandas DataFrame for preprocessing. Reference: inference/inference.py:126

3. Apply Preprocessing

processed_data = self.preprocessor.transform(df)
Applies the same transformations used during training. Reference: inference/inference.py:129

4. Convert to Tensor

tensor_data = torch.FloatTensor(processed_data).to(device)
Converts to a PyTorch tensor on the appropriate device (CPU/GPU). Reference: inference/inference.py:133

5. Model Prediction

with torch.no_grad():
    probs = self.model.predict_probability(tensor_data)
    good_prob = probs[0][1].item()
Runs the forward pass without gradient computation. Reference: inference/inference.py:136-139

6. Binary Classification

prediction_idx = torch.argmax(probs, dim=1).item()
prediction = "good" if prediction_idx == 1 else "bad"
Converts probabilities to a binary prediction via argmax. Reference: inference/inference.py:142-143
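
Put together, the steps above amount to a method along these lines. This is a sketch for reading, assembled from the snippets in this section; it assumes self.preprocessor, self.model, and self.device are set up during initialization:

```python
import pandas as pd
import torch


def inference(self, data):
    # 1. Pydantic model -> dict, using field aliases ("Saving accounts", ...)
    input_dict = data.model_dump(by_alias=True)
    # 2. Single-row DataFrame so the preprocessor sees named columns
    df = pd.DataFrame([input_dict])
    # 3. Same transformations as during training
    processed_data = self.preprocessor.transform(df)
    # 4. Float tensor on the model's device
    tensor_data = torch.FloatTensor(processed_data).to(self.device)
    # 5. Forward pass without gradient tracking
    with torch.no_grad():
        probs = self.model.predict_probability(tensor_data)
        good_prob = probs[0][1].item()
    # 6. Argmax over the two classes -> label
    prediction_idx = torch.argmax(probs, dim=1).item()
    prediction = "good" if prediction_idx == 1 else "bad"
    return {"prediction": prediction, "probability": round(good_prob, 4)}
```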

REST API Inference

Starting the API Server

Launch the FastAPI server for HTTP-based inference:
uv run uvicorn server.api:app --reload --port 8000
The API will be available at http://localhost:8000
Access interactive API documentation at http://localhost:8000/docs

API Endpoint

Endpoint: POST /credit_score_prediction
Reference: server/api.py:58-84

Request Format

{
  "Age": 35,
  "Sex": "male",
  "Job": "skilled",
  "Housing": "own",
  "Saving accounts": "NA",
  "Checking account": "little",
  "Credit amount": 9055.0,
  "Duration": 36,
  "Purpose": "education"
}

Response Format

{
  "prediction": "good",
  "probability": 0.8234
}
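
With the server running locally, the endpoint can be exercised from the command line as a quick smoke test (adjust host and port to your deployment):

```shell
curl -X POST http://localhost:8000/credit_score_prediction \
  -H "Content-Type: application/json" \
  -d '{
        "Age": 35,
        "Sex": "male",
        "Job": "skilled",
        "Housing": "own",
        "Saving accounts": "NA",
        "Checking account": "little",
        "Credit amount": 9055.0,
        "Duration": 36,
        "Purpose": "education"
      }'
```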

Input Schema Validation

The API uses Pydantic for strict input validation. All fields are required and type-checked:

Field Specifications

  • Age (Integer): > 0. Example: 35
  • Sex (String): male, female. Example: "male"
  • Job (String): unskilled and non-resident, unskilled and resident, skilled, highly skilled. Example: "skilled"
  • Housing (String): own, rent, free. Example: "own"
  • Saving accounts (String): NA, little, moderate, quite rich, rich. Example: "NA"
  • Checking account (String): NA, little, moderate, rich. Example: "little"
  • Credit amount (Float): > 0. Example: 9055.0
  • Duration (Integer): > 0. Example: 36
  • Purpose (String): car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others. Example: "education"
Reference: server/schemas.py:51-96
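
A schema enforcing these constraints might look roughly as follows. This is a Pydantic v2 sketch of the idea, not a copy of server/schemas.py; populate_by_name is assumed so that both field names and the spaced aliases are accepted, matching the programmatic example earlier in this guide:

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class CreditRiskInput(BaseModel):
    # Accept both field names (Saving_accounts) and aliases ("Saving accounts")
    model_config = ConfigDict(populate_by_name=True)

    Age: int = Field(gt=0)
    Sex: Literal["male", "female"]
    Job: Literal["unskilled and non-resident", "unskilled and resident",
                 "skilled", "highly skilled"]
    Housing: Literal["own", "rent", "free"]
    Saving_accounts: Literal["NA", "little", "moderate", "quite rich",
                             "rich"] = Field(alias="Saving accounts")
    Checking_account: Literal["NA", "little", "moderate",
                              "rich"] = Field(alias="Checking account")
    Credit_amount: float = Field(gt=0, alias="Credit amount")
    Duration: int = Field(gt=0)
    Purpose: Literal["car", "furniture/equipment", "radio/TV",
                     "domestic appliances", "repairs", "education",
                     "business", "vacation/others"]
```

An out-of-range Age or a misspelled category raises pydantic.ValidationError, which FastAPI translates into a 422 response.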
Invalid values will result in a 422 Unprocessable Entity error with detailed validation messages.

Example Validation Error

Request with Invalid Age
{
  "Age": -5,  // Invalid: must be > 0
  "Sex": "male",
  ...
}
Response (422)
{
  "detail": [
    {
      "type": "greater_than",
      "loc": ["body", "Age"],
      "msg": "Input should be greater than 0",
      "input": -5
    }
  ]
}

Testing Inference

Built-in Test Script

The inference module includes a test script:
uv run python -m inference.inference
This will:
  1. Initialize the predictor
  2. Run inference on sample data
  3. Verify singleton behavior
  4. Display results
Reference: inference/inference.py:167-213

Expected Output

==================================================
TESTING CREDIT SCORE PREDICTOR
==================================================

[1] Input data sample:
{
    "Age": 35,
    "Sex": "male",
    "Job": "skilled",
    ...
}

[2] Inference result:
{'prediction': 'good', 'probability': 0.8234}

[3] Singleton Verification: True

SUCCESS: Predictor class and inference function are working correctly!

Custom Predictor Instances

You can create custom predictor instances with different models:
from inference.inference import Predictor

# Load a different model
custom_predictor = Predictor(
    model_path="model/model_weights_002.pth",
    config_path="config/models-configs/model_config_002.yaml",
    preprocessor_path="processing/preprocessor.joblib",
)

result = custom_predictor.inference(input_data)
Due to the singleton pattern, only the first instantiation in a process actually loads a model; subsequent instantiations return the existing instance. The custom paths above therefore take effect only if no Predictor has been created earlier in the process (note that the module-level default predictor is created at import time).

Performance Considerations

Model Loading Time

The first prediction includes model loading overhead:
  • Loading YAML configuration
  • Loading preprocessor (joblib)
  • Initializing PyTorch model
  • Loading trained weights
Typical time: 1-3 seconds
After initialization, predictions are fast:
  • No reloading required (singleton)
  • Pure inference time
Typical time: 5-20 milliseconds

Batch Predictions

For multiple predictions, the API handles them sequentially. For high-throughput scenarios, consider:
# Process multiple inputs
inputs = [input1, input2, input3]
results = [predictor.inference(inp) for inp in inputs]
For production deployments, use multiple workers with Gunicorn or deploy behind a load balancer.
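
For example, a multi-worker deployment might look like this (a sketch; tune --workers to your CPU count). Each worker process loads its own copy of the model, since the singleton is per-process:

```shell
gunicorn server.api:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```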

Troubleshooting

Error: FileNotFoundError: Model weights file not found: model/model_weights_001.pth
Solution:
  • Ensure you’ve trained a model first
  • Check that the file path matches your trained model
  • Verify the model was saved successfully during training

Error: RuntimeError: size mismatch
Solution:
  • Ensure the configuration file matches the trained model
  • Verify the preprocessor was created during the same training run
  • Check that input features match the training data

If the API fails with an internal server error, possible causes are:
  • Model files are missing or corrupted
  • The preprocessor is incompatible
  • Invalid configuration
Solution:
  • Check server logs for detailed error messages
  • Verify all three required files exist
  • Retrain the model if files are corrupted

If predictions look wrong, debugging steps:
  1. Check which model weights are loaded
  2. Verify the model’s training performance in MLflow
  3. Validate input data format and ranges
  4. Test with known examples from the training set
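
As a first debugging step, a quick pre-flight check can confirm that the required artifacts exist before starting the server. A small illustrative helper (not part of the project), using the default paths from this guide:

```python
from pathlib import Path

# Default artifact paths from this guide; adjust for custom models
REQUIRED_FILES = [
    "model/model_weights_001.pth",
    "config/models-configs/model_config_001.yaml",
    "processing/preprocessor.joblib",
]


def preflight_check(paths=REQUIRED_FILES):
    """Return the list of missing artifact paths (empty means all present)."""
    return [p for p in paths if not Path(p).exists()]


missing = preflight_check()
if missing:
    print("Missing inference artifacts:", missing)
```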

Best Practices

1. Validate Inputs

Always use the Pydantic schemas to ensure data quality:
from server.schemas import CreditRiskInput

# This will raise validation errors for invalid data
validated_input = CreditRiskInput(**raw_data)
2. Handle Errors Gracefully

Wrap inference calls in try-except blocks:
try:
    result = predictor.inference(input_data)
except Exception as e:
    logger.error(f"Inference failed: {e}")
    # Handle error appropriately
3. Monitor Performance

Track inference latency and throughput in production:
import time

start = time.time()
result = predictor.inference(input_data)
latency = time.time() - start

logger.info(f"Inference completed in {latency:.3f}s")
4. Version Your Models

Keep track of which model version is deployed:
# Use environment variables or configuration
MODEL_VERSION = os.getenv("MODEL_VERSION", "001")
model_path = f"model/model_weights_{MODEL_VERSION}.pth"

Next Steps

Deployment

Deploy the inference API to production with Docker

API Reference

Explore complete API documentation

Training Models

Train new models with different configurations

MLflow Tracking

Monitor model performance and experiments
