Overview
The `export_onnx.py` script converts trained scikit-learn models to ONNX format using skl2onnx, enabling deployment in non-Python environments and cross-platform inference runtimes.
Prerequisites
Usage
No command-line arguments are required. Configuration is read from `config.yaml`.
How It Works
1. Load Trained Model
The script loads the scikit-learn pipeline from the artifact path specified in `config.yaml`:
deployment/export_onnx.py
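A minimal sketch of this step, assuming `config.yaml` stores the artifact path under an `artifacts.model_path` key (the key names and helper name are hypothetical; the real script may differ):

```python
# Sketch: load the trained pipeline from the path configured in config.yaml.
# The config key "artifacts.model_path" is an assumption for illustration.
from pathlib import Path

import joblib
import yaml


def load_pipeline(config_path="config.yaml"):
    """Load the trained scikit-learn pipeline from the path in config.yaml."""
    cfg = yaml.safe_load(Path(config_path).read_text())
    model_path = Path(cfg["artifacts"]["model_path"])  # e.g. artifacts/best_model.joblib
    if not model_path.exists():
        raise FileNotFoundError("Train first: python -m src.train")
    return joblib.load(model_path)
```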
2. Infer Input Schema
Automatic type inference from the test data determines the ONNX input types:
deployment/export_onnx.py
- Numeric features (`int`, `uint`, `bool`, `float`) → `FloatTensorType`
- Categorical features (object, string) → `StringTensorType`
- Batch dimension is dynamic (`None`) to support variable-length inputs
3. Convert with skl2onnx
The pipeline is converted to ONNX with target opset version 15:
deployment/export_onnx.py
Target opset 15 is compatible with ONNX Runtime 1.10+ and provides stable operator support for common scikit-learn estimators.
4. Serialize and Save Artifacts
Three files are written to `artifacts/`:
deployment/export_onnx.py
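A sketch of the serialization step, assuming the metadata fields described under Output Artifacts below (the helper name and signature are illustrative):

```python
# Sketch: write model.onnx, onnx_features.json, and onnx_metadata.json.
import json
from pathlib import Path


def save_artifacts(onnx_model, feature_names, threshold, model_source,
                   out_dir="artifacts"):
    """Serialize the ONNX graph plus feature schema and metadata."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.onnx").write_bytes(onnx_model.SerializeToString())
    (out / "onnx_features.json").write_text(json.dumps(feature_names, indent=2))
    metadata = {"threshold": threshold, "model_source": model_source}
    (out / "onnx_metadata.json").write_text(json.dumps(metadata, indent=2))
```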
Output Artifacts
artifacts/model.onnx
Binary ONNX graph containing the full inference pipeline.
- Includes preprocessing transformers, imputers, encoders, and estimator
- Can be loaded by any ONNX Runtime (Python, C++, C#, Java, JavaScript)
- Typical size: 10-100 KB for tree ensembles, larger for neural networks
artifacts/onnx_features.json
Feature schema listing the expected input column names in order. Use this to validate input data before inference.
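For example, a pre-inference check against the schema might look like this (the function name and error wording are illustrative):

```python
# Sketch: validate input columns against artifacts/onnx_features.json
# before running inference.
import json
from pathlib import Path


def validate_columns(df_columns, schema_path="artifacts/onnx_features.json"):
    """Raise ValueError if the input columns don't match the exported schema."""
    expected = json.loads(Path(schema_path).read_text())
    if list(df_columns) != expected:
        raise ValueError(f"Column mismatch: expected {expected}, got {list(df_columns)}")
```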
artifacts/onnx_metadata.json
Model metadata capturing threshold and provenance.
- `threshold`: Optimal decision boundary from training
- `model_source`: Path to the original scikit-learn artifact
Expected Output
Conversion Process Details
Supported Pipeline Components
- Preprocessing: `SimpleImputer`, `StandardScaler`, `MinMaxScaler`, `RobustScaler`, `OneHotEncoder`, `OrdinalEncoder`, `LabelEncoder`, `PolynomialFeatures`, `FunctionTransformer` (limited support)
- Estimators
- Pipelines
Operator Compatibility
The conversion targets ONNX opset 15, which guarantees:
- All standard scikit-learn operators (tree models, linear models, SVMs)
- String processing for categorical features
- ZipMap for dictionary outputs (probability dicts)
Troubleshooting
Error: Train first: python -m src.train
Cause: `artifacts/best_model.joblib` not found.
Solution: Run `python -m src.train` to produce the model artifact.
RuntimeError: No suitable converter found
Cause: The pipeline contains an unsupported custom transformer.
Solution:
- Check skl2onnx supported operators
- Replace custom transformer with supported equivalent
- Or implement custom converter
ValueError: Initial types mismatch
Cause: Feature dtype inference failed or the preprocessing changed.
Solution:
- Verify `X_test` columns match the model's expected features
- Check for NaN or infinite values in test data
- Ensure categorical features are properly encoded
ONNX model produces different outputs
Cause: Numerical precision differences or a preprocessing mismatch.
Solution: Run parity validation to diagnose:
Verification
After export, verify that the ONNX model loads correctly:
Next Steps
Quantization
Optimize the exported model with INT8 quantization
Parity Validation
Validate numerical agreement with scikit-learn