The ModelConfig class defines configuration parameters for model execution, computational resources, and output management in HeartMAP.
Class Definition
Constructor
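The class definition is not reproduced in this page; as a sketch, assuming a dataclass-style implementation with the documented fields (the defaults shown are assumptions inferred from the field descriptions below):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    """Configuration for model execution, resources, and output management."""
    model_type: str = "comprehensive"      # "comprehensive" | "basic" | "custom"
    save_intermediate: bool = True         # persist intermediate results
    use_gpu: bool = False                  # enable GPU acceleration
    batch_size: Optional[int] = None       # None = process all data at once
    max_memory_gb: Optional[float] = None  # None = no memory limit
```

With a dataclass, the constructor accepts each field as a keyword argument and falls back to the defaults otherwise.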
Configuration Fields
model_type
Type of analysis workflow to run. Options:
"comprehensive": Full analysis pipeline including all preprocessing, clustering, marker identification, and cell-cell communication"basic": Basic analysis without advanced features like LIANA communication analysis"custom": User-defined workflow (requires additional configuration)
save_intermediate
Whether to save intermediate results during the analysis pipeline. When true, saves processed data, PCA results, and intermediate AnnData objects; useful for debugging and resuming interrupted analyses. When false, only final results are saved, reducing disk usage.
use_gpu
Whether to use GPU acceleration for computations. When true, uses the GPU for compatible operations (requires a CUDA-enabled GPU and appropriate drivers), which significantly speeds up analysis of large datasets. When false, uses the CPU only.
batch_size
Batch size for processing operations. If set, data is processed in batches of this size, which helps control memory usage with large datasets. If null, all data is processed at once (optimal for small and medium datasets).
max_memory_gb
Maximum memory usage in gigabytes. If set, the pipeline attempts to limit memory consumption to this value by adjusting batch sizes and using memory-efficient operations. If null, no memory limit is enforced.
Usage Examples
Default Configuration
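A sketch of the default configuration, using a stand-in dataclass that mirrors the documented fields (the defaults are assumptions; in HeartMAP, import the real ModelConfig instead):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

config = ModelConfig()  # all documented defaults: full pipeline, CPU, no batching
```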
Custom Configuration
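A sketch of a customized configuration that overrides several fields at construction time (stand-in dataclass; defaults are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Override only the fields that differ from the defaults
config = ModelConfig(model_type="basic", save_intermediate=False, batch_size=5000)
```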
GPU-Accelerated Analysis
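A sketch of a GPU-accelerated configuration for large datasets (stand-in dataclass; defaults are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Requires a CUDA-enabled GPU with appropriate drivers
config = ModelConfig(use_gpu=True)
```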
Memory-Constrained Analysis
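A sketch for a memory-constrained machine, combining a memory cap with batching and disabled intermediate saves (stand-in dataclass; the specific values are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Cap memory, process in small batches, and skip intermediate files
config = ModelConfig(max_memory_gb=12.0, batch_size=3000, save_intermediate=False)
```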
Large Dataset Analysis
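A sketch for very large datasets, pairing GPU acceleration with a large batch size and a memory cap (stand-in dataclass; values are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Large dataset on a machine with ample memory and a GPU
config = ModelConfig(use_gpu=True, batch_size=10000, max_memory_gb=48.0)
```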
Production Analysis
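A sketch of a production run with known-good parameters, where intermediate saves are disabled to reduce I/O (stand-in dataclass; values are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Full pipeline, final results only, GPU enabled for speed
config = ModelConfig(model_type="comprehensive", save_intermediate=False, use_gpu=True)
```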
Quick Testing
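A sketch for quick exploratory testing using the basic workflow, which skips time-consuming steps such as LIANA (stand-in dataclass; defaults are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Fast initial pass for data quality checks
config = ModelConfig(model_type="basic")
```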
Using with Main Config
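Assuming the main Config class holds a ModelConfig under a model attribute (the composition shown is an assumption about Config's layout), combining them might look like this sketch with stand-in classes:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

@dataclass
class Config:  # stand-in for the main Config class; composition is an assumption
    model: ModelConfig = field(default_factory=ModelConfig)

# Attach a customized model configuration to the main config
config = Config(model=ModelConfig(model_type="basic", use_gpu=True))
```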
Loading from YAML
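A hypothetical YAML fragment for the model section; the keys mirror the documented field names, but the surrounding file layout and the loader interface are assumptions:

```yaml
model:
  model_type: comprehensive
  save_intermediate: true
  use_gpu: false
  batch_size: null
  max_memory_gb: null
```

Such a file would typically be parsed by the main Config's YAML loading mechanism, with null mapping to Python's None.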
Best Practices
Model Type Selection
- comprehensive: Use for full publication-quality analysis
  - Includes all preprocessing, clustering, markers, and communication
  - Recommended for most use cases
- basic: Use for quick exploratory analysis
  - Skips time-consuming steps such as LIANA
  - Good for initial data quality checks
- custom: Reserved for advanced users with specific workflows
Intermediate Results
- save_intermediate=true: Recommended when:
  - Developing or debugging pipelines
  - Running long analyses that might be interrupted
  - You need to inspect intermediate steps
  - Disk space is not a concern
- save_intermediate=false: Use when:
  - Running production analyses with known-good parameters
  - Disk space is limited
  - Only final results are needed
GPU Usage
- use_gpu=true: Recommended when:
  - A CUDA-enabled GPU is available
  - The dataset has > 50k cells
  - Running multiple analyses
  - Speed is critical
- use_gpu=false: Use when:
  - No GPU is available
  - The dataset is small (< 10k cells)
  - Reproducibility is critical (GPU results may have minor numerical differences)
Batch Processing
- batch_size=None: Use for small/medium datasets (< 50k cells) with sufficient memory
- batch_size=3000-5000: Use for large datasets or memory-constrained systems
- batch_size=10000+: Use for very large datasets with ample memory
Memory Management
- max_memory_gb: Set to 70-80% of available RAM
  - 16 GB RAM → max_memory_gb=12.0
  - 32 GB RAM → max_memory_gb=24.0
  - 64 GB RAM → max_memory_gb=48.0
- Combine with batch_size for fine-grained control
- Set save_intermediate=false to reduce memory pressure
Common Configurations
Laptop/Desktop (16 GB RAM, No GPU)
Workstation (64 GB RAM, GPU)
HPC Cluster (128 GB RAM, GPU)
Cloud Instance (32 GB RAM, GPU)
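The four setups above might map to configurations like the following sketch (stand-in dataclass with assumed defaults; the memory values follow the 70-80%-of-RAM guideline above, and the batch sizes are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # stand-in for HeartMAP's ModelConfig; defaults are assumptions
    model_type: str = "comprehensive"
    save_intermediate: bool = True
    use_gpu: bool = False
    batch_size: Optional[int] = None
    max_memory_gb: Optional[float] = None

# Laptop/desktop: 16 GB RAM, no GPU -> batch, cap memory at ~75% of RAM
laptop = ModelConfig(use_gpu=False, batch_size=3000, max_memory_gb=12.0)

# Workstation: 64 GB RAM, GPU -> large batches, GPU acceleration
workstation = ModelConfig(use_gpu=True, batch_size=10000, max_memory_gb=48.0)

# HPC cluster: 128 GB RAM, GPU -> ample memory, batching optional
hpc = ModelConfig(use_gpu=True, max_memory_gb=96.0)

# Cloud instance: 32 GB RAM, GPU -> moderate batches, capped memory
cloud = ModelConfig(use_gpu=True, batch_size=5000, max_memory_gb=24.0)
```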
Performance Tips
- Enable GPU when available for 2-5x speedup on large datasets
- Disable intermediate saves in production to reduce I/O overhead
- Use appropriate batch size to balance memory usage and performance
- Set memory limit to prevent out-of-memory errors
- Use basic model_type for initial data exploration, then switch to comprehensive
See Also
- Config - Main configuration class
- DataConfig - Data processing configuration
- AnalysisConfig - Analysis configuration