Pipeline Overview
HeartMAP provides four specialized analysis pipelines, each building on the previous one to give progressively more comprehensive results. All pipelines inherit from the BasePipeline abstract class, which gives them a consistent interface and behavior.
BasePipeline
The abstract base class that defines the pipeline interface.
Architecture
Key Components
Data Processor
Handles QC, normalization, and preprocessing
Visualizer
Generates publication-ready figures
Results Exporter
Exports results in multiple formats
Config
Centralized configuration management
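The inheritance pattern described above can be sketched with Python's abc module. This is a minimal illustration and not HeartMAP's actual source: the run method name, its parameters, and the DemoPipeline subclass are all hypothetical.

```python
from abc import ABC, abstractmethod


class BasePipeline(ABC):
    """Hypothetical stand-in for an abstract pipeline base class."""

    def __init__(self, config):
        # Every pipeline shares one configuration object.
        self.config = config

    @abstractmethod
    def run(self, data_path, output_dir=None):
        """Entry point that each concrete pipeline must implement (assumed name)."""


class DemoPipeline(BasePipeline):
    """Toy subclass showing that only run() has to be supplied."""

    def run(self, data_path, output_dir=None):
        return {"input": data_path, "output_dir": output_dir, "config": self.config}
```

Because run is declared abstract, instantiating BasePipeline directly raises a TypeError, while any subclass that implements run exposes the same call signature.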
BasicPipeline
Purpose: Initial data exploration, quality control, and cell type identification
When to Use
- First-time analysis of a new dataset
- Need basic cell type annotations
- Quality control assessment
- Quick exploratory analysis (5-10 minutes)
- Limited computational resources
What It Does
Load and Process Data
Loads raw H5AD file and applies quality control:
- Filters cells (min_genes=200)
- Filters genes (min_cells=3)
- Normalizes to 10,000 counts per cell
- Log transforms expression
- Selects 2,000 highly variable genes
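To make the filtering and normalization steps concrete, here is a dependency-free sketch that works on a small counts matrix stored as a list of rows (cells) by columns (genes). It is an illustration under assumptions, not the pipeline's implementation: the function names are hypothetical, the log transform is assumed to be the natural log of (1 + x), and highly variable gene selection is left out.

```python
import math


def filter_cells(counts, min_genes=200):
    """Keep cells (rows) that express at least `min_genes` genes."""
    return [cell for cell in counts if sum(1 for v in cell if v > 0) >= min_genes]


def filter_genes(counts, min_cells=3):
    """Keep genes (columns) detected in at least `min_cells` cells."""
    if not counts:
        return counts
    keep = [
        j
        for j in range(len(counts[0]))
        if sum(1 for cell in counts if cell[j] > 0) >= min_cells
    ]
    return [[cell[j] for j in keep] for cell in counts]


def normalize_and_log(counts, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    result = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(v * scale) for v in cell])
    return result
```

The thresholds default to the values listed above (min_genes=200, min_cells=3, 10,000 counts per cell); on a toy matrix they need to be lowered, for example `filter_cells(counts, min_genes=1)`.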
Dimensionality Reduction
Computes PCA and constructs neighborhood graph:
- PCA with 50 components
- Neighbors graph (n_neighbors=15, n_pcs=40)
Cell Clustering
Uses Leiden algorithm for cell type identification:
- Resolution=0.5 (configurable)
- Community detection on neighborhood graph
Example Usage
Output Structure
AdvancedCommunicationPipeline
Purpose: Ligand-receptor interaction analysis and communication network mapping
When to Use
- Investigate cell-cell signaling
- Identify communication hubs
- Study pathway-specific interactions
- Drug target discovery
- Already have cell type annotations
What It Does
Load Annotated Data
Requires pre-clustered data with cell type annotations:
- Looks for columns: ‘leiden’, ‘louvain’, ‘Cluster’, ‘cell_type’, ‘celltype’
- Must have completed Basic Pipeline or equivalent
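The column lookup can be pictured as a simple first-match search. The sketch below assumes the candidates are tried in the order listed above, which the source does not state, and the helper name is hypothetical.

```python
# Candidate annotation columns, in the order they are listed in this section.
ANNOTATION_COLUMNS = ("leiden", "louvain", "Cluster", "cell_type", "celltype")


def find_annotation_column(obs_columns, candidates=ANNOTATION_COLUMNS):
    """Return the first candidate present in `obs_columns`, else raise ValueError."""
    available = set(obs_columns)
    for name in candidates:
        if name in available:
            return name
    raise ValueError(
        "No cell type annotation column found; expected one of: "
        + ", ".join(candidates)
    )
```

Raising an error early when no annotation column is present makes it clear that the dataset still needs a clustering step before communication analysis.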
Load L-R Database
Loads curated ligand-receptor pairs:
- LIANA consensus database (preferred)
- 100+ cardiac-relevant interactions
- Confidence scoring (threshold=0.7)
Calculate Communication
Computes cell-type to cell-type communication:
- Mean expression per cell type
- L-R co-expression scoring
- Communication strength = √(ligand_expr × receptor_expr)
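The strength formula above is the geometric mean of ligand and receptor expression. The following sketch applies it to every sender and receiver pair for one ligand-receptor pair. The data layout (a nested dict of mean expression per cell type), the function names, and the gene names LIG1 and REC1 in the usage line are illustrative assumptions.

```python
import math


def communication_strength(ligand_expr, receptor_expr):
    """Communication strength = sqrt(ligand_expr * receptor_expr)."""
    return math.sqrt(ligand_expr * receptor_expr)


def communication_matrix(mean_expr, ligand, receptor):
    """Score every (sender, receiver) pair for one ligand-receptor pair.

    mean_expr maps cell type -> {gene: mean expression}; genes that are
    missing for a cell type are treated as 0.
    """
    return {
        (sender, receiver): communication_strength(
            mean_expr[sender].get(ligand, 0.0),
            mean_expr[receiver].get(receptor, 0.0),
        )
        for sender in mean_expr
        for receiver in mean_expr
    }
```

A call such as `communication_matrix(mean_expr, "LIG1", "REC1")` returns a dict keyed by (sender, receiver). Because the score is a product under a square root, it is zero whenever either the ligand or the receptor is not expressed.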
Hub Score Calculation
Identifies communication hubs:
- Hub score = (std × mean) / (variance + 1)
- High scores indicate key signaling cells
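A minimal sketch of the hub score formula follows. The source does not say which values the standard deviation, mean, and variance are taken over, or whether they are population or sample statistics. This example assumes they are population statistics over the list of communication strengths involving one cell type.

```python
from statistics import fmean, pstdev, pvariance


def hub_score(strengths):
    """Hub score = (std * mean) / (variance + 1) over a list of strengths.

    Assumption: population standard deviation and variance are used.
    """
    values = list(strengths)
    if not values:
        raise ValueError("at least one communication strength is required")
    return (pstdev(values) * fmean(values)) / (pvariance(values) + 1)
```

Under this reading, a cell type with uniform communication strengths scores zero because its standard deviation is zero. The +1 in the denominator keeps the score finite when the variance is very small.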
Example Usage
Communication Scoring
Output Structure
MultiChamberPipeline
Purpose: Chamber-specific analysis across all four heart chambers
When to Use
- Compare RA, RV, LA, LV chambers
- Identify chamber-specific markers
- Study cross-chamber relationships
- Understand chamber specialization
- Data contains chamber annotations
What It Does
Load Multi-Chamber Data
Requires data with chamber labels:
- Expects ‘chamber’ or ‘location’ column
- Valid chambers: RA, RV, LA, LV
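The chamber label requirements can be checked before running an analysis. This sketch represents the cell metadata as a plain dict of column name to label list, and assumes 'chamber' is preferred over 'location' when both exist. The helper name and that preference order are assumptions.

```python
VALID_CHAMBERS = {"RA", "RV", "LA", "LV"}
CHAMBER_COLUMNS = ("chamber", "location")


def validate_chamber_labels(obs):
    """Return (column name, sorted chambers present) or raise ValueError.

    `obs` maps column name -> list of per-cell labels.
    """
    column = next((name for name in CHAMBER_COLUMNS if name in obs), None)
    if column is None:
        raise ValueError("expected a 'chamber' or 'location' column")
    present = set(obs[column])
    invalid = present - VALID_CHAMBERS
    if invalid:
        raise ValueError(f"invalid chamber labels: {sorted(invalid)}")
    return column, sorted(present)
```

The returned chamber list shows which of the four chambers are actually represented, which matters for the cross-chamber comparisons that follow.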
Chamber-Specific Analysis
Analyzes each chamber independently:
- Differential expression per chamber
- Chamber-specific marker identification
- Within-chamber cell type composition
Cross-Chamber Correlation
Compares chambers:
- Correlation matrices between chambers
- Shared vs. unique cell populations
- Expression pattern similarities
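One way to picture the cross-chamber correlation step is a pairwise correlation between per-chamber mean expression vectors. The source does not name the correlation measure, so Pearson correlation is an assumption here, as are the function names and the input layout.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def chamber_correlations(chamber_means):
    """Correlate every pair of chambers.

    chamber_means maps chamber -> mean expression per gene, with genes in
    the same order for every chamber.
    """
    return {
        (a, b): pearson(chamber_means[a], chamber_means[b])
        for a, b in combinations(chamber_means, 2)
    }
```

Each unordered chamber pair appears once in the result, so four chambers yield six correlation values.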
Example Usage
Known Chamber Markers
- RA - Right Atrium
- RV - Right Ventricle
- LA - Left Atrium
- LV - Left Ventricle
Top Markers: NPPA, MIR100HG, MYL7, MYL4, PDE4D
Characteristics:
- 28.4% of total heart cells
- Natriuretic peptide signaling
- Atrial-specific contractile proteins
Output Structure
ComprehensivePipeline
Purpose: Complete HeartMAP analysis combining all features
When to Use
- Need complete analysis in one run
- Publication-quality comprehensive results
- Automated HTML report generation
- Final production analysis
What It Does
Combines Basic + Communication + Multi-Chamber into a unified workflow:
Complete Data Processing
Full preprocessing pipeline:
- All BasicPipeline steps
- PCA, neighbors, UMAP
- Leiden clustering
Integrated Analysis
Combines all analysis types:
- Cell type annotation
- Communication networks
- Chamber-specific patterns (if chamber data available)
Comprehensive Dashboard
Multi-panel visualization:
- Integrated figure panels
- Side-by-side comparisons
- Summary statistics
Example Usage
Output Structure
Pipeline Comparison
| Feature | Basic | Communication | Multi-Chamber | Comprehensive |
|---|---|---|---|---|
| QC & Preprocessing | ✅ | ❌ | ❌ | ✅ |
| Cell Clustering | ✅ | ❌ | ❌ | ✅ |
| L-R Analysis | ❌ | ✅ | ❌ | ✅ |
| Hub Detection | ❌ | ✅ | ❌ | ✅ |
| Chamber Markers | ❌ | ❌ | ✅ | ✅ |
| Cross-Chamber Correlations | ❌ | ❌ | ✅ | ✅ |
| Comprehensive Report | ❌ | ❌ | ❌ | ✅ |
| Runtime (typical) | 5-10 min | 10-15 min | 15-20 min | 20-30 min |
| Memory (typical) | Low | Medium | Medium | High |
| Input Requirements | Raw H5AD | Annotated H5AD | Multi-chamber H5AD | Raw H5AD |
Pipeline Selection Guide
I have raw data
Start with BasicPipeline for initial exploration, then progress to other pipelines as needed.
I need cell communication
Use BasicPipeline first for annotations, then AdvancedCommunicationPipeline.
I have chamber labels
Use MultiChamberPipeline after basic annotation to identify chamber-specific patterns.
I want everything
Use ComprehensivePipeline for complete analysis in a single run.
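The selection guide can also be expressed as a small decision helper. This is only a restatement of the guidance above in code. The function and its flags are hypothetical, and it returns pipeline class names as strings without running anything.

```python
def recommend_pipelines(
    has_annotations=False,
    need_communication=False,
    has_chamber_labels=False,
    want_everything=False,
):
    """Return the suggested pipeline names, in the order they should be run."""
    if want_everything:
        return ["ComprehensivePipeline"]
    steps = []
    if not has_annotations:
        # Raw data needs clustering before any downstream analysis.
        steps.append("BasicPipeline")
    if need_communication:
        steps.append("AdvancedCommunicationPipeline")
    if has_chamber_labels:
        steps.append("MultiChamberPipeline")
    # Fall back to the basic pipeline when nothing else applies.
    return steps or ["BasicPipeline"]
```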
Custom Pipeline Workflows
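You can chain pipelines to build custom workflows, for example by running BasicPipeline first and passing its annotated output to AdvancedCommunicationPipeline.
Next Steps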
Configuration
Learn how to customize pipeline behavior
Quick Start
Run your first analysis