
Pipeline Overview

HeartMAP provides four specialized analysis pipelines that offer progressively more comprehensive insights, from basic quality control and clustering to a fully integrated analysis. All pipelines inherit from the abstract BasePipeline class, so they share a consistent interface and behavior.
from heartmap.pipelines import (
    BasePipeline,
    BasicPipeline,
    AdvancedCommunicationPipeline,
    MultiChamberPipeline,
    ComprehensivePipeline
)

BasePipeline

The abstract base class that defines the pipeline interface.

Architecture

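from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

# Config, DataProcessor, Visualizer, and ResultsExporter are HeartMAP classes (imports omitted).

class BasePipeline(ABC):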
    def __init__(self, config: Config):
        self.config = config
        self.data_processor = DataProcessor(config)
        self.visualizer = Visualizer(config)
        self.exporter = ResultsExporter(config)
        self.results: Dict[str, Any] = {}

    @abstractmethod
    def run(self, data_path: str, output_dir: Optional[str] = None) -> Dict[str, Any]:
        """Run the complete pipeline"""
        pass

    def save_results(self, output_dir: str) -> None:
        """Save pipeline results"""
        pass
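To illustrate the contract that BasePipeline enforces, here is a minimal, self-contained sketch. It does not import HeartMAP: MiniBasePipeline is a hypothetical stand-in that mirrors only the `run` signature (the real class also sets up the data processor, visualizer, and exporter), and EchoPipeline is a hypothetical subclass whose return value imitates the `results['results']` layout used in the examples on this page.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class MiniBasePipeline(ABC):
    """Hypothetical stand-in mirroring BasePipeline's abstract interface."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.results: Dict[str, Any] = {}

    @abstractmethod
    def run(self, data_path: str, output_dir: Optional[str] = None) -> Dict[str, Any]:
        """Subclasses must implement the complete pipeline."""


class EchoPipeline(MiniBasePipeline):
    """Hypothetical pipeline that only records its inputs."""

    def run(self, data_path: str, output_dir: Optional[str] = None) -> Dict[str, Any]:
        self.results = {"data_path": data_path, "output_dir": output_dir}
        return {"results": self.results}


# An ABC with an unimplemented abstract method cannot be instantiated.
try:
    MiniBasePipeline({})
    abstract_blocked = False
except TypeError:
    abstract_blocked = True

out = EchoPipeline({}).run("data/raw/heart_data.h5ad", "results/echo")
print(out["results"]["output_dir"])  # results/echo
```

Forgetting to implement `run` in a subclass therefore fails at construction time rather than partway through an analysis.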

Key Components

  • DataProcessor (self.data_processor): handles QC, normalization, and preprocessing
  • Visualizer (self.visualizer): generates publication-ready figures
  • ResultsExporter (self.exporter): exports results in multiple formats
  • Config (self.config): centralized configuration management

BasicPipeline

Purpose: Initial data exploration, quality control, and cell type identification

When to Use

  • First-time analysis of a new dataset
  • Need basic cell type annotations
  • Quality control assessment
  • Quick exploratory analysis (5-10 minutes)
  • Limited computational resources

What It Does

1. Load and Process Data

Loads the raw H5AD file and applies quality control:
  • Filters out cells with fewer than 200 detected genes (min_genes=200)
  • Filters out genes detected in fewer than 3 cells (min_cells=3)
  • Normalizes each cell to 10,000 total counts
  • Log-transforms expression
  • Selects 2,000 highly variable genes

2. Dimensionality Reduction

Computes PCA and constructs a neighborhood graph:
  • PCA with 50 components
  • Neighbors graph (n_neighbors=15, n_pcs=40)

3. Cell Clustering

Uses the Leiden algorithm to identify cell clusters:
  • Resolution=0.5 (configurable)
  • Community detection on the neighborhood graph

4. Generate Visualizations

Creates essential plots:
  • UMAP colored by cluster
  • QC metrics (genes per cell, counts per cell, etc.)
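The QC and normalization arithmetic of step 1 can be sketched in plain Python on a toy count matrix. This is an illustration, not HeartMAP's code: `basic_qc` is a hypothetical helper, and its highly-variable-gene step simply ranks genes by variance, whereas single-cell tools such as scanpy rank by normalized dispersion by default.

```python
import math
from statistics import pvariance
from typing import List, Tuple

Matrix = List[List[float]]


def basic_qc(
    counts: Matrix,
    min_genes: int = 200,
    min_cells: int = 3,
    target_sum: float = 10_000,
    n_top_genes: int = 2000,
) -> Tuple[Matrix, List[int], List[int]]:
    """Toy version of the step-1 QC on a cells x genes count matrix.

    Returns (log-normalized matrix, indices of kept genes,
    indices of the most variable genes within the kept genes).
    """
    # Keep cells that express at least `min_genes` genes.
    cells = [row for row in counts if sum(v > 0 for v in row) >= min_genes]
    # Keep genes detected in at least `min_cells` of the remaining cells.
    n_genes = len(cells[0]) if cells else 0
    kept = [g for g in range(n_genes) if sum(row[g] > 0 for row in cells) >= min_cells]
    cells = [[row[g] for g in kept] for row in cells]

    # Scale every cell to `target_sum` total counts, then apply log(1 + x).
    normed = []
    for row in cells:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        normed.append([math.log1p(v * scale) for v in row])

    # Simplified HVG selection: rank genes by variance across cells.
    variances = [pvariance([row[j] for row in normed]) for j in range(len(kept))]
    hvg = sorted(range(len(kept)), key=lambda j: variances[j], reverse=True)[:n_top_genes]
    return normed, kept, hvg


# Toy 4-cell x 4-gene matrix; thresholds are scaled down to match its size.
toy = [
    [5, 0, 3, 0],
    [2, 4, 0, 0],
    [0, 0, 0, 7],  # only one detected gene, so it fails min_genes=2
    [1, 1, 6, 0],
]
normed, kept, hvg = basic_qc(toy, min_genes=2, min_cells=2, target_sum=10, n_top_genes=2)
print(kept, hvg)  # [0, 1, 2] [2, 1]
```

The fourth gene is dropped because, once the third cell is removed, no remaining cell expresses it.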

Example Usage

from heartmap import Config
from heartmap.pipelines import BasicPipeline

# Initialize pipeline
config = Config.default()
pipeline = BasicPipeline(config)

# Run analysis
results = pipeline.run(
    data_path='data/raw/heart_data.h5ad',
    output_dir='results/basic'
)

# Access results
adata = results['adata']
cluster_labels = results['results']['cluster_labels']

print(f"Identified {len(set(cluster_labels))} cell clusters")
print(f"Total cells analyzed: {adata.n_obs}")

Output Structure

results/basic/
├── figures/
│   ├── umap_clusters.png      # Cluster visualization
│   └── qc_metrics.png          # Quality control plots
├── annotated_data.h5ad         # Processed data with clusters
└── results.json                # Cluster assignments

AdvancedCommunicationPipeline

Purpose: Ligand-receptor interaction analysis and communication network mapping

When to Use

  • Investigate cell-cell signaling
  • Identify communication hubs
  • Study pathway-specific interactions
  • Drug target discovery
  • Already have cell type annotations

What It Does

1. Load Annotated Data

Requires pre-clustered data with cell type annotations:
  • Looks for one of the columns 'leiden', 'louvain', 'Cluster', 'cell_type', or 'celltype'
  • Requires output from BasicPipeline or an equivalently annotated dataset

2. Load L-R Database

Loads curated ligand-receptor pairs:
  • LIANA consensus database (preferred)
  • 100+ cardiac-relevant interactions
  • Confidence scoring (threshold=0.7)

3. Calculate Communication

Computes cell-type-to-cell-type communication:
  • Mean expression per cell type
  • L-R co-expression scoring
  • Communication strength = √(ligand_expr × receptor_expr), the geometric mean of ligand and receptor expression

4. Hub Score Calculation

Identifies communication hubs:
  • Hub score = (std × mean) / (variance + 1)
  • High scores indicate key signaling cells

5. Pathway Analysis

Computes enrichment for cardiac pathways (optional).
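The hub score formula in step 4 can be checked numerically. The page does not say which values the statistics are taken over, so this sketch assumes they are computed over one cell type's communication scores using population (ddof=0) statistics; `hub_score` and the toy scores are hypothetical.

```python
from statistics import fmean, pstdev, pvariance
from typing import Dict, List


def hub_score(scores: List[float]) -> float:
    """Hub score = (std x mean) / (variance + 1), using population statistics."""
    return (pstdev(scores) * fmean(scores)) / (pvariance(scores) + 1)


# Toy communication scores per cell type (hypothetical values).
scores_by_celltype: Dict[str, List[float]] = {
    "Fibroblast": [0.9, 0.2, 0.7, 0.1],
    "Endothelial": [0.4, 0.4, 0.5, 0.3],
}
hubs = {celltype: hub_score(s) for celltype, s in scores_by_celltype.items()}
print(max(hubs, key=hubs.get))  # Fibroblast
```

Because the variance is the square of the standard deviation, the score equals σμ/(σ² + 1). The +1 keeps the denominator positive when all scores are equal (the hub score is then 0), and for a fixed mean the score peaks at σ = 1, so very large spreads are damped rather than rewarded without bound.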

Example Usage

from heartmap import Config
from heartmap.pipelines import AdvancedCommunicationPipeline

# Initialize pipeline
config = Config.default()
config.analysis.use_liana = True  # Enable LIANA integration

pipeline = AdvancedCommunicationPipeline(config)

# Run on annotated data
results = pipeline.run(
    data_path='results/basic/annotated_data.h5ad',
    output_dir='results/communication'
)

# Access communication results
comm_scores = results['results']['communication_scores']
hub_scores = results['results']['hub_scores']

print(f"Detected {len(comm_scores)} significant interactions")
print(f"Top communication hub: {hub_scores.idxmax()}")

Communication Scoring

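from math import sqrt

def score_interactions(lr_pairs, celltypes, mean_expression, min_expr=0.1):
    """Score each source -> target cell-type pair for every ligand-receptor pair.

    mean_expression[celltype][gene] is the mean expression of gene in celltype.
    """
    interactions = []
    for ligand, receptor in lr_pairs:
        for source_celltype in celltypes:
            ligand_expr = mean_expression[source_celltype].get(ligand, 0.0)
            for target_celltype in celltypes:
                receptor_expr = mean_expression[target_celltype].get(receptor, 0.0)
                # Both partners must exceed the expression threshold
                if ligand_expr > min_expr and receptor_expr > min_expr:
                    score = sqrt(ligand_expr * receptor_expr)  # geometric mean
                    # Store: source -> target via ligand-receptor
                    interactions.append(
                        (source_celltype, target_celltype, ligand, receptor, score))
    return interactions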
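The communication heatmap summarizes these per-pair scores by cell type. How HeartMAP aggregates them is not documented here; the sketch below assumes scores are summed over all ligand-receptor pairs for each source and target cell type, and takes hypothetical (source, target, ligand, receptor, score) tuples as input.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# (source cell type, target cell type, ligand, receptor, score)
Interaction = Tuple[str, str, str, str, float]


def communication_matrix(interactions: Iterable[Interaction]) -> Dict[str, Dict[str, float]]:
    """Sum ligand-receptor scores into a source -> target cell-type matrix."""
    matrix: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for source, target, _ligand, _receptor, score in interactions:
        matrix[source][target] += score
    # Convert to plain dicts so lookups of missing pairs do not create entries.
    return {source: dict(row) for source, row in matrix.items()}


# Toy interactions with made-up scores.
toy_interactions = [
    ("Fibroblast", "Cardiomyocyte", "TGFB1", "TGFBR1", 0.6),
    ("Fibroblast", "Cardiomyocyte", "COL1A1", "ITGB1", 0.3),
    ("Endothelial", "Fibroblast", "PDGFB", "PDGFRB", 0.5),
]
matrix = communication_matrix(toy_interactions)
print(matrix["Endothelial"])  # {'Fibroblast': 0.5}
```

Summing rewards cell-type pairs that interact through many ligand-receptor pairs; averaging instead would highlight pairs with a few strong interactions.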

Output Structure

results/communication/
├── figures/
│   ├── communication_heatmap.png   # Cell-type interactions
│   ├── hub_scores.png              # Communication hubs
│   └── pathway_scores.png          # Pathway enrichment
├── communication_scores.csv         # Detailed L-R interactions
├── hub_scores.csv                   # Per-cell hub scores
└── results.json

MultiChamberPipeline

Purpose: Chamber-specific analysis across all four heart chambers

When to Use

  • Compare the right atrium (RA), right ventricle (RV), left atrium (LA), and left ventricle (LV)
  • Identify chamber-specific markers
  • Study cross-chamber relationships
  • Understand chamber specialization
  • Data contains chamber annotations

What It Does

1. Load Multi-Chamber Data

Requires data with chamber labels:
  • Expects a 'chamber' or 'location' column
  • Valid chambers: RA, RV, LA, LV

2. Chamber-Specific Analysis

Analyzes each chamber independently:
  • Differential expression per chamber
  • Chamber-specific marker identification
  • Within-chamber cell type composition

3. Cross-Chamber Correlation

Compares chambers:
  • Correlation matrices between chambers
  • Shared vs. unique cell populations
  • Expression pattern similarities

4. Comparative Visualization

Chamber comparison plots:
  • Chamber composition bar charts
  • Marker heatmaps per chamber
  • Correlation networks
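Step 3 does not name the correlation metric. Assuming Pearson correlation between per-chamber mean expression profiles over a shared gene list, the cross-chamber matrix could be computed as below; `pearson`, `chamber_correlations`, and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def chamber_correlations(profiles: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of chamber profiles (same genes, same order)."""
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Toy mean expression of four genes per chamber (hypothetical values).
profiles = {
    "LA": [5.0, 1.0, 0.5, 3.0],
    "RA": [4.0, 1.5, 0.4, 2.5],
    "LV": [0.5, 4.0, 3.0, 1.0],
    "RV": [0.8, 3.5, 2.5, 1.5],
}
corr = chamber_correlations(profiles)
print(round(corr[("LA", "RA")], 2))  # 0.98
```

Keys use alphabetical order, so look up ("LA", "RA") rather than ("RA", "LA"). A chamber whose profile is constant would raise ZeroDivisionError, which a production version would need to handle.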

Example Usage

from heartmap import Config
from heartmap.pipelines import MultiChamberPipeline

# Initialize pipeline
config = Config.default()
pipeline = MultiChamberPipeline(config)

# Run multi-chamber analysis
results = pipeline.run(
    data_path='data/raw/multi_chamber_data.h5ad',
    output_dir='results/multi_chamber'
)

# Access chamber-specific results
chamber_markers = results['results']['chamber_markers']
correlations = results['results']['cross_chamber_correlations']

print("Chamber-specific markers:")
for chamber, markers in chamber_markers.items():
    print(f"  {chamber}: {len(markers)} unique markers")

Known Chamber Markers

Top markers: NPPA, MIR100HG, MYL7, MYL4, PDE4D

Characteristics:
  • 28.4% of total heart cells
  • Natriuretic peptide signaling
  • Atrial-specific contractile proteins

Output Structure

results/multi_chamber/
├── figures/
│   ├── chamber_composition.png     # Cell distribution
│   ├── chamber_markers.png         # Marker heatmaps
│   └── chamber_correlations.png    # Cross-chamber analysis
├── chamber_markers.csv              # Per-chamber markers
├── correlations.csv                 # Chamber similarity matrix
└── results.json

ComprehensivePipeline

Purpose: Complete HeartMAP analysis combining all features

When to Use

  • Need complete analysis in one run
  • Publication-quality comprehensive results
  • Automated HTML report generation
  • Final production analysis

What It Does

Combines Basic + Communication + Multi-Chamber into a unified workflow:
1. Complete Data Processing

Full preprocessing pipeline:
  • All BasicPipeline steps
  • PCA, neighbors, UMAP
  • Leiden clustering

2. Integrated Analysis

Combines all analysis types:
  • Cell type annotation
  • Communication networks
  • Chamber-specific patterns (if chamber data are available)

3. Comprehensive Dashboard

Multi-panel visualization:
  • Integrated figure panels
  • Side-by-side comparisons
  • Summary statistics

4. Automated Report

Generates an HTML report:
  • Analysis summary
  • All figures embedded
  • Key findings highlighted
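One way to make an HTML report with all figures embedded is to inline each PNG as a base64 data URI, so the single file can be shared without its figures/ folder. The sketch assumes that approach; `build_html_report` is a hypothetical helper, not HeartMAP's report generator.

```python
import base64
import html
import tempfile
from pathlib import Path
from typing import Dict, List


def build_html_report(title: str, summary: Dict[str, object], figures: List[Path]) -> str:
    """Return a self-contained HTML page with PNG figures inlined as data URIs."""
    safe_title = html.escape(title)
    parts = [
        f"<html><head><meta charset='utf-8'><title>{safe_title}</title></head><body>",
        f"<h1>{safe_title}</h1>",
        "<ul>",
    ]
    for key, value in summary.items():
        parts.append(f"<li><b>{html.escape(str(key))}</b>: {html.escape(str(value))}</li>")
    parts.append("</ul>")
    for fig in figures:
        encoded = base64.b64encode(fig.read_bytes()).decode("ascii")
        parts.append(f"<h2>{html.escape(fig.stem)}</h2>")
        parts.append(f"<img alt='{html.escape(fig.name)}' src='data:image/png;base64,{encoded}'>")
    parts.append("</body></html>")
    return "\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    fig = Path(tmp) / "umap_clusters.png"
    fig.write_bytes(b"\x89PNG placeholder")  # stand-in bytes, not a real image
    report = build_html_report("HeartMAP report", {"pipeline": "comprehensive"}, [fig])
```

Inlining increases file size by roughly a third per image, which is usually acceptable for a handful of figures.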

Example Usage

from heartmap import Config
from heartmap.pipelines import ComprehensivePipeline

# Full configuration
config = Config.default()
config.data.max_cells_subset = 50000
config.data.max_genes_subset = 5000
config.analysis.use_liana = True
config.model.save_intermediate = True

# Run comprehensive analysis
pipeline = ComprehensivePipeline(config)
results = pipeline.run(
    data_path='data/raw/heart_data.h5ad',
    output_dir='results/comprehensive'
)

# Results include everything
print("Analysis components:")
for component in results['results'].keys():
    print(f"  - {component}")

Output Structure

results/comprehensive/
├── figures/
│   ├── comprehensive_dashboard.png  # Multi-panel overview
│   ├── umap_clusters.png
│   ├── communication_heatmap.png
│   ├── chamber_composition.png
│   └── hub_scores.png
├── data/
│   └── heartmap_complete.h5ad       # Fully annotated data
├── reports/
│   └── comprehensive_report.html    # Automated HTML report
├── annotation/
│   └── cluster_labels.csv
├── communication/
│   ├── communication_scores.csv
│   └── hub_scores.csv
├── multi_chamber/
│   ├── chamber_markers.csv
│   └── correlations.csv
└── results.json

Pipeline Comparison

Pipeline       | Main analyses                                   | Runtime (typical) | Memory (typical) | Input requirements
Basic          | QC & preprocessing, cell clustering             | 5-10 min          | Low              | Raw H5AD
Communication  | L-R analysis, hub detection                     | 10-15 min         | Medium           | Annotated H5AD
Multi-Chamber  | Chamber markers, cross-chamber correlations     | 15-20 min         | Medium           | Multi-chamber H5AD
Comprehensive  | All of the above, plus the comprehensive report | 20-30 min         | High             | Raw H5AD

Pipeline Selection Guide

I have raw data

Start with BasicPipeline for initial exploration, then progress to other pipelines as needed.

I need cell communication

Use BasicPipeline first for annotations, then AdvancedCommunicationPipeline.

I have chamber labels

Use MultiChamberPipeline after basic annotation to identify chamber-specific patterns.

I want everything

Use ComprehensivePipeline for complete analysis in a single run.

Custom Pipeline Workflows

You can chain pipelines for custom workflows:
from heartmap import Config
from heartmap.pipelines import BasicPipeline, AdvancedCommunicationPipeline

config = Config.default()

# Step 1: Basic analysis
basic = BasicPipeline(config)
basic_results = basic.run('raw_data.h5ad', 'results/basic')

# Step 2: Communication analysis on annotated data
comm = AdvancedCommunicationPipeline(config)
comm_results = comm.run('results/basic/annotated_data.h5ad', 'results/comm')

print("Custom pipeline complete!")
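When iterating on downstream steps, it can help to skip the basic stage if its annotated output already exists. The helper below is a hypothetical sketch, not part of HeartMAP; it only assumes that `run` accepts `(data_path, output_dir)`, as in the calls above, and the demo uses a stand-in function instead of a real pipeline.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional

RunFn = Callable[[str, str], Dict[str, Any]]


def run_if_missing(run: RunFn, data_path: str, output_dir: str, expected: str) -> Optional[Dict[str, Any]]:
    """Call run(data_path, output_dir) unless output_dir/expected already exists."""
    if (Path(output_dir) / expected).exists():
        return None  # reuse the file written by an earlier run
    return run(data_path, output_dir)


# Demo with a stand-in for a pipeline's run method that records its calls.
calls = []


def fake_run(data_path: str, output_dir: str) -> Dict[str, Any]:
    calls.append((data_path, output_dir))
    return {"results": {}}


with tempfile.TemporaryDirectory() as tmp:
    first = run_if_missing(fake_run, "raw_data.h5ad", tmp, "annotated_data.h5ad")
    (Path(tmp) / "annotated_data.h5ad").touch()  # pretend the pipeline wrote its output
    second = run_if_missing(fake_run, "raw_data.h5ad", tmp, "annotated_data.h5ad")

print(len(calls), second)  # 1 None
```

With HeartMAP this would be called as `run_if_missing(basic.run, 'raw_data.h5ad', 'results/basic', 'annotated_data.h5ad')`. Note that a stale file is reused as-is, so delete it after changing QC or clustering settings.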

Next Steps

  • Configuration: learn how to customize pipeline behavior
  • Quick Start: run your first analysis
