
Comprehensive HeartMAP Analysis

The ComprehensivePipeline combines all HeartMAP analysis components into a single integrated workflow: basic analysis, communication analysis, and multi-chamber analysis.

Overview

This pipeline performs:
  • Complete data preprocessing and QC
  • Cell type annotation and clustering
  • Cell-cell communication analysis
  • Multi-chamber pattern analysis
  • Comprehensive visualization dashboard
  • Automated report generation

Install HeartMAP

# Install with all features (the quotes keep shells such as zsh from expanding the brackets)
pip install "heartmap[all]"

Prepare Your Data

Start with raw single-cell data:

import scanpy as sc

# Load your data from an AnnData file
adata = sc.read_h5ad('data/raw/heart_data.h5ad')
# or, for 10x Genomics HDF5 output, use this line instead:
# adata = sc.read_10x_h5('data/raw/filtered_feature_bc_matrix.h5')

print(f"Loaded: {adata.n_obs:,} cells × {adata.n_vars:,} genes")

Configure the Pipeline

Create a comprehensive configuration:

from heartmap import Config

# Load default config
config = Config.default()

# Customize for your system
config.data.min_genes = 200
config.data.min_cells = 3
config.data.max_cells_subset = 50000  # Adjust for your RAM
config.data.n_top_genes = 2000
config.analysis.resolution = 0.5
config.analysis.use_liana = True  # Enable the LIANA ligand-receptor (L-R) database

# Set paths
config.update_paths('./heartmap_analysis')
config.create_directories()

print("Configuration ready!")

Run Complete Analysis

from heartmap.pipelines import ComprehensivePipeline

print("=== Running Comprehensive HeartMAP Pipeline ===")

# Initialize pipeline
pipeline = ComprehensivePipeline(config)

# Run complete analysis
results = pipeline.run(
    data_path='data/raw/heart_data.h5ad',
    output_dir='results/comprehensive'
)

print("\nComprehensive HeartMAP pipeline completed!")

Explore Results

The pipeline returns integrated results:

import pandas as pd

# Extract components
adata = results['adata']
analysis_results = results['results']

# Overview
print(f"\nProcessed: {adata.n_obs:,} cells × {adata.n_vars:,} genes")

# Cell type annotation
if 'annotation' in analysis_results:
    n_clusters = len(adata.obs['leiden'].unique())
    print(f"Identified {n_clusters} cell type clusters")
    
    cluster_counts = adata.obs['leiden'].value_counts()
    print("\nTop 5 cell types:")
    for cluster, count in cluster_counts.head(5).items():
        pct = 100 * count / adata.n_obs
        print(f"  Cluster {cluster}: {count:,} cells ({pct:.1f}%)")

# Communication analysis
if 'communication' in analysis_results:
    comm_res = analysis_results['communication']
    if 'hub_scores' in comm_res:
        hub_scores = comm_res['hub_scores']
        print(f"\nHub scores computed for {len(hub_scores)} cells")

# Multi-chamber analysis
if 'chamber' in adata.obs.columns:
    print("\nChamber distribution:")
    for chamber, count in adata.obs['chamber'].value_counts().items():
        pct = 100 * count / adata.n_obs
        print(f"  {chamber}: {count:,} cells ({pct:.1f}%)")
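
The cluster and chamber summaries above repeat the same count-and-percentage loop. A minimal sketch of a reusable helper follows; `summarize_categories` is a hypothetical name, not part of HeartMAP, and the chamber labels are made-up toy values.

```python
from collections import Counter


def summarize_categories(labels, top=None):
    """Return (label, count, percent) rows sorted by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        (label, count, 100 * count / total)
        for label, count in counts.most_common(top)
    ]


# Toy labels standing in for adata.obs['chamber'] (values are made up)
chambers = ["LV", "LV", "RV", "LA", "LV", "RA", "RV", "LV"]
for label, count, pct in summarize_categories(chambers):
    print(f"  {label}: {count:,} cells ({pct:.1f}%)")
```

With real data, something like `summarize_categories(adata.obs['leiden'], top=5)` should give the same rows as the loops above, assuming the column exists.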

View Comprehensive Dashboard

The pipeline generates an integrated visualization:

from pathlib import Path
import matplotlib.pyplot as plt
from matplotlib.image import imread

# Load dashboard
dashboard_path = Path('results/comprehensive/figures/comprehensive_dashboard.png')
if dashboard_path.exists():
    img = imread(dashboard_path)
    plt.figure(figsize=(20, 16))
    plt.imshow(img)
    plt.axis('off')
    plt.title('HeartMAP Comprehensive Dashboard', fontsize=20)
    plt.tight_layout()
    plt.show()
else:
    print(f"Dashboard not found at {dashboard_path}; it is written when the pipeline finishes")

Read Analysis Report

An automated markdown report is generated:

report_path = Path('results/comprehensive/analysis_report.md')
if report_path.exists():
    with open(report_path, 'r') as f:
        print(f.read())

Access Individual Components

All intermediate results are saved:

import scanpy as sc

# Load complete processed data
adata_complete = sc.read_h5ad('results/comprehensive/heartmap_complete.h5ad')

print("Available metadata:")
print(adata_complete.obs.columns.tolist())

print("\nAvailable embeddings:")
print(list(adata_complete.obsm.keys()))

print("\nAvailable analyses:")
print(list(adata_complete.uns.keys()))

Perform Downstream Analysis

Use the processed data for custom analyses:

import scanpy as sc
import matplotlib.pyplot as plt

# Differential expression between chambers
if 'chamber' in adata.obs.columns:
    sc.tl.rank_genes_groups(
        adata,
        groupby='chamber',
        method='wilcoxon',
        key_added='chamber_de'
    )
    
    # Plot top DE genes
    sc.pl.rank_genes_groups_heatmap(
        adata,
        key='chamber_de',
        n_genes=10,
        groupby='chamber',
        show=False
    )
    plt.savefig('results/comprehensive/chamber_de_heatmap.png', 
                dpi=300, bbox_inches='tight')
    plt.close()

# Find cell type markers
sc.tl.rank_genes_groups(
    adata,
    groupby='leiden',
    method='wilcoxon',
    key_added='cluster_markers'
)

# Export markers
for cluster in adata.obs['leiden'].unique():
    markers = sc.get.rank_genes_groups_df(
        adata,
        group=cluster,
        key='cluster_markers'
    ).head(50)
    markers.to_csv(
        f'results/comprehensive/markers_cluster_{cluster}.csv',
        index=False
    )

print("Downstream analysis complete!")
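
The export loop above writes one CSV per cluster. If a single table is easier to work with, the files can be merged afterwards. The sketch below uses only the standard library; `combine_marker_tables` is a hypothetical helper, the file-name prefix is taken from the loop above, and the demonstration files contain placeholder genes rather than real results.

```python
import csv
import tempfile
from pathlib import Path


def combine_marker_tables(folder, pattern="markers_cluster_*.csv"):
    """Merge per-cluster CSVs into one list of rows, adding a 'cluster' column."""
    combined = []
    for path in sorted(Path(folder).glob(pattern)):
        # Recover the cluster label from the file name
        cluster = path.stem.replace("markers_cluster_", "")
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["cluster"] = cluster
                combined.append(row)
    return combined


# Demonstration on two tiny made-up files (gene names and scores are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    for cluster, gene in [("0", "GENE_A"), ("1", "GENE_B")]:
        with open(Path(tmp) / f"markers_cluster_{cluster}.csv", "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["names", "scores"])
            writer.writerow([gene, "1.0"])
    demo_rows = combine_marker_tables(tmp)

print(demo_rows)
```

Pointing the function at `results/comprehensive` would collect the files written by the loop above, provided the naming pattern is unchanged.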

Complete Working Example

from heartmap import Config
from heartmap.pipelines import ComprehensivePipeline
import scanpy as sc
import pandas as pd
from pathlib import Path

# ========================================
# Setup
# ========================================
print("=== HeartMAP Comprehensive Analysis ===")

# Configure
config = Config.default()
config.data.min_genes = 200
config.data.min_cells = 3
config.data.max_cells_subset = 50000
config.data.n_top_genes = 2000
config.analysis.resolution = 0.5
config.analysis.n_neighbors = 10
config.analysis.n_pcs = 40
config.analysis.use_liana = True

config.update_paths('./analysis')
config.create_directories()

# ========================================
# Run Pipeline
# ========================================
pipeline = ComprehensivePipeline(config)
results = pipeline.run(
    data_path='data/raw/heart_data.h5ad',
    output_dir='results/comprehensive'
)

# ========================================
# Analyze Results
# ========================================
adata = results['adata']
analysis_results = results['results']

print(f"\n{'='*50}")
print("ANALYSIS SUMMARY")
print('='*50)

print(f"\nDataset: {adata.n_obs:,} cells × {adata.n_vars:,} genes")

# Cell type annotation
if 'leiden' in adata.obs.columns:
    n_clusters = len(adata.obs['leiden'].unique())
    print(f"\nCell Type Clusters: {n_clusters}")
    
    cluster_dist = adata.obs['leiden'].value_counts().head(5)
    print("\nTop 5 Clusters:")
    for cluster, count in cluster_dist.items():
        pct = 100 * count / adata.n_obs
        print(f"  Cluster {cluster}: {count:,} cells ({pct:.1f}%)")

# Chamber distribution
if 'chamber' in adata.obs.columns:
    print("\nChamber Distribution:")
    for chamber, count in adata.obs['chamber'].value_counts().items():
        pct = 100 * count / adata.n_obs
        print(f"  {chamber}: {count:,} cells ({pct:.1f}%)")

# Communication hubs
if 'hub_score' in adata.obs.columns:
    hub_mean = adata.obs['hub_score'].mean()
    hub_std = adata.obs['hub_score'].std()
    print(f"\nHub Scores: {hub_mean:.4f} ± {hub_std:.4f}")
    
    # Top hub clusters
    # observed=True keeps only clusters that actually occur in the data
    hub_by_cluster = adata.obs.groupby('leiden', observed=True)['hub_score'].mean()
    top_hubs = hub_by_cluster.nlargest(3)
    print("\nTop Communication Hub Clusters:")
    for cluster, score in top_hubs.items():
        print(f"  Cluster {cluster}: {score:.4f}")

# ========================================
# Generate Custom Visualizations
# ========================================
print(f"\n{'='*50}")
print("GENERATING VISUALIZATIONS")
print('='*50)

import matplotlib.pyplot as plt

# Multi-panel UMAP
if 'chamber' in adata.obs.columns:
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # Panel 1: Clusters
    sc.pl.umap(adata, color='leiden', ax=axes[0], 
               show=False, frameon=False, title='Cell Types')
    
    # Panel 2: Chambers
    sc.pl.umap(adata, color='chamber', ax=axes[1],
               show=False, frameon=False, title='Chambers')
    
    # Panel 3: Hub scores
    if 'hub_score' in adata.obs.columns:
        sc.pl.umap(adata, color='hub_score', ax=axes[2],
                   show=False, frameon=False, title='Hub Scores',
                   cmap='viridis')
    
    plt.tight_layout()
    plt.savefig('results/comprehensive/overview_panel.png',
                dpi=300, bbox_inches='tight')
    plt.close()
    print("  Saved: overview_panel.png")

# ========================================
# Export Results
# ========================================
print(f"\n{'='*50}")
print("EXPORTING RESULTS")
print('='*50)

# Cell metadata
metadata = adata.obs[['leiden', 'n_genes', 'total_counts']].copy()
if 'chamber' in adata.obs.columns:
    metadata['chamber'] = adata.obs['chamber']
if 'hub_score' in adata.obs.columns:
    metadata['hub_score'] = adata.obs['hub_score']

metadata.to_csv('results/comprehensive/cell_metadata.csv')
print("  Saved: cell_metadata.csv")

# Summary statistics
summary = {
    'total_cells': adata.n_obs,
    'total_genes': adata.n_vars,
    'n_clusters': len(adata.obs['leiden'].unique()),
    'mean_genes_per_cell': adata.obs['n_genes'].mean(),
    'mean_counts_per_cell': adata.obs['total_counts'].mean()
}

if 'chamber' in adata.obs.columns:
    summary['n_chambers'] = len(adata.obs['chamber'].unique())

summary_df = pd.DataFrame([summary])
summary_df.to_csv('results/comprehensive/summary_stats.csv', index=False)
print("  Saved: summary_stats.csv")

print(f"\n{'='*50}")
print("ANALYSIS COMPLETE!")
print('='*50)
print("\nOutput files:")
for f in Path('results/comprehensive').rglob('*'):
    if f.is_file():
        print(f"  {f.relative_to('results/comprehensive')}")

Output Structure

results/comprehensive/
├── heartmap_complete.h5ad           # Complete processed data
├── heartmap_model.pkl               # Trained model (if saved)
├── analysis_report.md               # Automated report
├── cell_metadata.csv                # All cell annotations
├── summary_stats.csv                # Dataset statistics
├── figures/
│   ├── comprehensive_dashboard.png  # Integrated visualization
│   ├── umap_clusters.png           # Cell type UMAP
│   ├── qc_metrics.png              # Quality control plots
│   ├── communication_heatmap.png   # Communication matrix
│   ├── hub_scores.png              # Hub score UMAP
│   ├── chamber_composition.png     # Chamber distribution
│   └── chamber_correlations.png    # Cross-chamber similarity
└── tables/
    ├── marker_genes.csv            # Cell type markers
    ├── communication_scores.csv    # L-R interactions
    ├── hub_scores_by_type.csv     # Hub scores per type
    └── chamber_markers.csv         # Chamber-specific genes
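
A quick way to confirm that a run produced the files in the tree above is to test for each path. This is a sketch under assumptions: `missing_outputs` is a hypothetical helper, and the list holds only a few paths copied from the tree, so extend it to whatever your run is expected to write.

```python
import tempfile
from pathlib import Path

# A few paths copied from the tree above; adjust if your run writes a different set
EXPECTED_OUTPUTS = [
    "heartmap_complete.h5ad",
    "analysis_report.md",
    "figures/comprehensive_dashboard.png",
    "tables/marker_genes.csv",
]


def missing_outputs(root, expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


# Demonstration with a temporary directory that holds only the report
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "analysis_report.md").write_text("# report\n")
    demo_missing = missing_outputs(tmp)

print(demo_missing)
```

Calling `missing_outputs('results/comprehensive')` after a run returns an empty list when every listed file is present.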

Performance Optimization

Three example presets, ordered from the smallest to the largest memory footprint:

# Smallest preset
config.data.max_cells_subset = 10000
config.data.max_genes_subset = 2000
config.data.n_top_genes = 1500
config.analysis.n_pcs = 30

# Medium preset
config.data.max_cells_subset = 30000
config.data.max_genes_subset = 4000
config.data.n_top_genes = 2000
config.analysis.n_pcs = 40

# Largest preset
config.data.max_cells_subset = 50000
config.data.max_genes_subset = 5000
config.data.n_top_genes = 3000
config.analysis.n_pcs = 50
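
To get a rough feel for how these settings scale, the size of one dense expression matrix can be estimated from the cell and gene limits. This is back-of-the-envelope arithmetic, not HeartMAP's own accounting: it assumes a dense float32 matrix (4 bytes per value), whereas single-cell data is usually stored sparse, and real peak usage also includes intermediate copies. `dense_matrix_mb` is a hypothetical helper.

```python
def dense_matrix_mb(n_cells, n_genes, bytes_per_value=4):
    """Size in megabytes (10**6 bytes) of a dense n_cells x n_genes matrix."""
    return n_cells * n_genes * bytes_per_value / 1e6


# (max_cells_subset, max_genes_subset) pairs taken from the settings above
presets = [(10000, 2000), (30000, 4000), (50000, 5000)]
for n_cells, n_genes in presets:
    mb = dense_matrix_mb(n_cells, n_genes)
    print(f"{n_cells:>6} cells x {n_genes} genes: ~{mb:,.0f} MB as dense float32")
```

Under these assumptions the three settings correspond to roughly 80 MB, 480 MB, and 1,000 MB for a single dense copy of the matrix.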

Configuration Options

All configuration parameters from the individual pipelines apply:
  • data.min_genes (int, default: 200): Minimum genes per cell for QC
  • data.max_cells_subset (int, default: 50000): Maximum cells to process (memory optimization)
  • analysis.resolution (float, default: 0.5): Clustering resolution
  • analysis.use_liana (bool, default: true): Enable LIANA ligand-receptor database
  • model.save_intermediate (bool, default: true): Save intermediate results
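
Typos in option names or values of the wrong type are easy to miss when setting attributes by hand. The sketch below checks a dictionary of overrides against the names, types, and defaults listed in this section. It uses a plain dict as a stand-in and is not HeartMAP's Config class; `DOCUMENTED` and `check_overrides` are hypothetical names.

```python
# Names, types, and defaults from the options listed in this section.
# This is NOT HeartMAP's Config class; it is a stand-in for illustration.
DOCUMENTED = {
    "data.min_genes": (int, 200),
    "data.max_cells_subset": (int, 50000),
    "analysis.resolution": (float, 0.5),
    "analysis.use_liana": (bool, True),
    "model.save_intermediate": (bool, True),
}


def check_overrides(overrides):
    """Return a list of problems found in a {dotted_name: value} dict."""
    problems = []
    for name, value in overrides.items():
        if name not in DOCUMENTED:
            problems.append(f"{name}: not listed in this section")
            continue
        expected_type, _default = DOCUMENTED[name]
        # bool is a subclass of int in Python, so reject it for numeric options
        is_bool = isinstance(value, bool)
        if expected_type is bool:
            ok = is_bool
        elif expected_type is float:
            ok = isinstance(value, (int, float)) and not is_bool
        else:
            ok = isinstance(value, expected_type) and not is_bool
        if not ok:
            problems.append(f"{name}: expected {expected_type.__name__}, got {value!r}")
    return problems


print(check_overrides({"data.min_genes": "200", "analysis.resolutoin": 0.8}))
```

An empty list means every override matched a listed name and type. Options from the individual pipelines that are not listed here would need to be added to the dict first.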

Best Practices

Data Preparation

  • Start with raw count data (not normalized)
  • Ensure proper chamber annotations if available
  • Remove low-quality cells beforehand if needed
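
One simple heuristic for the first point is that raw counts are non-negative whole numbers, while normalized or log-transformed values usually are not. The following sketch applies that check to a sample of values; `looks_like_raw_counts` is a hypothetical helper and the sample values are made up.

```python
def looks_like_raw_counts(values):
    """True if every value is a non-negative whole number."""
    return all(v >= 0 and float(v).is_integer() for v in values)


raw_sample = [0, 3, 1, 0, 12]               # plausible UMI counts
normalized_sample = [0.0, 1.39, 0.69, 2.3]  # plausible log-normalized values

print(looks_like_raw_counts(raw_sample))         # True
print(looks_like_raw_counts(normalized_sample))  # False
```

For an AnnData object whose `X` is a SciPy sparse matrix, passing a slice of the stored values such as `adata.X.data[:1000]` is one way to sample. This is a heuristic only: scaled data that happens to be integer-valued would still pass.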

Resource Management

  • Monitor memory usage during analysis
  • Use max_cells_subset to prevent OOM errors
  • Enable save_intermediate for long analyses

Result Interpretation

  • Validate clusters with known markers
  • Check QC metrics for data quality issues
  • Compare chamber patterns with literature
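
Validating clusters against known markers can start with a simple overlap count between a cluster's top genes and reference marker sets. The sketch below is illustrative only: the reference sets are a small selection of commonly used markers chosen for this example, `best_matching_type` is a hypothetical helper, and the cluster gene list is made up.

```python
# Illustrative reference sets; replace with markers appropriate to your tissue
KNOWN_MARKERS = {
    "cardiomyocyte": {"TNNT2", "MYH7", "ACTN2"},
    "endothelial": {"PECAM1", "VWF", "CDH5"},
    "fibroblast": {"DCN", "COL1A1", "LUM"},
}


def best_matching_type(cluster_genes, known=KNOWN_MARKERS):
    """Return (cell_type, n_shared) for the reference set sharing the most genes."""
    cluster_genes = set(cluster_genes)
    scores = {name: len(cluster_genes & genes) for name, genes in known.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up top markers for one cluster, for demonstration only
top_genes = ["VWF", "PECAM1", "EGFL7", "DCN"]
print(best_matching_type(top_genes))
```

A shared count of 0 means no reference set matched, in which case the returned name is arbitrary and should be ignored. Overlap counts are a first pass, not a substitute for inspecting expression directly.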

Troubleshooting

If the analysis runs out of memory, reduce the dataset size:
config.data.max_cells_subset = 10000
config.data.max_genes_subset = 2000

If the analysis is slow:
  • Reduce n_pcs and n_neighbors
  • Disable use_liana if L-R analysis is not needed
  • Use test mode: config.data.test_mode = True

If clustering results look poor:
  • Adjust the resolution parameter
  • Increase n_top_genes for more features
  • Check QC metrics for quality issues
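
Reducing the dataset size can also be automated by retrying with progressively smaller subsets. The sketch below shows the retry pattern with a stand-in function; `run_with_smaller_subsets` and `fake_run` are hypothetical, and in a real script the callable would set `config.data.max_cells_subset` and call `pipeline.run(...)`. Note the assumption that the failure surfaces as a Python `MemoryError`: on many systems the operating system kills the process instead, in which case no retry is possible.

```python
def run_with_smaller_subsets(run, subset_sizes):
    """Call run(size) for each size in order until one succeeds.

    Returns (size, result). Re-raises the last MemoryError if all sizes fail.
    """
    last_error = None
    for size in subset_sizes:
        try:
            return size, run(size)
        except MemoryError as err:
            print(f"max_cells_subset={size} ran out of memory; trying a smaller value")
            last_error = err
    if last_error is None:
        raise ValueError("subset_sizes must not be empty")
    raise last_error


# Stand-in for a function that would configure and run the pipeline;
# here it simply simulates a failure above 20,000 cells
def fake_run(size):
    if size > 20000:
        raise MemoryError
    return f"ok with {size} cells"


print(run_with_smaller_subsets(fake_run, [50000, 30000, 10000]))
```

Listing the sizes from largest to smallest means the first success uses as many cells as memory allows.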

Next Steps

  • Visualization Guide: Advanced plotting options
  • CLI Usage: Command-line interface
  • API Reference: Complete API docs
