Comprehensive HeartMAP Analysis
The ComprehensivePipeline combines all HeartMAP analysis components into a single integrated workflow: basic analysis, communication analysis, and multi-chamber analysis.
Overview
This pipeline performs:
- Complete data preprocessing and QC
- Cell type annotation and clustering
- Cell-cell communication analysis
- Multi-chamber pattern analysis
- Comprehensive visualization dashboard
- Automated report generation
import scanpy as sc
# Load your data
adata = sc.read_h5ad('data/raw/heart_data.h5ad')
# or
adata = sc.read_10x_h5('data/raw/filtered_feature_bc_matrix.h5')
print(f"Loaded: {adata.n_obs:,} cells × {adata.n_vars:,} genes")
from heartmap import Config
# Load default config
config = Config.default()
# Customize for your system
config.data.min_genes = 200
config.data.min_cells = 3
config.data.max_cells_subset = 50000 # Adjust for your RAM
config.data.n_top_genes = 2000
config.analysis.resolution = 0.5
config.analysis.use_liana = True # Enable L-R database
# Set paths
config.update_paths('./heartmap_analysis')
config.create_directories()
print("Configuration ready!")
from heartmap.pipelines import ComprehensivePipeline
print("=== Running Comprehensive HeartMAP Pipeline ===")
# Initialize pipeline
pipeline = ComprehensivePipeline(config)
# Run complete analysis
results = pipeline.run(
    data_path='data/raw/heart_data.h5ad',
    output_dir='results/comprehensive'
)
print("\nComprehensive HeartMAP pipeline completed!")
import pandas as pd
# Extract components
adata = results['adata']
analysis_results = results['results']
# Overview
print(f"\nProcessed: {adata.n_obs:,} cells × {adata.n_vars:,} genes")
# Cell type annotation
if 'annotation' in analysis_results:
    n_clusters = len(adata.obs['leiden'].unique())
    print(f"Identified {n_clusters} cell type clusters")
    cluster_counts = adata.obs['leiden'].value_counts()
    print("\nTop 5 cell types:")
    for cluster, count in cluster_counts.head(5).items():
        pct = 100 * count / adata.n_obs
        print(f" Cluster {cluster}: {count:,} cells ({pct:.1f}%)")
# Communication analysis
if 'communication' in analysis_results:
    comm_res = analysis_results['communication']
    if 'hub_scores' in comm_res:
        hub_scores = comm_res['hub_scores']
        print(f"\nHub scores computed for {len(hub_scores)} cells")
# Multi-chamber analysis
if 'chamber' in adata.obs.columns:
    print("\nChamber distribution:")
    for chamber, count in adata.obs['chamber'].value_counts().items():
        pct = 100 * count / adata.n_obs
        print(f" {chamber}: {count:,} cells ({pct:.1f}%)")
from pathlib import Path
import matplotlib.pyplot as plt
from matplotlib.image import imread
# Load dashboard
dashboard_path = Path('results/comprehensive/figures/comprehensive_dashboard.png')
if dashboard_path.exists():
    img = imread(dashboard_path)
    plt.figure(figsize=(20, 16))
    plt.imshow(img)
    plt.axis('off')
    plt.title('HeartMAP Comprehensive Dashboard', fontsize=20)
    plt.tight_layout()
    plt.show()
else:
    print("Dashboard will be generated after analysis completes")
# Print the generated report, if present
report_path = Path('results/comprehensive/analysis_report.md')
if report_path.exists():
    with open(report_path, 'r') as f:
        print(f.read())
import scanpy as sc
# Load complete processed data
adata_complete = sc.read_h5ad('results/comprehensive/heartmap_complete.h5ad')
print("Available metadata:")
print(adata_complete.obs.columns.tolist())
print("\nAvailable embeddings:")
print(list(adata_complete.obsm.keys()))
print("\nAvailable analyses:")
print(list(adata_complete.uns.keys()))
import scanpy as sc
import matplotlib.pyplot as plt
# Differential expression between chambers
if 'chamber' in adata.obs.columns:
    sc.tl.rank_genes_groups(
        adata,
        groupby='chamber',
        method='wilcoxon',
        key_added='chamber_de'
    )
    # Plot top DE genes
    sc.pl.rank_genes_groups_heatmap(
        adata,
        key='chamber_de',
        n_genes=10,
        groupby='chamber',
        show=False
    )
    plt.savefig('results/comprehensive/chamber_de_heatmap.png',
                dpi=300, bbox_inches='tight')
    plt.close()
# Find cell type markers
sc.tl.rank_genes_groups(
    adata,
    groupby='leiden',
    method='wilcoxon',
    key_added='cluster_markers'
)
# Export the top 50 markers per cluster
for cluster in adata.obs['leiden'].unique():
    markers = sc.get.rank_genes_groups_df(
        adata,
        group=cluster,
        key='cluster_markers'
    ).head(50)
    markers.to_csv(
        f'results/comprehensive/markers_cluster_{cluster}.csv',
        index=False
    )
print("Downstream analysis complete!")
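Once the per-cluster marker files exist, it can be handy to gather the top genes of every cluster into one dictionary for a quick side-by-side look. The following is a minimal sketch using only the standard library. The helper name `collect_top_markers` is hypothetical (not part of HeartMAP), and it assumes the CSV files follow the `markers_cluster_<label>.csv` naming used above and contain the `names` column that `sc.get.rank_genes_groups_df` writes.

```python
import csv
from pathlib import Path


def collect_top_markers(results_dir, n_top=5):
    """Return {cluster_label: [gene, ...]} read from markers_cluster_<label>.csv files."""
    prefix = "markers_cluster_"
    top_markers = {}
    for csv_path in sorted(Path(results_dir).glob(f"{prefix}*.csv")):
        # The cluster label is whatever follows the prefix in the file name
        cluster = csv_path.stem[len(prefix):]
        with open(csv_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        # Rows are assumed to keep the ranked order in which they were exported
        top_markers[cluster] = [row["names"] for row in rows[:n_top]]
    return top_markers
```

Calling `collect_top_markers('results/comprehensive')` after the export loop would return one short gene list per cluster, which is easier to scan than opening each file.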
Complete Working Example
Output Structure
Performance Optimization
For 8GB RAM Systems
For 16GB RAM Systems
For 32GB+ RAM Systems
Configuration Options
All configuration parameters from individual pipelines apply:
- min_genes: Minimum genes per cell for QC
- max_cells_subset: Maximum cells to process (memory optimization)
- resolution: Clustering resolution
- use_liana: Enable LIANA ligand-receptor database
- save_intermediate: Save intermediate results
Best Practices
Data Preparation
- Start with raw count data (not normalized)
- Ensure proper chamber annotations if available
- Remove low-quality cells beforehand if needed
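A quick way to check the first point is to test whether the matrix values look like counts. The sketch below is a heuristic, not a HeartMAP feature: the function name `looks_like_raw_counts` is hypothetical, and it simply checks that the inspected values are finite, non-negative whole numbers. Log-normalized or scaled data will typically fail this test.

```python
import math


def looks_like_raw_counts(values, max_checked=100_000):
    """Heuristic: raw counts are finite, non-negative whole numbers.

    Only the first `max_checked` values are inspected to keep the check cheap.
    An empty input returns True because nothing contradicts the assumption.
    """
    for i, value in enumerate(values):
        if i >= max_checked:
            break
        if not math.isfinite(value) or value < 0 or value != int(value):
            return False
    return True
```

Assuming `adata.X` is a SciPy sparse matrix, passing its stored values (`adata.X.data`) should be enough; for a dense array, `adata.X.ravel()` would play the same role.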
Resource Management
- Monitor memory usage during analysis
- Use max_cells_subset to prevent OOM errors
- Enable save_intermediate for long analyses
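For the memory-monitoring point, one lightweight option is Python's built-in `tracemalloc` module. The sketch below wraps it in a context manager; the name `track_peak_memory` is hypothetical and not part of HeartMAP. Note that `tracemalloc` only sees allocations routed through Python's allocators, so memory used by native libraries may be undercounted and the figure is best read as a lower bound.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_peak_memory(label):
    """Record and print the peak traced allocation inside the block, in MiB."""
    stats = {}
    tracemalloc.start()
    try:
        yield stats
    finally:
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats["peak_mib"] = peak_bytes / (1024 ** 2)
        print(f"{label}: peak {stats['peak_mib']:.1f} MiB")


# Toy workload standing in for a call such as pipeline.run(...)
with track_peak_memory("toy workload") as stats:
    buffer = [0] * 1_000_000
```

Wrapping the real pipeline call in the same `with` block would give a rough per-run peak that can guide the choice of max_cells_subset.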
Result Interpretation
- Validate clusters with known markers
- Check QC metrics for data quality issues
- Compare chamber patterns with literature
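Validating clusters against known markers can be as simple as counting shared genes. The sketch below is one possible approach, not HeartMAP's own annotation logic: `best_matching_cell_type` and `KNOWN_MARKERS` are hypothetical names, and the small reference panel is illustrative only and should be replaced with markers from your own literature review.

```python
def best_matching_cell_type(cluster_markers, known_markers):
    """Return (cell_type, n_shared) for the largest marker overlap, or (None, 0)."""
    cluster_set = {gene.upper() for gene in cluster_markers}
    best_type, best_shared = None, 0
    for cell_type, genes in known_markers.items():
        shared = len(cluster_set & {gene.upper() for gene in genes})
        if shared > best_shared:
            best_type, best_shared = cell_type, shared
    return best_type, best_shared


# Illustrative reference panel (assumption): a few commonly used cardiac markers
KNOWN_MARKERS = {
    "cardiomyocyte": {"TNNT2", "MYH7", "ACTN2"},
    "fibroblast": {"DCN", "COL1A1", "PDGFRA"},
    "endothelial": {"PECAM1", "VWF", "CDH5"},
}
```

A cluster whose top markers share no genes with any panel entry returns `(None, 0)`, which is a useful flag for clusters that need manual inspection.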
Troubleshooting
Memory errors during analysis
Reduce dataset size:
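Lowering `config.data.max_cells_subset` is the setting this guide already exposes for that purpose. If you prefer to subsample before handing data to the pipeline, the sketch below picks a reproducible random subset of row indices. The helper `choose_subset_indices` is hypothetical and uses only the standard library; how the pipeline itself selects its subset is not documented here.

```python
import random


def choose_subset_indices(n_cells, max_cells, seed=0):
    """Return sorted row indices for a reproducible random subset of at most max_cells."""
    if max_cells <= 0:
        raise ValueError("max_cells must be positive")
    if n_cells <= max_cells:
        # Nothing to drop: keep every cell in its original order
        return list(range(n_cells))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), max_cells))
```

With an AnnData object, the result could be applied as `adata = adata[choose_subset_indices(adata.n_obs, 20000)].copy()`, where 20000 is only an illustrative cap.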
Analysis takes too long
- Reduce n_pcs and n_neighbors
- Disable use_liana if L-R analysis is not needed
- Use test mode: config.data.test_mode = True
Poor clustering results
- Adjust the resolution parameter
- Increase n_top_genes for more features
- Check QC metrics for quality issues
Next Steps
- Visualization Guide: Advanced plotting options
- CLI Usage: Command-line interface
- API Reference: Complete API docs