The AnalysisConfig class defines configuration parameters for dimensionality reduction, clustering, and downstream analysis in HeartMAP.
Class Definition
Constructor
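A minimal sketch of how such a constructor could look, assuming a dataclass-style config with the seven fields documented under Configuration Fields. The class name `AnalysisConfigSketch` is a hypothetical stand-in, and every default value is an assumption picked from the ranges under Best Practices, not HeartMAP's actual default.

```python
from dataclasses import dataclass


@dataclass
class AnalysisConfigSketch:
    """Hypothetical stand-in for AnalysisConfig, for illustration only.

    Field names follow the documentation; every default below is an assumed
    value, not HeartMAP's real default.
    """

    n_components_pca: int = 50
    n_neighbors: int = 15
    n_pcs: int = 40
    resolution: float = 0.5
    n_marker_genes: int = 25
    use_leiden: bool = True
    use_liana: bool = True

    def __post_init__(self) -> None:
        # Documented constraint: n_pcs should not exceed n_components_pca.
        if self.n_pcs > self.n_components_pca:
            raise ValueError(
                f"n_pcs ({self.n_pcs}) must be <= "
                f"n_components_pca ({self.n_components_pca})"
            )
        # Assumed sanity checks; the real class may or may not enforce these.
        if self.n_neighbors < 1:
            raise ValueError("n_neighbors must be at least 1")
        if self.resolution <= 0:
            raise ValueError("resolution must be positive")


# Override only the fields that differ from the (assumed) defaults.
config = AnalysisConfigSketch(n_pcs=30, resolution=0.8)
print(config)
```

Validating in `__post_init__` means an inconsistent pair such as `n_components_pca=20, n_pcs=30` fails at construction time rather than midway through an analysis run.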
Configuration Fields
n_components_pca
Number of principal components to compute during PCA. More components capture more variance but increase computation time. Used as input for neighborhood graph construction.
n_neighbors
Number of neighbors to use when constructing the k-nearest neighbor graph. This parameter affects the connectivity of the neighborhood graph and influences clustering results. Lower values emphasize local structure; higher values emphasize global structure.
n_pcs
Number of principal components to use for computing the neighborhood graph and UMAP. Should be less than or equal to n_components_pca. Using fewer PCs can reduce noise.
resolution
Resolution parameter for Leiden/Louvain clustering. Controls the coarseness of the clustering. Higher values produce more clusters (finer granularity); lower values produce fewer clusters (coarser granularity). Typical range: 0.1 to 2.0.
n_marker_genes
Number of marker genes to identify per cluster. These genes are most differentially expressed in each cluster and are useful for cell type annotation.
use_leiden
Whether to use the Leiden clustering algorithm. If true, uses the Leiden algorithm (recommended). If false, falls back to the Louvain algorithm. Leiden generally produces better-connected, higher-quality clusters than Louvain.
use_liana
Whether to perform cell-cell communication analysis using LIANA (LIgand-receptor ANalysis frAmework). If true, runs LIANA to identify ligand-receptor interactions between cell types.
Usage Examples
Default Configuration
Custom Configuration
Fine-Grained Clustering
Coarse Clustering
Fast Analysis (Skip Communication)
High-Dimensional Analysis
Using with Main Config
Loading from YAML
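As a rough sketch of the validation step that could follow YAML parsing: the helper below takes an already-parsed mapping (for example the result of `yaml.safe_load`) and returns keyword arguments restricted to the documented field names. The section name `analysis`, the helper names, and the idea of passing the result as `AnalysisConfig(**kwargs)` are assumptions, not HeartMAP's confirmed loading API.

```python
from typing import Any, Dict, Mapping

# Field names and types as listed under "Configuration Fields".
ANALYSIS_FIELDS: Dict[str, type] = {
    "n_components_pca": int,
    "n_neighbors": int,
    "n_pcs": int,
    "resolution": float,
    "n_marker_genes": int,
    "use_leiden": bool,
    "use_liana": bool,
}


def _matches(value: Any, expected: type) -> bool:
    """Type check that keeps booleans and numbers apart."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    if expected is float:
        return isinstance(value, (int, float))  # YAML "1" parses as an int
    return isinstance(value, expected)


def analysis_kwargs(
    parsed: Mapping[str, Any], section: str = "analysis"
) -> Dict[str, Any]:
    """Extract and validate the analysis section of a parsed YAML document.

    The default section name "analysis" is an assumption about the file layout.
    """
    raw = parsed.get(section) or {}
    unknown = sorted(set(raw) - set(ANALYSIS_FIELDS))
    if unknown:
        raise KeyError(f"unknown analysis fields: {unknown}")
    for name, value in raw.items():
        expected = ANALYSIS_FIELDS[name]
        if not _matches(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}, got {value!r}")
    return dict(raw)


# Stand-in for the mapping a YAML parser would return for a config file.
parsed_yaml = {"analysis": {"n_neighbors": 20, "resolution": 1, "use_liana": False}}
kwargs = analysis_kwargs(parsed_yaml)
print(kwargs)
```

Rejecting unknown keys catches typos such as `n_neighbours`, which would otherwise be ignored and leave the default in effect.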
Best Practices
Dimensionality Reduction
- n_components_pca: Use 30-100 depending on dataset complexity
  - Small datasets (< 10k cells): 30-50 components
  - Large datasets (> 50k cells): 50-100 components
- n_pcs: Use 70-90% of n_components_pca
  - Set to 30-50 for most datasets
  - Check explained variance ratio to determine optimal number
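The sizing guidelines above can be sketched as two small helpers. Both function names are hypothetical. The value for datasets between 10k and 50k cells and the 90% variance target are assumptions, since the guideline does not specify them. The variance ratios are assumed to come from whichever PCA implementation is in use.

```python
from itertools import accumulate
from typing import Dict, Sequence


def suggest_pca_settings(n_cells: int) -> Dict[str, int]:
    """Pick n_components_pca from dataset size and n_pcs as 80% of it."""
    if n_cells < 10_000:
        n_components = 40  # midpoint of the 30-50 range for small datasets
    elif n_cells > 50_000:
        n_components = 75  # midpoint of the 50-100 range for large datasets
    else:
        n_components = 50  # assumption: the guideline leaves 10k-50k open
    n_pcs = n_components * 4 // 5  # 80%, inside the suggested 70-90% band
    return {"n_components_pca": n_components, "n_pcs": n_pcs}


def n_pcs_for_variance(variance_ratios: Sequence[float], target: float = 0.9) -> int:
    """Return the smallest number of leading PCs whose cumulative
    explained-variance ratio reaches `target`.

    Falls back to all available PCs if the target is never reached.
    """
    for k, cumulative in enumerate(accumulate(variance_ratios), start=1):
        if cumulative >= target:
            return k
    return len(variance_ratios)


print(suggest_pca_settings(8_000))
print(suggest_pca_settings(120_000))
print(n_pcs_for_variance([0.4, 0.3, 0.15, 0.1, 0.05], target=0.8))
```

The size-based suggestion is only a starting point; the explained-variance check is the data-driven way to confirm or adjust n_pcs.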
Neighborhood Graph
- n_neighbors: Typical range 10-30
  - 10-15: Emphasizes local structure, more clusters
  - 20-30: Emphasizes global structure, fewer clusters
  - Larger values for datasets > 50k cells
Clustering Resolution
- resolution: Adjust based on expected cell type diversity
  - 0.2-0.4: Major cell types (e.g., immune, epithelial, stromal)
  - 0.5-0.8: Cell subtypes (e.g., T cell subtypes, macrophage states)
  - 0.9-2.0: Fine-grained states (e.g., activation states, cell cycle)
  - Start with 0.5 and adjust based on biological knowledge
  - Use multiple resolutions to explore hierarchical structure
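One way to explore several resolutions is to generate one set of constructor arguments per value and run the clustering step once for each. The sketch below only builds the argument dictionaries. The helper name, the tier labels, and the representative value chosen for each tier are assumptions, as is the premise that AnalysisConfig accepts keyword arguments named after its fields.

```python
from typing import Any, Dict, Iterable, List

# One representative value per tier from the guideline above (assumed picks).
RESOLUTION_TIERS: Dict[str, float] = {
    "major_cell_types": 0.3,
    "cell_subtypes": 0.5,
    "fine_grained_states": 1.2,
}


def resolution_sweep(
    base: Dict[str, Any], resolutions: Iterable[float]
) -> List[Dict[str, Any]]:
    """Return one copy of `base` per resolution, leaving `base` untouched."""
    return [{**base, "resolution": r} for r in resolutions]


# Settings shared by every run in the sweep (illustrative values).
base_kwargs = {"n_neighbors": 15, "n_pcs": 40, "use_leiden": True}

sweep = resolution_sweep(base_kwargs, RESOLUTION_TIERS.values())
for kwargs in sweep:
    print(kwargs)
```

Keeping everything except resolution fixed makes the resulting clusterings directly comparable, so clusters that split as resolution increases can be read as a hierarchy.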
Marker Genes
- n_marker_genes: 20-50 is typical
  - 20-25: Quick overview of cluster identity
  - 50-100: Detailed characterization for annotation
Algorithm Selection
- use_leiden: Prefer true (Leiden guarantees well-connected clusters, which Louvain does not)
- use_liana: Set to false if:
  - Only interested in cell type identification
  - Limited computational resources
  - Dataset has < 5 cell types (communication less interesting)
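The three conditions for disabling LIANA can be written as a small decision helper. The function and parameter names are hypothetical; only the conditions themselves come from the list above.

```python
def should_run_liana(
    n_cell_types: int,
    annotation_only: bool = False,
    limited_compute: bool = False,
) -> bool:
    """Return the suggested value for use_liana."""
    if annotation_only:
        # Only cell type identification is needed.
        return False
    if limited_compute:
        # Communication analysis adds runtime and memory cost.
        return False
    if n_cell_types < 5:
        # Few cell types leave few sender/receiver pairs to compare.
        return False
    return True


use_liana = should_run_liana(n_cell_types=8)
print(use_liana)
```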
Common Configurations
Quick Exploratory Analysis
Comprehensive Analysis
Large Dataset Analysis
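The three scenarios named under Common Configurations could be captured as presets along the following lines. All values are illustrative picks from the Best Practices ranges rather than HeartMAP's shipped settings. The preset names and the `get_preset` helper are hypothetical.

```python
from typing import Any, Dict

# Illustrative presets; every value is an assumption drawn from the
# ranges under "Best Practices".
PRESETS: Dict[str, Dict[str, Any]] = {
    "quick_exploratory": {
        "n_components_pca": 30,
        "n_neighbors": 15,
        "n_pcs": 24,
        "resolution": 0.5,
        "n_marker_genes": 20,
        "use_leiden": True,
        "use_liana": False,  # skip communication analysis for speed
    },
    "comprehensive": {
        "n_components_pca": 50,
        "n_neighbors": 15,
        "n_pcs": 40,
        "resolution": 0.8,
        "n_marker_genes": 50,
        "use_leiden": True,
        "use_liana": True,
    },
    "large_dataset": {
        "n_components_pca": 100,
        "n_neighbors": 30,  # larger neighborhoods for > 50k cells
        "n_pcs": 80,
        "resolution": 0.5,
        "n_marker_genes": 25,
        "use_leiden": True,
        "use_liana": False,  # limit compute on large inputs
    },
}


def get_preset(name: str) -> Dict[str, Any]:
    """Return a copy of a preset after checking the documented PC constraint."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    preset = dict(PRESETS[name])
    if preset["n_pcs"] > preset["n_components_pca"]:
        raise ValueError(f"preset {name!r} has n_pcs > n_components_pca")
    return preset


quick = get_preset("quick_exploratory")
print(quick)
```

Returning a copy lets callers adjust a preset, for example raising resolution, without mutating the shared table.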
See Also
- Config - Main configuration class
- DataConfig - Data processing configuration
- ModelConfig - Model configuration