
Overview

The visualization API provides utilities for rendering interactive charts, sample grids, and dataset previews using Plotly and Streamlit components.

Class Distribution Visualization

render_class_distribution_chart(dataset_info)

Render an interactive Plotly bar chart showing sample distribution across classes.
dataset_info (dict, required): Dataset information dictionary containing train_samples and val_samples
from utils.dataset_viz import render_class_distribution_chart
from state.cache import get_dataset_info

dataset_info = get_dataset_info()
if dataset_info:
    render_class_distribution_chart(dataset_info)
Expected dataset_info Structure:
dataset_info = {
    "train_samples": {
        "malware_family_1": 450,
        "malware_family_2": 380,
        "malware_family_3": 290
    },
    "val_samples": {
        "malware_family_1": 90,
        "malware_family_2": 75,
        "malware_family_3": 60
    }
}
Chart Features:
  • Grouped bar chart (Training vs Validation)
  • Custom colors (#98c127 for training, #8fd7d7 for validation)
  • 45-degree rotated x-axis labels
  • Dark theme with transparent background
  • Auto-scales to container width
File reference: app/utils/dataset_viz.py:12
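The chart features above can be written down as a Plotly figure specification. The following is a minimal sketch, not the code in app/utils/dataset_viz.py: `build_distribution_figure` is a hypothetical helper, and sorting the class names alphabetically is an assumption. It builds a plain dictionary, which Plotly's `go.Figure` accepts in place of explicit trace objects, so the sketch runs without third-party imports.

```python
def build_distribution_figure(dataset_info):
    """Build a grouped-bar figure spec from per-class train/val counts."""
    train = dataset_info.get("train_samples", {})
    val = dataset_info.get("val_samples", {})
    # Union of class names, so a class missing from one split still appears.
    classes = sorted(set(train) | set(val))
    return {
        "data": [
            {
                "type": "bar",
                "name": "Training",
                "x": classes,
                "y": [train.get(c, 0) for c in classes],
                "marker": {"color": "#98c127"},
            },
            {
                "type": "bar",
                "name": "Validation",
                "x": classes,
                "y": [val.get(c, 0) for c in classes],
                "marker": {"color": "#8fd7d7"},
            },
        ],
        "layout": {
            "barmode": "group",  # Side-by-side bars per class
            "paper_bgcolor": "rgba(0,0,0,0)",
            "plot_bgcolor": "rgba(0,0,0,0)",
            "xaxis": {"tickangle": -45},
        },
    }


example_info = {
    "train_samples": {"malware_family_1": 450, "malware_family_2": 380},
    "val_samples": {"malware_family_1": 90, "malware_family_2": 75},
}
figure = build_distribution_figure(example_info)
```

A class that appears in only one split is plotted with a count of 0 for the other, which keeps the two bar series aligned.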

Class Summary

render_class_summary(dataset_info)

Display the top 5 and bottom 5 classes by sample count in a two-column layout.
dataset_info (dict, required): Dataset information dictionary containing train_samples
from utils.dataset_viz import render_class_summary

render_class_summary(dataset_info)
Output Layout:
┌─────────────────────┬─────────────────────┐
│  Most Common        │  Least Common       │
├─────────────────────┼─────────────────────┤
│  class_a: 1,245     │  class_x: 45        │
│  class_b: 982       │  class_y: 67        │
│  class_c: 834       │  class_z: 89        │
│  class_d: 756       │  class_w: 102       │
│  class_e: 698       │  class_v: 134       │
└─────────────────────┴─────────────────────┘
File reference: app/utils/dataset_viz.py:55
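The ranking behind the two columns can be sketched as a single sort over the training counts. `summarize_classes` is a hypothetical name used for illustration, and listing the least common classes smallest-first is an assumption based on the layout shown above.

```python
def summarize_classes(train_samples, n=5):
    """Return (most_common, least_common) lists of (class_name, count)."""
    # Sort once by count, largest first; ties keep their original order.
    ranked = sorted(train_samples.items(), key=lambda item: item[1], reverse=True)
    most_common = ranked[:n]
    # Reverse the tail so the smallest class is listed first.
    least_common = ranked[-n:][::-1]
    return most_common, least_common


counts = {"class_a": 1245, "class_b": 982, "class_x": 45, "class_y": 67}
top, bottom = summarize_classes(counts, n=2)

# Format a row with a thousands separator, as in the layout above.
label = f"{top[0][0]}: {top[0][1]:,}"
```

When a dataset has fewer than 2 × n classes, the two lists overlap, so some classes appear in both columns.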

Sample Grid

render_sample_grid(dataset_info, selected_class)

Display a grid of sample images with their dimensions.
dataset_info (dict, required): Dataset information dictionary containing sample_paths
selected_class (str, required): Class name to display samples from, or “All” for mixed samples
from utils.dataset_viz import render_sample_grid
import streamlit as st

selected_class = st.selectbox(
    "Select Class",
    options=["All"] + dataset_info["classes"]
)

render_sample_grid(dataset_info, selected_class)
Expected dataset_info Structure:
dataset_info = {
    "sample_paths": {
        "malware_family_1": [
            Path("/path/to/sample1.png"),
            Path("/path/to/sample2.png"),
            # ... up to 10 samples per class
        ],
        "malware_family_2": [...]
    },
    "classes": ["malware_family_1", "malware_family_2"]
}
Grid Features:
  • 5-column layout
  • Up to 10 samples displayed
  • Image dimensions shown below each sample
  • Random sampling when “All” is selected (2 per class, max 10 total)
  • Error handling for missing/corrupt images
File reference: app/utils/dataset_viz.py:73
Example Output:
┌─────┬─────┬─────┬─────┬─────┐
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
├─────┼─────┼─────┼─────┼─────┤
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
└─────┴─────┴─────┴─────┴─────┘
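The selection rules listed under Grid Features can be sketched with the standard library. Both helpers are hypothetical, and the way the 10-sample cap is applied (simple truncation after sampling 2 per class) is an assumption, since the source states only the limits.

```python
import random


def pick_samples(sample_paths, selected_class, per_class=2, limit=10, seed=None):
    """Choose which sample paths to show for one class, or for "All"."""
    rng = random.Random(seed)
    if selected_class == "All":
        picked = []
        for paths in sample_paths.values():
            # Take at most `per_class` random samples from each class.
            picked.extend(rng.sample(paths, min(per_class, len(paths))))
        return picked[:limit]
    return list(sample_paths.get(selected_class, []))[:limit]


def chunk_rows(items, columns=5):
    """Split a flat list into rows of `columns` items for a grid layout."""
    return [items[i:i + columns] for i in range(0, len(items), columns)]


paths = {f"family_{i}": [f"f{i}_s{j}.png" for j in range(4)] for i in range(7)}
mixed = pick_samples(paths, "All", seed=0)
rows = chunk_rows(mixed)
```

Each row could then be rendered into `st.columns(5)`. Note that plain truncation favors the classes that come first in the dictionary; shuffling `picked` before slicing would spread the cap more evenly.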

Data Split Visualization

render_split_pie_chart(train_final, val_final, test_final)

Render a donut chart showing the distribution of data splits.
train_final (int, required): Number of training samples
val_final (int, required): Number of validation samples
test_final (int, required): Number of test samples
from utils.dataset_viz import render_split_pie_chart

train_count = 7000
val_count = 1500
test_count = 1500

render_split_pie_chart(train_count, val_count, test_count)
Chart Features:
  • Donut chart with 30% hole
  • Custom colors:
    • Training: #98c127 (green)
    • Validation: #8fd7d7 (cyan)
    • Test: #ffb255 (orange)
  • Compact layout (300px height)
  • Dark theme with transparent background
  • Percentage labels automatically calculated
File reference: app/utils/dataset_viz.py:104
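Plotly computes the percentage labels itself, but the arithmetic is simply each count divided by the total. A small sketch, using a hypothetical `split_percentages` helper, makes the numbers for the example above explicit:

```python
def split_percentages(train_final, val_final, test_final):
    """Return each split's share of the total as a percentage."""
    counts = {"Training": train_final, "Validation": val_final, "Test": test_final}
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero when no samples have been assigned yet.
        return {name: 0.0 for name in counts}
    return {name: 100 * count / total for name, count in counts.items()}


shares = split_percentages(7000, 1500, 1500)
# shares == {"Training": 70.0, "Validation": 15.0, "Test": 15.0}
```

Guarding the zero-total case is useful when the chart may be requested before any split has been configured.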

Preprocessing Preview

render_preprocessing_preview(sample_path, target_size, color_mode)

Show before/after comparison of image preprocessing.
sample_path (Path | str, required): Path to sample image file
target_size (str, required): Target size in format “WIDTHxHEIGHT” (e.g., “224x224”)
color_mode (str, required): Color mode: “RGB” or “Grayscale”
from utils.dataset_viz import render_preprocessing_preview
from pathlib import Path

sample_path = Path("/path/to/sample.png")

render_preprocessing_preview(
    sample_path=sample_path,
    target_size="224x224",
    color_mode="RGB"
)
Visual Layout:
┌─────────────────────┬─────────────────────┐
│  Original Image     │  After Processing   │
├─────────────────────┼─────────────────────┤
│  [Original Image]   │  [Processed Image]  │
│  Size: 512x512      │  Size: 224x224      │
│                     │  Mode: RGB          │
└─────────────────────┴─────────────────────┘
Preprocessing Operations:
  1. Resize to target dimensions using LANCZOS resampling
  2. Convert to grayscale if color_mode is “Grayscale”
  3. Display size and mode information
File reference: app/utils/dataset_viz.py:126
Example with Grayscale:
render_preprocessing_preview(
    sample_path="dataset/class_a/sample_001.png",
    target_size="128x128",
    color_mode="Grayscale"
)
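Because target_size and color_mode are passed as strings, they have to be translated before any resizing can happen. The sketch below shows one way to do that; `parse_target_size` and `pil_mode` are hypothetical helpers, and the validation rules are assumptions rather than documented behavior.

```python
def parse_target_size(target_size):
    """Parse a "WIDTHxHEIGHT" string into a (width, height) tuple of ints."""
    width_text, sep, height_text = target_size.lower().partition("x")
    if not sep:
        raise ValueError(f"Expected WIDTHxHEIGHT, got {target_size!r}")
    # int() raises ValueError itself if either side is not a number.
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"Dimensions must be positive, got {target_size!r}")
    return width, height


def pil_mode(color_mode):
    """Map the documented color_mode values to Pillow mode strings."""
    modes = {"RGB": "RGB", "Grayscale": "L"}
    if color_mode not in modes:
        raise ValueError(f"Unsupported color_mode: {color_mode!r}")
    return modes[color_mode]


size = parse_target_size("224x224")
mode = pil_mode("Grayscale")
```

With Pillow, the results would feed a call such as `image.resize(size, Image.Resampling.LANCZOS).convert(mode)`, which matches the operation order listed above.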

Plotly Configuration

All visualization functions use consistent Plotly theming:
# Common layout settings
layout = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # Transparent background
    "plot_bgcolor": "rgba(0,0,0,0)",   # Transparent plot area
    "font": {"color": "#fafafa"},       # Light text color
    "height": 500,                      # Chart height (the split donut chart uses 300)
    "xaxis": {"tickangle": -45}         # Rotated x-axis labels
}
Color Palette:
  • Training data: #98c127 (Soft green)
  • Validation data: #8fd7d7 (Soft cyan)
  • Test data: #ffb255 (Soft orange)
  • Accent colors: #f45f74 (Soft pink), #bdd373 (Light green)
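One way to keep the theming consistent while still allowing per-chart differences (for example, the 300px donut chart) is to copy the common layout and apply overrides on top. This is a sketch under that assumption; `COMMON_LAYOUT` and `themed_layout` are illustrative names, not identifiers from the module.

```python
import copy

COMMON_LAYOUT = {
    "paper_bgcolor": "rgba(0,0,0,0)",
    "plot_bgcolor": "rgba(0,0,0,0)",
    "font": {"color": "#fafafa"},
    "height": 500,
    "xaxis": {"tickangle": -45},
}


def themed_layout(**overrides):
    """Return a copy of the common layout with per-chart overrides applied."""
    layout = copy.deepcopy(COMMON_LAYOUT)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(layout.get(key), dict):
            # Merge nested settings (e.g. font) instead of replacing them.
            layout[key].update(value)
        else:
            layout[key] = value
    return layout


donut_layout = themed_layout(height=300, font={"size": 12})
```

The deep copy keeps the shared defaults untouched, so one chart's overrides cannot leak into another.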

Image Processing

The visualization utilities use PIL (Pillow) for image processing:
from PIL import Image

image = Image.open("/path/to/sample.png")

# Resize with high-quality resampling (LANCZOS is used for all resize operations)
resized = image.resize((224, 224), Image.Resampling.LANCZOS)

# Color mode conversion
grayscale = image.convert("L")  # Convert to grayscale
rgb = image.convert("RGB")  # Convert to RGB

Error Handling

All visualization functions include error handling:
try:
    img = Image.open(img_path)
    st.image(img, width="stretch")
    st.caption(f"{img.size[0]}x{img.size[1]}")
except Exception as exception:
    st.error(f"Error: {img_path.name}. {exception}")
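The same per-image pattern can be separated from the UI so that one bad file does not prevent the remaining images from loading. The sketch below is illustrative only: `load_each` is a hypothetical helper, and `fake_loader` stands in for `Image.open` so the example runs without Pillow.

```python
def load_each(paths, loader):
    """Apply `loader` to every path, collecting failures instead of raising."""
    loaded, errors = [], []
    for path in paths:
        try:
            loaded.append((path, loader(path)))
        except Exception as exc:  # Broad on purpose, mirroring the pattern above.
            errors.append((path, str(exc)))
    return loaded, errors


def fake_loader(path):
    # Stand-in for Image.open: fails for names containing "corrupt".
    if "corrupt" in path:
        raise OSError("cannot identify image file")
    return f"<image {path}>"


ok, failed = load_each(["a.png", "corrupt.png", "b.png"], fake_loader)
```

The `failed` list can then be reported with `st.error`, one message per file, after the successfully loaded images have been displayed.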

Best Practices

Performance Optimization:
# Cache dataset info to avoid repeated scans
from state.cache import get_dataset_info, set_dataset_info
from utils.dataset_viz import render_class_distribution_chart

dataset_info = get_dataset_info()
if not dataset_info:
    # Cache miss: perform the expensive scan once and store the result.
    # scan_dataset and path are not defined in this reference.
    dataset_info = scan_dataset(path)
    set_dataset_info(dataset_info)

# Use the cached data
render_class_distribution_chart(dataset_info)
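The get-or-scan pattern can be reduced to a small, testable function. This sketch does not reflect how state.cache is implemented: the module-level dictionary is a stand-in (a Streamlit app would more likely keep the value in `st.session_state`), and `get_or_scan` and `slow_scan` are hypothetical names.

```python
_cache = {}  # Stand-in store; not the actual state.cache implementation.


def get_or_scan(key, scan):
    """Return the cached value for `key`, calling `scan()` only on a miss."""
    if key not in _cache:
        _cache[key] = scan()
    return _cache[key]


calls = []


def slow_scan():
    calls.append(1)  # Record each real scan so reuse can be observed.
    return {"train_samples": {"class_a": 3}}


first = get_or_scan("dataset_info", slow_scan)
second = get_or_scan("dataset_info", slow_scan)
```

The second call returns the stored object without invoking the scan again, which is the behavior the caching advice relies on.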
Responsive Layouts:
import streamlit as st

# Use columns for side-by-side visualizations
col1, col2 = st.columns(2)

with col1:
    render_class_summary(dataset_info)

with col2:
    render_split_pie_chart(train, val, test)
Conditional Rendering:
if dataset_info and dataset_info.get("train_samples"):
    render_class_distribution_chart(dataset_info)
else:
    st.warning("No dataset information available")

Integration Example

A complete example showing the dataset visualization workflow:
import streamlit as st
from state.cache import get_dataset_info
from utils.dataset_viz import (
    render_class_distribution_chart,
    render_class_summary,
    render_sample_grid,
    render_split_pie_chart,
    render_preprocessing_preview
)

st.header("Dataset Overview")

# Get cached dataset info
dataset_info = get_dataset_info()

if dataset_info:
    # Distribution chart
    st.subheader("Class Distribution")
    render_class_distribution_chart(dataset_info)
    
    # Summary statistics
    col1, col2 = st.columns(2)
    with col1:
        render_class_summary(dataset_info)
    with col2:
        total_train = dataset_info["total_train"]
        total_val = dataset_info["total_val"]
        total_test = dataset_info.get("total_test", 0)
        render_split_pie_chart(total_train, total_val, total_test)
    
    # Sample grid
    st.subheader("Sample Images")
    selected_class = st.selectbox(
        "View samples from:",
        options=["All"] + dataset_info["classes"]
    )
    render_sample_grid(dataset_info, selected_class)
    
    # Preprocessing preview
    st.subheader("Preprocessing Preview")
    sample_paths = dataset_info["sample_paths"]
    if sample_paths:
        first_class = list(sample_paths.keys())[0]
        sample_path = sample_paths[first_class][0]
        render_preprocessing_preview(
            sample_path,
            target_size="224x224",
            color_mode="RGB"
        )
else:
    st.info("Configure your dataset to view visualizations")
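The integration example reads several keys from dataset_info directly, so a missing key would raise a KeyError partway through rendering. A small pre-check can report what is missing up front. The key list below is taken from the examples on this page; `missing_keys` itself is a hypothetical helper.

```python
REQUIRED_KEYS = (
    "train_samples", "val_samples", "sample_paths",
    "classes", "total_train", "total_val",
)


def missing_keys(dataset_info):
    """List the keys read by the integration example that are absent."""
    if not dataset_info:
        return list(REQUIRED_KEYS)
    # Compare against None so that a legitimate count of 0 is not flagged.
    return [key for key in REQUIRED_KEYS if dataset_info.get(key) is None]


partial = {"train_samples": {"a": 1}, "classes": ["a"], "total_val": 0}
gaps = missing_keys(partial)
```

total_test is left out of the required list because the example already reads it with a default of 0.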
