Overview
The visualization API provides utilities for rendering interactive charts, sample grids, and dataset previews using Plotly and Streamlit components.
Class Distribution Visualization
render_class_distribution_chart(dataset_info)
Render an interactive Plotly bar chart showing sample distribution across classes.
dataset_info: Dataset information dictionary containing the train_samples and val_samples keys
from utils.dataset_viz import render_class_distribution_chart
from state.cache import get_dataset_info
dataset_info = get_dataset_info()
if dataset_info:
    render_class_distribution_chart(dataset_info)
Expected dataset_info Structure:
dataset_info = {
    "train_samples": {
        "malware_family_1": 450,
        "malware_family_2": 380,
        "malware_family_3": 290
    },
    "val_samples": {
        "malware_family_1": 90,
        "malware_family_2": 75,
        "malware_family_3": 60
    }
}
Chart Features:
- Grouped bar chart (Training vs Validation)
- Custom colors (#98c127 for training, #8fd7d7 for validation)
- 45-degree rotated x-axis labels
- Dark theme with transparent background
- Auto-scales to container width
File reference: app/utils/dataset_viz.py:12
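To make the chart features above concrete, here is a minimal sketch of how such a grouped bar chart could be described as a Plotly-style figure dictionary. The helper name build_distribution_spec is hypothetical and is not taken from app/utils/dataset_viz.py; only the colors, the grouping, and the label rotation come from this page.

```python
TRAIN_COLOR = "#98c127"  # training color listed on this page
VAL_COLOR = "#8fd7d7"    # validation color listed on this page


def build_distribution_spec(dataset_info):
    """Return a Plotly-style figure dict for a grouped train/val bar chart.

    Hypothetical helper for illustration; not the library's own function.
    """
    train = dataset_info.get("train_samples", {})
    val = dataset_info.get("val_samples", {})
    # Keep the training order, then append classes seen only in validation.
    classes = list(train) + [name for name in val if name not in train]
    return {
        "data": [
            {"type": "bar", "name": "Training", "x": classes,
             "y": [train.get(name, 0) for name in classes],
             "marker": {"color": TRAIN_COLOR}},
            {"type": "bar", "name": "Validation", "x": classes,
             "y": [val.get(name, 0) for name in classes],
             "marker": {"color": VAL_COLOR}},
        ],
        "layout": {
            "barmode": "group",                # side-by-side bars per class
            "xaxis": {"tickangle": -45},       # rotated class labels
            "paper_bgcolor": "rgba(0,0,0,0)",  # transparent background
            "plot_bgcolor": "rgba(0,0,0,0)",   # transparent plot area
            "font": {"color": "#fafafa"},      # light text for dark themes
        },
    }


example_info = {
    "train_samples": {"malware_family_1": 450, "malware_family_2": 380},
    "val_samples": {"malware_family_1": 90, "malware_family_2": 75},
}
spec = build_distribution_spec(example_info)
print([trace["name"] for trace in spec["data"]])
```

A dictionary of this shape can be passed to plotly.graph_objects.Figure(spec) to obtain a figure object for display.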
Class Summary
render_class_summary(dataset_info)
Display top 5 and bottom 5 classes by sample count in a two-column layout.
dataset_info: Dataset information dictionary containing the train_samples key
from utils.dataset_viz import render_class_summary
render_class_summary(dataset_info)
Output Layout:
┌─────────────────────┬─────────────────────┐
│ Most Common │ Least Common │
├─────────────────────┼─────────────────────┤
│ class_a: 1,245 │ class_x: 45 │
│ class_b: 982 │ class_y: 67 │
│ class_c: 834 │ class_z: 89 │
│ class_d: 756 │ class_w: 102 │
│ class_e: 698 │ class_v: 134 │
└─────────────────────┴─────────────────────┘
File reference: app/utils/dataset_viz.py:55
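The ranking behind this summary can be sketched in a few lines. This is an illustrative helper under the assumption that classes are ranked by their train_samples counts and that the least common column lists the smallest class first, as in the layout above; the name top_and_bottom_classes is hypothetical.

```python
def top_and_bottom_classes(train_samples, n=5):
    """Return (most_common, least_common) lists of (class_name, count)."""
    # Largest counts first for the "Most Common" column.
    most_common = sorted(
        train_samples.items(), key=lambda item: item[1], reverse=True
    )[:n]
    # Smallest counts first for the "Least Common" column.
    least_common = sorted(train_samples.items(), key=lambda item: item[1])[:n]
    return most_common, least_common


example_counts = {"class_a": 1245, "class_b": 982, "class_x": 45, "class_y": 67}
most, least = top_and_bottom_classes(example_counts, n=2)
for name, count in most:
    # The thousands separator matches the "1,245" style used in the layout.
    print(f"{name}: {count:,}")
```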
Sample Grid
render_sample_grid(dataset_info, selected_class)
Display a grid of sample images with their dimensions.
dataset_info: Dataset information dictionary containing the sample_paths key
selected_class: Class name to display samples from, or "All" for mixed samples
from utils.dataset_viz import render_sample_grid
import streamlit as st
selected_class = st.selectbox(
    "Select Class",
    options=["All"] + dataset_info["classes"]
)
render_sample_grid(dataset_info, selected_class)
Expected dataset_info Structure:
dataset_info = {
    "sample_paths": {
        "malware_family_1": [
            Path("/path/to/sample1.png"),
            Path("/path/to/sample2.png"),
            # ... up to 10 samples per class
        ],
        "malware_family_2": [...]
    },
    "classes": ["malware_family_1", "malware_family_2"]
}
Grid Features:
- 5-column layout
- Up to 10 samples displayed
- Image dimensions shown below each sample
- Random sampling when “All” is selected (2 per class, max 10 total)
- Error handling for missing/corrupt images
File reference: app/utils/dataset_viz.py:73
Example Output:
┌─────┬─────┬─────┬─────┬─────┐
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
├─────┼─────┼─────┼─────┼─────┤
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
└─────┴─────┴─────┴─────┴─────┘
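The selection rules listed under Grid Features (2 per class and at most 10 in total for "All", up to 10 for a single class, 5 columns) can be sketched as follows. This is an assumption-laden illustration, not the code in app/utils/dataset_viz.py: the function names and the choice to truncate after collecting are hypothetical.

```python
import random


def select_samples(sample_paths, selected_class, per_class=2, max_total=10, rng=None):
    """Pick the paths to show for one class or for the "All" option."""
    rng = rng or random.Random()
    if selected_class != "All":
        # Single class: show at most max_total of the stored samples.
        return list(sample_paths.get(selected_class, []))[:max_total]
    picked = []
    for paths in sample_paths.values():
        # Random draw of up to per_class paths from each class.
        picked.extend(rng.sample(list(paths), min(per_class, len(paths))))
    # Assumed policy: keep the first max_total picks once every class is sampled.
    return picked[:max_total]


def chunk_rows(items, columns=5):
    """Split a flat list into rows of at most `columns` items."""
    return [items[i:i + columns] for i in range(0, len(items), columns)]


example_paths = {
    f"class_{i}": [f"class_{i}/sample_{j}.png" for j in range(4)]
    for i in range(7)
}
chosen = select_samples(example_paths, "All", rng=random.Random(0))
rows = chunk_rows(chosen)
print(len(chosen), [len(row) for row in rows])
```

Each row from chunk_rows could then be rendered into one st.columns(5) call, with one image per column.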
Data Split Visualization
render_split_pie_chart(train_final, val_final, test_final)
Render a donut chart showing the distribution of data splits.
train_final: Number of training samples
val_final: Number of validation samples
test_final: Number of test samples
from utils.dataset_viz import render_split_pie_chart
train_count = 7000
val_count = 1500
test_count = 1500
render_split_pie_chart(train_count, val_count, test_count)
Chart Features:
- Donut chart with 30% hole
- Custom colors:
  - Training: #98c127 (green)
  - Validation: #8fd7d7 (cyan)
  - Test: #ffb255 (orange)
- Compact layout (300px height)
- Dark theme with transparent background
- Percentage labels automatically calculated
File reference: app/utils/dataset_viz.py:104
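As an illustration of the features above, the following sketch builds a Plotly-style donut specification and computes the percentages that the chart would label. The helper names are hypothetical; the hole size, colors, and height are the values listed on this page.

```python
def split_percentages(train, val, test):
    """Return each split's share of the total, in percent."""
    total = train + val + test
    if total == 0:
        # Avoid dividing by zero when no samples are available.
        return {"Training": 0.0, "Validation": 0.0, "Test": 0.0}
    return {
        "Training": 100 * train / total,
        "Validation": 100 * val / total,
        "Test": 100 * test / total,
    }


def build_split_spec(train, val, test):
    """Return a Plotly-style figure dict for the split donut chart."""
    return {
        "data": [{
            "type": "pie",
            "labels": ["Training", "Validation", "Test"],
            "values": [train, val, test],
            "hole": 0.3,                      # 30% hole makes it a donut
            "marker": {"colors": ["#98c127", "#8fd7d7", "#ffb255"]},
            "textinfo": "percent",            # Plotly computes the labels
        }],
        "layout": {
            "height": 300,                     # compact layout
            "paper_bgcolor": "rgba(0,0,0,0)",  # transparent background
            "font": {"color": "#fafafa"},
        },
    }


shares = split_percentages(7000, 1500, 1500)
donut = build_split_spec(7000, 1500, 1500)
print(shares)
```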
Preprocessing Preview
render_preprocessing_preview(sample_path, target_size, color_mode)
Show before/after comparison of image preprocessing.
sample_path: Path to the sample image file
target_size: Target size in the format "WIDTHxHEIGHT" (e.g., "224x224")
color_mode: Color mode, either "RGB" or "Grayscale"
from utils.dataset_viz import render_preprocessing_preview
from pathlib import Path
sample_path = Path("/path/to/sample.png")
render_preprocessing_preview(
    sample_path=sample_path,
    target_size="224x224",
    color_mode="RGB"
)
Visual Layout:
┌─────────────────────┬─────────────────────┐
│ Original Image │ After Processing │
├─────────────────────┼─────────────────────┤
│ [Original Image] │ [Processed Image] │
│ Size: 512x512 │ Size: 224x224 │
│ │ Mode: RGB │
└─────────────────────┴─────────────────────┘
Preprocessing Operations:
- Resize to target dimensions using LANCZOS resampling
- Convert to grayscale if color_mode is “Grayscale”
- Display size and mode information
File reference: app/utils/dataset_viz.py:126
Example with Grayscale:
render_preprocessing_preview(
    sample_path="dataset/class_a/sample_001.png",
    target_size="128x128",
    color_mode="Grayscale"
)
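The target_size string and color_mode label have to be translated into values Pillow understands. A minimal sketch of that translation is shown below; parse_target_size, COLOR_MODES, and preprocess are hypothetical names, and the real function may validate its input differently.

```python
COLOR_MODES = {"RGB": "RGB", "Grayscale": "L"}  # UI label -> Pillow mode


def parse_target_size(target_size):
    """Turn "WIDTHxHEIGHT" into a (width, height) tuple of ints."""
    width_text, separator, height_text = target_size.lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {target_size!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {target_size!r}")
    return width, height


def preprocess(image, target_size, color_mode):
    """Resize an already opened Pillow image, then convert its mode."""
    # Imported lazily so the parsing helper works without Pillow installed.
    from PIL import Image

    resized = image.resize(parse_target_size(target_size), Image.Resampling.LANCZOS)
    return resized.convert(COLOR_MODES[color_mode])


parsed = parse_target_size("224x224")
print(parsed)
```

Parsing up front means a malformed size such as "224" fails with a clear ValueError before any image is opened.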
Plotly Configuration
The chart functions share the following Plotly theming; individual charts override some values (for example, the split pie chart uses a 300px height):
# Common layout settings
layout = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # Transparent background
    "plot_bgcolor": "rgba(0,0,0,0)",   # Transparent plot area
    "font": {"color": "#fafafa"},      # Light text color
    "height": 500,                     # Chart height
    "xaxis": {"tickangle": -45}        # Rotated x-axis labels
}
Color Palette:
- Training data: #98c127 (Soft green)
- Validation data: #8fd7d7 (Soft cyan)
- Test data: #ffb255 (Soft orange)
- Accent colors: #f45f74 (Soft pink), #bdd373 (Light green)
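One way to keep theming consistent across charts is to derive every layout from a single base dictionary and apply per-chart overrides on a copy. The sketch below assumes that approach; BASE_LAYOUT and themed_layout are hypothetical names, and the source may structure this differently.

```python
import copy

BASE_LAYOUT = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # transparent background
    "plot_bgcolor": "rgba(0,0,0,0)",   # transparent plot area
    "font": {"color": "#fafafa"},      # light text color
    "height": 500,                     # default chart height
    "xaxis": {"tickangle": -45},       # rotated x-axis labels
}


def themed_layout(**overrides):
    """Return a fresh copy of the base layout with top-level overrides applied."""
    # Deep copy so nested dicts such as "font" are never shared between charts.
    layout = copy.deepcopy(BASE_LAYOUT)
    layout.update(overrides)
    return layout


pie_layout = themed_layout(height=300)
print(pie_layout["height"], BASE_LAYOUT["height"])
```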
Image Processing
The visualization utilities use PIL (Pillow) for image processing:
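from PIL import Image
image = Image.open("/path/to/sample.png")  # Placeholder path
# Resize with high-quality resampling (LANCZOS is used for all resize operations)
resized = image.resize((224, 224), Image.Resampling.LANCZOS)
# Color mode conversion
grayscale = resized.convert("L")  # Convert to grayscale
rgb = resized.convert("RGB")      # Convert to RGB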
Error Handling
All visualization functions include error handling. For example, image loading is wrapped so that a missing or corrupt file is reported with st.error:
try:
    img = Image.open(img_path)
    st.image(img, width="stretch")
    st.caption(f"{img.size[0]}x{img.size[1]}")
except Exception as exception:
    st.error(f"Error: {img_path.name}. {exception}")
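The same pattern can be separated from the UI so that one bad file does not prevent the remaining samples from loading. The following is a sketch under that assumption; load_all is a hypothetical helper, and the loader argument stands in for a call such as Image.open.

```python
def load_all(paths, loader):
    """Apply loader to each path; return (loaded, errors) without raising."""
    loaded, errors = [], []
    for path in paths:
        try:
            loaded.append((path, loader(path)))
        except Exception as exc:  # broad on purpose, mirroring the snippet above
            errors.append(f"Error: {path}. {exc}")
    return loaded, errors


def fake_loader(path):
    """Stand-in loader that fails for one file to simulate a corrupt image."""
    if "bad" in path:
        raise OSError("cannot identify image file")
    return (224, 224)


loaded, errors = load_all(["a.png", "bad.png", "b.png"], fake_loader)
print(len(loaded), errors)
```

The collected error strings can then be shown with st.error, and the loaded items rendered as usual.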
Best Practices
Performance Optimization:
# Cache dataset info to avoid repeated scans
from state.cache import get_dataset_info, set_dataset_info
if not get_dataset_info():
    # Perform expensive scan
    dataset_info = scan_dataset(path)
    set_dataset_info(dataset_info)  # Cache result
# Use cached data
dataset_info = get_dataset_info()
render_class_distribution_chart(dataset_info)
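The scan-once behavior can be illustrated without the application's state module. The sketch below uses a plain dictionary as a stand-in for the per-session store; cached_dataset_info and the "dataset_info" key are assumptions for illustration, and the real state.cache implementation is not shown on this page.

```python
def cached_dataset_info(scan, state):
    """Return the cached dataset info, running `scan` only on the first call."""
    info = state.get("dataset_info")
    if info is None:
        # Cache miss: run the expensive scan once and remember the result.
        info = scan()
        state["dataset_info"] = info
    return info


scan_calls = []


def fake_scan():
    """Stand-in for an expensive dataset scan; records each invocation."""
    scan_calls.append(1)
    return {"classes": ["malware_family_1"]}


session = {}  # stand-in for a store such as st.session_state
first = cached_dataset_info(fake_scan, session)
second = cached_dataset_info(fake_scan, session)
print(len(scan_calls))
```

Checking for None rather than truthiness means an empty result is still treated as cached, which avoids rescanning an empty dataset on every rerun.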
Responsive Layouts:
import streamlit as st
# Use columns for side-by-side visualizations
col1, col2 = st.columns(2)
with col1:
    render_class_summary(dataset_info)
with col2:
    render_split_pie_chart(train, val, test)
Conditional Rendering:
if dataset_info and dataset_info.get("train_samples"):
    render_class_distribution_chart(dataset_info)
else:
    st.warning("No dataset information available")
Integration Example
Complete example showing dataset visualization workflow:
import streamlit as st
from state.cache import get_dataset_info
from utils.dataset_viz import (
    render_class_distribution_chart,
    render_class_summary,
    render_sample_grid,
    render_split_pie_chart,
    render_preprocessing_preview
)
st.header("Dataset Overview")
# Get cached dataset info
dataset_info = get_dataset_info()
if dataset_info:
    # Distribution chart
    st.subheader("Class Distribution")
    render_class_distribution_chart(dataset_info)
    # Summary statistics
    col1, col2 = st.columns(2)
    with col1:
        render_class_summary(dataset_info)
    with col2:
        total_train = dataset_info["total_train"]
        total_val = dataset_info["total_val"]
        total_test = dataset_info.get("total_test", 0)
        render_split_pie_chart(total_train, total_val, total_test)
    # Sample grid
    st.subheader("Sample Images")
    selected_class = st.selectbox(
        "View samples from:",
        options=["All"] + dataset_info["classes"]
    )
    render_sample_grid(dataset_info, selected_class)
    # Preprocessing preview
    st.subheader("Preprocessing Preview")
    sample_paths = dataset_info["sample_paths"]
    if sample_paths:
        first_class = list(sample_paths.keys())[0]
        sample_path = sample_paths[first_class][0]
        render_preprocessing_preview(
            sample_path,
            target_size="224x224",
            color_mode="RGB"
        )
else:
    st.info("Configure your dataset to view visualizations")
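The example reads total_train, total_val, and total_test from dataset_info, which are not part of the structures documented earlier. If those keys are absent, the totals could plausibly be derived from the per-class counts, as in this sketch. It assumes the totals equal the sums of the per-class counts; the test_samples key and the split_totals name are hypothetical.

```python
def split_totals(dataset_info):
    """Sum per-class counts into (train, val, test) totals."""
    total_train = sum(dataset_info.get("train_samples", {}).values())
    total_val = sum(dataset_info.get("val_samples", {}).values())
    # "test_samples" is an assumed key; a missing key counts as zero samples.
    total_test = sum(dataset_info.get("test_samples", {}).values())
    return total_train, total_val, total_test


example_dataset = {
    "train_samples": {
        "malware_family_1": 450,
        "malware_family_2": 380,
        "malware_family_3": 290,
    },
    "val_samples": {
        "malware_family_1": 90,
        "malware_family_2": 75,
        "malware_family_3": 60,
    },
}
totals = split_totals(example_dataset)
print(totals)
```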