Overview
Chronos-DFIR is engineered to handle massive forensic datasets (6GB+ files, 500K+ events) with a responsive UI and minimal memory footprint. The platform achieves this through:
- Streaming I/O for large file ingestion
- Polars vectorized processing (zero pandas dependency)
- Virtual DOM rendering with Tabulator.js
- CSS GPU acceleration hints
- Backend aggregation to minimize frontend computation
Target hardware: Apple Silicon M4 with ARM NEON SIMD and unified memory architecture. All optimizations leverage these capabilities.
Backend Performance
Streaming I/O for Large Files
All file operations for datasets exceeding 50MB use lazy loading with Polars scan_* methods. Files are never loaded entirely into memory; rows are materialized only when aggregated results are collected.
Streaming Upload (6GB+ Files)
Problem: A traditional await file.read() loads the entire file into RAM before processing.
Solution: Chunk-based streaming with SHA-256 hash computation.
Benefits:
- Zero extra I/O: Hash computed during upload, not as a separate read
- Constant memory: 1MB buffer regardless of file size
- Chain of custody: Forensic-grade hash available immediately
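The chunked upload-and-hash pattern above can be sketched as follows. This is a minimal illustration, not the actual handler: the real backend reads from a FastAPI UploadFile, and the function and constant names here are illustrative.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1MB buffer: memory stays constant regardless of file size


def stream_to_disk(src, dest_path: str) -> str:
    """Copy a file-like object to disk in chunks, hashing as we go.

    The SHA-256 digest is computed during the single upload pass, so no
    second read of the (possibly 6GB+) file is needed for chain of custody.
    """
    sha256 = hashlib.sha256()
    with open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            sha256.update(chunk)  # hash inline with the copy
            dest.write(chunk)
    return sha256.hexdigest()
```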
Reference: CLAUDE.md:19-20 (Streaming I/O rule)
Lazy File Parsing with Polars
Problem: Loading a 2GB CSV with pd.read_csv() or pl.read_csv() materializes the entire dataset.
Solution: Use scan_csv() to return a LazyFrame.
Key Optimization: Queries are not executed until .collect() is called. Filters and aggregations are pushed down to the scan layer.
Reference: engine/ingestor.py:74-76 (Parquet/CSV lazy scan)
Streaming Aggregation
Use Case: Generate histogram buckets from 500K events without loading all rows.
Benefits:
- .collect(streaming=True) processes data in batches
- Leverages Polars’ query optimizer for pushdown filters
Reference: engine/analyzer.py:91-95 (Streaming stats collection)

Polars Vectorized Processing
Chronos-DFIR has zero pandas dependency in the core engine. All data transformations use Polars vectorized expressions.
No Pandas Policy
Why No Pandas?
- Pandas's eager, single-threaded execution → slow for large datasets
- Polars is columnar (Apache Arrow) → 5-100x faster for DFIR operations
- Pandas offers no lazy execution → Polars supports lazy, optimized query plans
- Before v180: 9 pandas imports in app.py (SQLite, Plist, whitespace CSV)
- After v180: Zero pandas imports
Reference: CLAUDE.md:217-220 (Pandas elimination v180)
Vectorized Expressions (No Python Loops)
Anti-Pattern: Iterating over DataFrame rows in Python.
Complex Example: The top/rare process analysis in engine/forensic.py.
Performance: Polars processes 500K rows in ~50ms vs pandas' ~2-3 seconds.
Reference: CLAUDE.md:145-146 (v178 process filtering fix)
Async Threading for CPU-Bound Work
Problem: Polars operations are CPU-bound → they block FastAPI's event loop.
Solution: Wrap them in asyncio.to_thread().
Benefits:
- FastAPI can handle other requests while Polars crunches data
- Total analysis time: ~800ms for 38K events (vs 3-5s blocking)
Reference: CLAUDE.md:21 (Async-first rule)
Best Practice: Use .lazy() for all DataFrame operations, chain filters/aggregations, then call .collect(streaming=True) once at the end.

Frontend Performance
Virtual DOM with Tabulator.js
Tabulator.js renders only visible rows (typically 50-100 at a time) even when the dataset has 500K+ events.
Remote Pagination Strategy
Configuration: The grid uses remote pagination, so the backend serves one page at a time.
Backend Response: Each response carries only the requested page of rows plus the total page count.
Performance: Only 1000 rows transferred per page → ~50KB JSON vs 50MB for the full dataset.
Reference: CLAUDE.md:89 (v177 remote sort fix)
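The backend side of remote pagination can be sketched as below. Tabulator's remote pagination mode expects a response containing a "data" list and a "last_page" count; the list slicing here is a simplification (the real backend slices a Polars frame):

```python
import math


def paginate(rows: list, page: int, size: int) -> dict:
    """Shape one page of results for Tabulator's remote pagination.

    Only `size` rows cross the wire per request, regardless of how
    many events the full dataset holds.
    """
    start = (page - 1) * size
    return {
        "data": rows[start : start + size],
        "last_page": max(1, math.ceil(len(rows) / size)),
    }
```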
Persistent Row Selection
Challenge: Checkbox selections must survive AJAX pagination.
Solution: Export a _persistentSelectedIds Set in grid.js.
Integration: When exporting, the backend receives a selected_ids array and filters accordingly.
Reference: CLAUDE.md:286 (v180.7 persistent selection)
Batch Redraw for Column Operations
Problem: Hiding 50 empty columns with col.hide() triggers 50 full re-renders.
Solution: Wrap the loop in blockRedraw() / restoreRedraw().
Performance: 50 columns = 1 render instead of 50 (20x faster).
Reference: CLAUDE.md:86 (v177 toggleEmptyColumns fix)

CSS Performance Optimizations
Chronos-DFIR leverages modern CSS features to offload rendering to the GPU compositor.
content-visibility: auto
Purpose: The browser skips rendering offscreen elements until they are scrolled into view.
Benefits:
- Initial page load: 60% faster (skips rendering 20+ offscreen cards)
- Scroll performance: GPU handles visibility toggle
Reference: CLAUDE.md:148 (v178 CSS GPU hints)
will-change: transform
Purpose: Hints to the browser that an element will be animated → promotes it to a GPU layer.
Correction (v179): Initially applied to #chart-wrapper (which has display: none); moved to the actual canvas element.
Caution: Only use on elements that actually animate. Overuse hurts performance.
Reference: CLAUDE.md:179 (v179 will-change correction)
CSS Grid for Dashboard Cards
Before: Flexbox with manual spacing.
After: CSS Grid with gap.
Performance: The browser optimizes grid layout calculations in the compositor thread.
Asset Cache-Busting Mechanism
Chronos-DFIR uses automated cache-busting to prevent stale JavaScript/CSS after deployments.
Auto-Computed Version Hash
Implementation: A version hash is computed from the entry-point asset files at startup.
Template Usage (Jinja2): The hash is appended to asset URLs as a ?v= query parameter.
Route Injection: Routes pass the computed hash into the template context.
Reference: CLAUDE.md:133-143 (Auto-cachebust system)
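The auto-computed version hash can be sketched as below. The doc names the real helper _compute_asset_hash(); this standalone version with illustrative paths shows the idea: any byte change in an entry-point asset yields a new ?v= tag, so browsers refetch after a deploy.

```python
import hashlib
from pathlib import Path


def compute_asset_hash(asset_paths: list) -> str:
    """Hash entry-point assets into a short cache-busting version tag.

    The returned value is appended to asset URLs as ?v=<hash>, so a new
    deployment automatically invalidates cached JS/CSS.
    """
    sha = hashlib.sha256()
    for path in asset_paths:
        sha.update(Path(path).read_bytes())
    return sha.hexdigest()[:8]  # short tag is enough to bust the cache
```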
Module Import Cache-Busting
Limitation: _compute_asset_hash() only hashes entry points (main.js, CSS). ES6 module imports use hardcoded version tags.
Manual Process: Increment ?v=XXX when any module changes.
v181 Lesson: All v180.7 fixes were invisible to users because imports were cached at ?v=180. A server restart plus a manual bump to ?v=181 was required.
Reference: CLAUDE.md:330-332 (Cache bust lesson)
Best Practice: After modifying any JS module, increment the version tag in main.js imports before committing. Add a pre-commit hook to automate this.

Backend Aggregation Strategy
Chronos-DFIR pushes all heavy computation to the backend. The frontend receives pre-aggregated data ready for rendering.
Histogram Pre-Aggregation
Bad Pattern (client-side calculation): The frontend computes histogram buckets from raw rows.
Good Pattern (backend provides stats): The backend returns pre-computed bucket counts.
Frontend (v179 fix): The chart consumes the backend stats directly instead of re-deriving them.
Reference: CLAUDE.md:180 (v179 backend stats preference)
Dashboard Card Aggregation
Endpoint: /api/forensic_report/(unknown)
Backend Computes:
- Top 5 Event IDs (with labels)
- Top 5 Tactics (MITRE ATT&CK)
- Unique IPs, Users, Hosts counts
- Sigma detection counts by severity (Critical, High, Medium, Low)
- Risk score (0-100) with justification
Reference: CLAUDE.md:287 (v180.7 dashboard refresh)

Performance Benchmarks
Real-World Test Case: 38K EVTX Events
Hardware: MacBook Pro M4 (32GB RAM, ARM NEON)

| Operation | Time (v176) | Time (v185) | Improvement |
|---|---|---|---|
| File Upload (4.2GB) | 18s | 12s | 33% faster |
| Initial Parse | 2.8s | 1.9s | 32% faster |
| Sigma Evaluation (86 rules) | 1.2s | 0.8s | 33% faster |
| Forensic Report (9 tasks) | 3.5s | 0.8s | 77% faster |
| Histogram Render | 420ms | 180ms | 57% faster |
| Export CSV (selected 500 rows) | 1.1s | 0.3s | 73% faster |
| PDF Generation | 8.5s | 4.2s | 51% faster |
- v177: Streaming upload, fixed remote sort
- v179: Async threading for forensic tasks
- v180: Pandas elimination, app.py decomposition
- v180.7: Persistent selection, batch redraw
- v185: Chart debounce, CSV hex preservation
Profiling Tools
Chronos-DFIR includes built-in profiling for performance diagnostics.
Backend Profiling
Polars Query Plans: Inspect a LazyFrame's optimized plan with .explain() before collecting, to confirm filters were pushed down.
Async Task Timing: Wrap each forensic task with a timer to surface slow stages.
Frontend Profiling
Chrome DevTools:
- Performance tab → Record timeline during file load
- Look for “Long Tasks” (> 50ms main thread blocks)
- Check “Compositor” for GPU layer promotions
Related Documentation
System Architecture
Understand the data flow from ingestion to visualization
Multi-Agent Workflow
Learn about the 3-agent development protocol