Overview
Chronos-DFIR is built as a forensic timeline explorer designed for digital forensics and incident response (DFIR) analysts. The platform ingests multi-format evidence (EVTX, CSV, MFT, Plist, XLSX), applies Sigma/YARA detection rules, and renders interactive timelines with risk-scored intelligence. The entire backend is async-first with streaming I/O to handle datasets exceeding 6GB without blocking the event loop.
Tech Stack
The application is built on a modern, performance-optimized stack targeting Apple Silicon M4 hardware with ARM NEON and unified-memory optimizations.
Backend Stack
| Component | Technology | Constraint |
|---|---|---|
| Runtime | Python 3.12+ | Async-first architecture |
| Web Framework | FastAPI + uvicorn | Async endpoints, streaming responses |
| Data Engine | Polars (vectorized) + PyArrow | NEVER use Pandas. All transforms must be vectorized Polars expressions |
| Detection | Sigma YAML + YARA rules | Standard format only, loaded from rules/sigma/ and rules/yara/ |
| Exports | WeasyPrint, Playwright, xhtml2pdf | Multi-format: PDF, HTML standalone, CSV, XLSX, JSON |
Frontend Stack
| Component | Technology | Purpose |
|---|---|---|
| Grid Rendering | Tabulator.js (virtual DOM) | Handles 500K+ rows with pagination and column virtualization |
| Charts | Chart.js | Interactive histograms, distribution analysis |
| State Management | Custom event-driven (ChronosState) | Filters, selections, time ranges |
| Performance | CSS GPU hints (will-change, content-visibility) | Minimize main-thread computation |
System Architecture Diagram
Core Engine Modules
The backend follows a modular engine architecture with clear separation of concerns. As of v180, app.py was decomposed from 2,160 lines to 1,528 lines by extracting parsing and analysis logic.
Module Breakdown
engine/forensic.py (~1,426 lines)
Purpose: Core forensic analysis engine with sub-analyzers for timeline, context, hunting, identity, and process analysis.

Key Functions:
- get_primary_time_column() — Standardized time column detection using TIME_HIERARCHY
- parse_time_boundary() — Robust parsing of start/end times from the frontend
- sanitize_context_data() — Forensic integrity checks (EventID validation, no fabricated timestamps)
- sub_analyze_timeline() — Top events, top tactics, time range extraction
- sub_analyze_context() — IPs, users, hosts, paths, violations
- sub_analyze_hunting() — Suspicious patterns, network anomalies, logon analysis
- sub_analyze_identity_and_procs() — Top/rare processes, rare execution paths
- calculate_smart_risk_m4() — Multi-factor risk scoring (Sigma hits, IOCs, rare behaviors)
- Never mutate original timestamps, hex values, SIDs, or hashes
- Never fabricate timestamps (emit null if no real FILETIME data exists)
- Column No. is cosmetic — renumbered on display, never used as a foreign key
Reference: engine/forensic.py:1-1426

engine/sigma_engine.py (~500 lines)
Purpose: Dynamic YAML-to-Polars rule evaluator. Translates Sigma detection rules into Polars LazyFrame expressions at runtime.

Capabilities (v1.2):
- Field modifiers: contains, endswith, startswith, re (regex), all, any
- Boolean conditions: and, or, not between detection blocks
- EventID list matching (is_in)
- Metadata extraction: title, level, tags, MITRE ATT&CK techniques
- Temporal correlation: timeframe, event_count, group_by, gte
- Custom aggregation blocks with time windows and thresholds

Limitations:
- near queries, base64offset, and cidr modifiers not yet supported
- Temporal conditions (timeframe, count) partially implemented
Reference: engine/sigma_engine.py:1-500

engine/ingestor.py (~370 lines)
Purpose: Multi-format file parser. Zero pandas dependency. Extracts data from 10+ formats.

Supported Formats:
- Forensic artifacts: EVTX (Windows Event Logs), MFT (Master File Table), Plist (macOS)
- Generic reports: CSV, TSV, Excel (.xlsx), JSON/JSONL/NDJSON
- Databases: SQLite (.db, .sqlite3)
- Big data: Parquet (columnar format)
- Text logs: TXT, LOG (unified logs, whitespace-delimited)
- Archives: ZIP (Plist bundles only)
Key Functions:
- ingest_file() — Main entry point, returns (LazyFrame, DataFrame, file_category)
- _read_whitespace_csv() — Handles pslist and ls-triage output without pandas
- _sanitize_plist_val() — Converts plist bytes/datetime/nested structures to Polars-safe types
Performance: uses scan_parquet() and scan_csv() for lazy loading; only materializes with .collect() after aggregation.

Reference: engine/ingestor.py:1-370

engine/analyzer.py (~251 lines)
Purpose: Histogram and time-series analysis. Generates chart data with trend analysis, anomaly detection, and distribution breakdowns.

Key Functions:
- analyze_dataframe() — Main histogram generator
  - Auto-detects time column using TIME_HIERARCHY
  - Parses 10+ datetime formats plus epoch timestamps (seconds/milliseconds/microseconds)
  - Smart bucketing: minutes, hours, days, months, years (based on data span)
  - Computes mean, peak, and trend analysis (alza/baja/estable: rising/falling/stable)
- build_chronos_timeseries() — Structured chart data with metadata (referenced in timeline_skill.py)
Performance:
- Lazy execution with .lazy().select() to minimize memory
- Streaming aggregation with .collect(streaming=True)
- Vectorized Polars expressions (no Python loops)
Reference: engine/analyzer.py:1-251

engine/skill_router.py (~300 lines)
Purpose: Central registry of all 76 skills with integration status tracking.

Skill Categories:

| Status | Count | Description |
|---|---|---|
| active | 10 | Production code in engine/ or app.py |
| frontend | 5 | Implemented in static/js/ |
| rules | 5 | Implemented via Sigma YAML or YARA |
| wired | 4 | Code exists but not connected to endpoints |
| prompt_only | 52 | System prompts for AI agents (not yet implemented) |

Key Functions:
- get_skill_summary() — Returns categorized skill dictionary
- get_high_priority_prompts() — Identifies top 5 skills for next activation
- print_registry_report() — CLI summary (run python engine/skill_router.py)
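A miniature of the registry pattern behind get_skill_summary(); the skill names and data shape here are invented for illustration, and the real registry carries richer metadata per entry:

```python
# Hypothetical miniature of the registry; the real engine/skill_router.py
# tracks 76 skills with richer metadata per entry.
SKILLS = {
    "timeline_skill": "active",
    "chart_sync": "frontend",
    "lateral_movement_rule": "rules",
    "case_db": "wired",
    "auto_narrative": "prompt_only",
}

def get_skill_summary() -> dict[str, list[str]]:
    """Group skill names by integration status, mirroring the table above."""
    summary: dict[str, list[str]] = {}
    for name, status in SKILLS.items():
        summary.setdefault(status, []).append(name)
    return summary
```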
Reference: engine/skill_router.py:1-300

Tip: Run python engine/skill_router.py from the project root to see the current skill activation status and high-priority candidates.

Data Flow: Ingestion to Visualization
The typical lifecycle of evidence processing follows this sequence:

1. File Upload (Streaming)
Chain of Custody: the SHA256 hash is computed during upload with zero additional disk I/O; hash and file size are returned in the chain_of_custody field.

2. File Parsing (Ingestor)
- File extension → parser routing (.evtx → EVTX engine, .csv → Polars scan_csv)
- Special cases: whitespace-delimited TXT, SQLite table detection, Plist sanitization
3. Forensic Analysis (Parallel Tasks)
The /api/forensic_report endpoint runs 9 parallel async tasks using asyncio.gather().
4. Sigma Rule Evaluation
FORENSIC_CONTEXT_COLUMNS (27 key columns like User, Process, IP, CommandLine) are automatically added to evidence samples if present in the dataset.
Reference: engine/sigma_engine.py:200-350
5. Frontend Rendering
Tabulator Grid (Virtual DOM):
- Remote pagination: loads 1,000 rows per page via AJAX (/api/data/(unknown))
- Persistent row selection: _persistentSelectedIds Set survives pagination
- Column filters: headerFilterChanged event emits FILTERS_CHANGED for chart sync
Chart.js Histogram:
- Receives pre-aggregated data from the backend (/api/histogram/(unknown))
- Auto-scales to linear or log10 based on peak/mean ratio
- Syncs with filters: global search, time range, column filters, row selection
Performance Optimizations
Chronos-DFIR is designed to handle massive datasets (500K+ events, 6GB+ files) with a responsive UI.

Backend Performance
- Streaming I/O: scan_csv(), scan_parquet() → lazy loading, no memory spike
- Vectorized Polars: zero Python loops over dataframes; all operations use .filter(), .group_by(), .agg()
- Lazy Execution: only materialize with .collect() after filtering and aggregation
- Async Threading: CPU-bound Polars work wrapped in asyncio.to_thread() to avoid blocking
- Cache-busting: auto-computed MD5 hash of JS/CSS assets prevents stale cache
Reference: CLAUDE.md:19-22 (Hard Rules)

Frontend Performance
- Virtual DOM: Tabulator.js renders only visible rows (50-100 at a time)
- CSS GPU Acceleration:
  - content-visibility: auto on .tabulator (lazy render of offscreen rows)
  - will-change: transform on #chart-wrapper canvas (GPU compositing)
- Debouncing: 1200ms debounce on filter changes to prevent request floods
- Batch Redraw: table.blockRedraw() / table.restoreRedraw() for column operations
- Backend Aggregation: chart peak/mean calculated server-side (not in JS)
Reference: CLAUDE.md:148-149, CLAUDE.md:179-180

Evidence Integrity Guarantees
Chronos-DFIR follows Zimmerman Logic for forensic artifact handling.

Export Format Rules:
- CSV/XLSX: flat tabular (one row per event); hex values preserved with BOM UTF-8 and xlsxwriter text formatting
xlsxwritertext formatting - JSON: Nested structure compatible with SOAR ingestion (Splunk SOAR, Cortex XSOAR)
- Context Export: uses generate_export_payloads() for AI-optimized summaries
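The CSV rule can be sketched with the standard library alone; the file name and columns are illustrative, and the real exporter additionally applies xlsxwriter text formatting for XLSX output:

```python
import csv
import os
import tempfile

# Illustrative path and columns; the real exporter covers many more fields.
path = os.path.join(tempfile.mkdtemp(), "export.csv")
rows = [
    {"Offset": "0x00001A4", "SID": "S-1-5-21-1004336348-1177238915-682003330-512"},
    {"Offset": "0x00002B0", "SID": "S-1-5-18"},
]

# "utf-8-sig" prepends the BOM so Excel opens the file as UTF-8.
with open(path, "w", newline="", encoding="utf-8-sig") as fh:
    writer = csv.DictWriter(fh, fieldnames=["Offset", "SID"])
    writer.writeheader()
    writer.writerows(rows)

# Round trip: hex strings and SIDs come back byte-for-byte identical.
with open(path, newline="", encoding="utf-8-sig") as fh:
    back = list(csv.DictReader(fh))
```

Writing every value as a string is the simplest way to honor the never-mutate rule: no numeric coercion can strip leading zeros or reformat hex.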
Reference: CLAUDE.md:26-30 (Evidence Integrity)
Development Phases Roadmap
| Phase | Status | Description |
|---|---|---|
| Etapa 0 | ✅ COMPLETED | Export/filter stabilization (5 bugs) |
| Etapa 1 | ✅ COMPLETED | TTP context enrichment (Sigma evidence, YARA, correlation) |
| Etapa 1.5 | ✅ COMPLETED | Real-world testing (8 bugs: hex, selection, dashboard) |
| Etapa 2 | 🟡 PENDING | DuckDB case management (engine/case_db.py exists) |
| Etapa 3 | 🟡 PENDING | Sidebar + journal UI |
| Etapa 4 | 🟡 PENDING | Multi-file correlation (cross-file timeline) |
| Etapa 5 | 🟡 PENDING | MCP server + AI chat integration |
| Etapa 6 | 🟡 PENDING | Auto-narrative generation |
Reference: README.md:306-318 (Roadmap)
CI/CD & Quality Gates
Pre-commit Hook (.git/hooks/pre-commit):
CI Workflow (.github/workflows/ci.yml):
- Full test suite (pytest)
- Code constraint validation
- Sigma YAML schema validation (86 rules)
- Skill registry integrity check
Reference: CLAUDE.md:72-75 (CI/CD)
Related Documentation
- Performance Tuning: deep dive into streaming I/O, Polars vectorization, and CSS optimizations
- Multi-Agent Workflow: learn about the 3-agent development protocol and skill registry