Overview
The engine/forensic.py module is the heart of Chronos-DFIR’s forensic analysis capabilities. It provides:
- Timeline Analysis — Adaptive time-bucketing for event clustering
- Context Sanitization — EventID validation and forensic data cleaning
- Hunting & Pattern Detection — TTP-based suspicious activity detection
- MITRE ATT&CK Enrichment — Automatic tactic/technique mapping
- WAF Threat Profiling — Attacker IP behavioral analysis
- Identity & Process Analysis — User, host, and process correlation
All functions use Polars for vectorized operations. Zero pandas dependency.
Configuration
Time Hierarchy
The engine uses a priority-based approach to identify time columns:
EventID Hierarchy
For Windows event log analysis:
Core Functions
get_primary_time_column()
Standardized logic to select the best timestamp column from a dataset.
Signature:
Input: List of column names from the DataFrame
Output: Name of the best time column, or None if no suitable column is found
Selection order:
- Exact case-insensitive match against TIME_HIERARCHY
- Fallback to columns containing keywords: time, timestamp, date, datetime, seen, created
- Last resort: columns with timezone substring
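The first two steps of the selection order above can be sketched in plain Python. This is a minimal illustration rather than the module's implementation: pick_time_column and the contents of EXAMPLE_TIME_HIERARCHY are hypothetical stand-ins (the real TIME_HIERARCHY values are not reproduced here), and the last-resort step is left out because the text gives too little detail to reproduce it faithfully.

```python
from typing import List, Optional

# Hypothetical priority list; the module's real TIME_HIERARCHY is not shown here.
EXAMPLE_TIME_HIERARCHY = ["timestamp", "timecreated", "eventtime"]

# Keywords documented for the fallback step.
TIME_KEYWORDS = ("time", "timestamp", "date", "datetime", "seen", "created")


def pick_time_column(columns: List[str]) -> Optional[str]:
    """Return the best timestamp column name, or None if nothing matches."""
    by_lower = {name.lower(): name for name in columns}

    # Step 1: exact case-insensitive match, in hierarchy priority order.
    for candidate in EXAMPLE_TIME_HIERARCHY:
        if candidate in by_lower:
            return by_lower[candidate]

    # Step 2: first column whose name contains any fallback keyword.
    for name in columns:
        lowered = name.lower()
        if any(keyword in lowered for keyword in TIME_KEYWORDS):
            return name

    return None


print(pick_time_column(["User", "TimeCreated", "first_seen"]))
print(pick_time_column(["User", "first_seen"]))
```

Walking the hierarchy first, instead of the column list, is what makes the result depend on priority rather than on column order in the dataset.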
sanitize_context_data()
Applies forensic-grade sanitization to telemetry data. Ensures EventIDs are valid integers (1-65535) and removes artifacts.
Signature:
Input: forensic telemetry data
Output: LazyFrame with additional Validated_EventID column (Int64)
Steps:
- Locate EventID column using EVENT_ID_HIERARCHY
- Cast to string, strip trailing .0
- Cast to Int64 (non-strict)
- Range validation: 0 < EventID < 65535
- Invalid values set to None
normalize_time_columns_in_df()
Converts all timestamp-like columns to standardized YYYY-MM-DD HH:MM:SS string format.
Signature:
Input: DataFrame with mixed timestamp formats
Output: LazyFrame with all time columns normalized to %Y-%m-%d %H:%M:%S
Supported input formats:
- Native Polars: pl.Datetime, pl.Date
- Epoch seconds/milliseconds: Int64, Float64
- ISO 8601: 2024-01-01T12:00:00.123Z
- Custom formats: YYYY/MM/DD HH:MM:SS, DD/MM/YYYY HH:MM:SS
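To make the target format concrete, here is a per-value sketch in plain Python that accepts the non-native input families listed above. It is not how the module works (which is described as vectorized Polars); normalize_timestamp is a hypothetical helper, and the rule used to tell epoch seconds from milliseconds (magnitude of at least 1e11) is an assumption made for this example.

```python
from datetime import datetime, timezone

TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"
CUSTOM_FORMATS = ("%Y/%m/%d %H:%M:%S", "%d/%m/%Y %H:%M:%S")


def normalize_timestamp(value):
    """Return value as 'YYYY-MM-DD HH:MM:SS', or None if it cannot be parsed."""
    # Epoch numbers: assume milliseconds when the magnitude is very large.
    if isinstance(value, (int, float)):
        seconds = value / 1000 if abs(value) >= 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(TARGET_FORMAT)

    text = str(value).strip()

    # ISO 8601: older Python versions reject a trailing "Z", so rewrite it.
    try:
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
        if parsed.tzinfo is not None:
            parsed = parsed.astimezone(timezone.utc)
        return parsed.strftime(TARGET_FORMAT)
    except ValueError:
        pass

    # Custom formats, tried in order.
    for fmt in CUSTOM_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    return None


print(normalize_timestamp("2024-01-01T12:00:00.123Z"))
print(normalize_timestamp("31/12/2023 23:59:59"))
```

Sub-second precision is dropped by the target format, so events that differ only in milliseconds end up with identical strings.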
Sub-Analysis Functions
sub_analyze_timeline()
Generates timeline statistics with adaptive bucketing for histogram visualization.
Signature:
Input: forensic dataset (collected, not lazy)
Output: dictionary with keys:
- type: "timeline"
- peaks: List of {"hour": str, "count": int} (top 3 time buckets)
- time_range: "YYYY-MM-DD HH:MM:SS to YYYY-MM-DD HH:MM:SS"
| Duration | Bucket Size |
|---|---|
| < 3 hours | 5 minutes |
| 3-6 hours | 15 minutes |
| 6-12 hours | 30 minutes |
| 12-48 hours | 1 hour |
| 2-7 days | 6 hours |
| > 7 days | 1 day |
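The tier table above amounts to a threshold lookup on the total time span. A minimal sketch follows; choose_bucket is a hypothetical name, and the handling of durations that fall exactly on a boundary (they go to the coarser tier here) is an assumption, since the table does not say which side is inclusive.

```python
from datetime import timedelta

# (exclusive upper bound of the duration, bucket size), in ascending order.
BUCKET_TIERS = [
    (timedelta(hours=3), timedelta(minutes=5)),
    (timedelta(hours=6), timedelta(minutes=15)),
    (timedelta(hours=12), timedelta(minutes=30)),
    (timedelta(hours=48), timedelta(hours=1)),
    (timedelta(days=7), timedelta(hours=6)),
]


def choose_bucket(duration: timedelta) -> timedelta:
    """Pick a histogram bucket size for a dataset spanning the given duration."""
    for upper_bound, bucket_size in BUCKET_TIERS:
        if duration < upper_bound:
            return bucket_size
    # Anything at or beyond the last bound uses daily buckets.
    return timedelta(days=1)


print(choose_bucket(timedelta(hours=5)))
print(choose_bucket(timedelta(days=30)))
```

With these tiers, a 3-hour dataset produces at most 12 buckets at the 15-minute size, while a 48-hour dataset at the 1-hour size stays just under 48.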
sub_analyze_context()
Extracts forensic context: top EventIDs, tactics, IPs, users, hosts, processes, and commands.
Signature:
Input: forensic dataset
Output: dictionary with keys:
- type: "context"
- event_ids: Top 10 EventIDs with labels (from SYSMON_EVENT_LABELS)
- tactics: Top 10 forensic categories
- threat_actors: Top 10 attacker IPs (WAF only)
- ips, users, hosts, processes, commands, paths, violations: Top 8 of each
- metadata: System info and action recommendations
sub_analyze_hunting()
Detects suspicious patterns using TTP-based regex rules (LOLBins, persistence, credential access, etc.).
Signature:
Input: forensic dataset
Output: dictionary with keys:
- type: "hunting"
- patterns: Suspicious commands with timestamps, users, and tactic labels
- network: Top network destinations
- logons: Logon event summary
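As a rough illustration of TTP-based regex hunting, the sketch below labels command lines with the first rule that matches. Everything in it is hypothetical: the three rules in EXAMPLE_RULES, the hunt_commands name, and the event field names (timestamp, user, command) are invented for the example and are not the module's rule set or schema.

```python
import re

# Illustrative rules only; the module's actual patterns are not reproduced here.
EXAMPLE_RULES = [
    ("LOLBin download", re.compile(r"certutil(\.exe)?\s+.*-urlcache", re.IGNORECASE)),
    ("Persistence (Run key)", re.compile(r"reg(\.exe)?\s+add\s+.*\\CurrentVersion\\Run", re.IGNORECASE)),
    ("Credential access", re.compile(r"sekurlsa::|lsass", re.IGNORECASE)),
]


def hunt_commands(events):
    """Return one hit per event whose command matches a rule (first match wins)."""
    hits = []
    for event in events:
        command = event.get("command") or ""
        for label, pattern in EXAMPLE_RULES:
            if pattern.search(command):
                hits.append({
                    "timestamp": event.get("timestamp"),
                    "user": event.get("user"),
                    "command": command,
                    "tactic": label,
                })
                break
    return hits


events = [
    {"timestamp": "2024-01-01 12:00:00", "user": "alice",
     "command": "certutil.exe -urlcache -split -f http://example.test/a.exe"},
    {"timestamp": "2024-01-01 12:01:00", "user": "bob",
     "command": r"reg add HKCU\Software\Microsoft\Windows\CurrentVersion\Run /v upd /d C:\tmp\a.exe"},
    {"timestamp": "2024-01-01 12:02:00", "user": "carol", "command": "notepad.exe report.txt"},
]
hits = hunt_commands(events)
print([hit["tactic"] for hit in hits])
```

Keeping the timestamp and user alongside each hit mirrors the shape described for the patterns key.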
sub_analyze_identity_and_procs()
Summarizes identity activity and process execution patterns.
Signature:
Input: forensic dataset
Output: dictionary with keys:
- type: "identity"
- users: Top 5 users by event count
- hosts: Top 5 hosts by event count
- processes: Top 8 most common processes
- rare_processes: Top 5 least common processes (anomaly detection)
- rare_paths: Top 5 least common file paths
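The most-common and least-common rankings above are both frequency counts read from opposite ends. A small plain-Python sketch, with summarize_processes as a hypothetical name and alphabetical tie-breaking as an assumption added to keep the output deterministic:

```python
from collections import Counter


def summarize_processes(names, top_n=8, rare_n=5):
    """Rank process names by frequency, most common and least common."""
    counts = Counter(name for name in names if name)
    # Most common first; ties broken alphabetically.
    common = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Least common first; ties broken alphabetically.
    rare = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return {"processes": common[:top_n], "rare_processes": rare[:rare_n]}


names = ["svchost.exe"] * 5 + ["chrome.exe"] * 3 + ["psexec.exe", None]
summary = summarize_processes(names)
print(summary["rare_processes"][0])
```

A process that appears only once across a large dataset is not necessarily malicious, but it is a cheap starting point for review, which is the intent behind the rare_processes key.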
MITRE ATT&CK Enrichment
enrich_with_mitre_attck()
Adds MITRE ATT&CK tactic/technique columns using fully vectorized Polars expressions.
Signature:
Input: forensic dataset
Data source type: "auto", "waf", or "evtx"
Output: original DataFrame with 3 new columns:
- MITRE_Tactic: e.g., “Initial Access”, “Credential Access”
- MITRE_ID: e.g., “T1190”, “T1110”
- MITRE_Technique: Combined label, e.g., “T1190 — Initial Access”
WAF Threat Profiling
generate_waf_threat_profiles()
Builds behavioral profiles for attacking IPs from WAF logs.
Signature:
Input: WAF log dataset with columns like ClientIP, RequestPath, ViolationCategory
Output: top 10 attacker profiles, each containing:
- ip: Attacker IP address
- total: Total requests
- first_seen: ISO timestamp
- last_seen: ISO timestamp
- dwell: Time span (e.g., “3m 45s”)
- top_uris: Dict of {URI: count}
- top_rules: Dict of {rule_name: count}
- payload_samples: List of decoded attack payloads (up to 3)
- mitre_id: Assigned MITRE technique ID
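A reduced version of such a profile can be built by grouping requests per client IP. The sketch below covers only a subset of the fields (ip, total, first_seen, last_seen, dwell, top_uris) and rests on assumptions: build_profiles and format_dwell are hypothetical names, the timestamp field name is invented, and the dwell format for spans of an hour or more is unknown, so minutes are allowed to exceed 59.

```python
from collections import Counter, defaultdict
from datetime import datetime


def format_dwell(seconds: float) -> str:
    """Format a time span in the 'Xm Ys' style of the documented example."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"


def build_profiles(rows, limit=10):
    """Group WAF rows by ClientIP and summarize the busiest sources."""
    by_ip = defaultdict(list)
    for row in rows:
        by_ip[row["ClientIP"]].append(row)

    profiles = []
    for ip, hits in by_ip.items():
        times = sorted(datetime.fromisoformat(hit["timestamp"]) for hit in hits)
        profiles.append({
            "ip": ip,
            "total": len(hits),
            "first_seen": times[0].isoformat(),
            "last_seen": times[-1].isoformat(),
            "dwell": format_dwell((times[-1] - times[0]).total_seconds()),
            "top_uris": dict(Counter(hit["RequestPath"] for hit in hits).most_common(3)),
        })

    # Busiest sources first, capped at the requested limit.
    profiles.sort(key=lambda profile: profile["total"], reverse=True)
    return profiles[:limit]


rows = [
    {"ClientIP": "10.0.0.5", "RequestPath": "/login", "timestamp": "2024-01-01T12:00:00"},
    {"ClientIP": "10.0.0.5", "RequestPath": "/login", "timestamp": "2024-01-01T12:02:00"},
    {"ClientIP": "10.0.0.5", "RequestPath": "/admin", "timestamp": "2024-01-01T12:03:45"},
    {"ClientIP": "10.0.0.9", "RequestPath": "/", "timestamp": "2024-01-01T12:01:00"},
]
profiles = build_profiles(rows)
print(profiles[0]["ip"], profiles[0]["dwell"])
```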
EventID Labels
Sysmon Events
The engine includes a comprehensive mapping of Sysmon EventIDs to human-readable labels:
Windows Security Events
The full mapping is available in the SYSMON_EVENT_LABELS dict (lines 220-278 in forensic.py).
Best Practices
Forensic Integrity Rules:
- Never mutate original evidence — all transformations are non-destructive
- Use sanitize_context_data() early in analysis pipelines
- Always normalize timestamps with normalize_time_columns_in_df() before correlation
- Prefer LazyFrame for large datasets (streaming evaluation)
Performance Tips
Example Pipeline
API Reference Summary
| Function | Input | Output | Purpose |
|---|---|---|---|
| get_primary_time_column() | List[str] | Optional[str] | Find best timestamp column |
| sanitize_context_data() | pl.LazyFrame | pl.LazyFrame | Validate EventIDs (1-65535) |
| normalize_time_columns_in_df() | pl.LazyFrame | pl.LazyFrame | Standardize timestamps |
| sub_analyze_timeline() | pl.DataFrame | dict | Adaptive time bucketing + peaks |
| sub_analyze_context() | pl.DataFrame | dict | Top N entities (IPs, users, hosts, etc.) |
| sub_analyze_hunting() | pl.DataFrame | dict | TTP-based pattern detection |
| sub_analyze_identity_and_procs() | pl.DataFrame | dict | User/host/process summary + anomalies |
| enrich_with_mitre_attck() | pl.DataFrame | pl.DataFrame | Add MITRE_Tactic, MITRE_ID, MITRE_Technique columns |
| generate_waf_threat_profiles() | pl.DataFrame | list[dict] | WAF attacker behavioral profiles |
Related Documentation
- Sigma Engine — Detection rule evaluation
- Ingestor — Multi-format file parsing
- Architecture — System design overview