
Overview

Chronos-DFIR features a powerful multi-format ingestion engine that accepts forensic artifacts and reports through an intuitive drag-and-drop interface. The system handles files of 6GB and beyond via streaming uploads and maintains chain of custody with SHA256 hashing.

Supported Formats

Forensic Artifacts

Native forensic evidence formats with specialized parsers

Generic Reports

Output from DFIR tools like Plaso, KAPE, and EDR platforms

Forensic Artifacts

Chronos-DFIR provides optimized parsers for native forensic artifacts:

1. Windows Event Logs (EVTX)

Processes Windows Event Logs with automatic extraction of EventID, Level, Provider, Computer, and descriptions. Uses the evtx_engine.py module for binary parsing.

Key features:
  • Direct binary EVTX parsing
  • Automatic metadata extraction
  • Windows-specific normalization
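Before handing a file to a full binary parser, the format can be confirmed cheaply: an EVTX file begins with the 8-byte ASCII signature ElfFile followed by a NUL byte. A minimal detection check (an illustrative sketch, not the actual evtx_engine.py implementation):

```python
def looks_like_evtx(header: bytes) -> bool:
    """Check the 8-byte EVTX file-header signature, b"ElfFile\x00"."""
    return header[:8] == b"ElfFile\x00"

# Usage: read only the first 8 bytes before committing to a full parse
# with open("Security.evtx", "rb") as f:
#     if looks_like_evtx(f.read(8)):
#         parse_evtx(f)  # hypothetical full parser
```

Signature sniffing like this also guards against files whose extension was renamed during collection.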

2. MFT (Master File Table)

Deep analysis of NTFS filesystem metadata with real FILETIME parsing from $STANDARD_INFORMATION attributes.
# From mft_engine.py - Real FILETIME parsing (v179 fix)
import struct

def _read_si_timestamps(mft_record: bytes, offset: int) -> dict:
    """
    Parse $STANDARD_INFORMATION attribute (type 0x10) FILETIME values.
    Returns dict with Created, Modified, MFT_Modified, Accessed timestamps.
    """
    si_data = mft_record[offset:offset+48]
    if len(si_data) < 48:
        return {"Created": None, "Modified": None, 
                "MFT_Modified": None, "Accessed": None}
    
    # Parse FILETIME values using struct.unpack
    created_ft, modified_ft, mft_ft, accessed_ft = struct.unpack(
        "<QQQQ", si_data[0:32]
    )
    
    return {
        "Created": win64_to_datetime(created_ft),
        "Modified": win64_to_datetime(modified_ft),
        "MFT_Modified": win64_to_datetime(mft_ft),
        "Accessed": win64_to_datetime(accessed_ft)
    }
Forensic Integrity: As of v179, MFT timestamps are parsed from real FILETIME structures. Never use fabricated timestamps like datetime.now() in forensic analysis.
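The win64_to_datetime helper is referenced but not shown in the excerpt. A FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC, so a minimal stdlib implementation (an illustrative sketch, not necessarily the mft_engine.py version) is:

```python
from datetime import datetime, timedelta, timezone

_WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def win64_to_datetime(filetime: int):
    """Convert a FILETIME (100-ns ticks since 1601-01-01 UTC) to datetime.

    Returns None for 0, which NTFS uses for a never-set timestamp.
    """
    if filetime == 0:
        return None
    # 10 ticks per microsecond; timedelta handles the epoch offset
    return _WINDOWS_EPOCH + timedelta(microseconds=filetime // 10)
```

Integer division by 10 truncates sub-microsecond precision, which is the best Python's datetime can represent.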

3. macOS Property Lists (Plist)

Detects and parses macOS Plist files including LaunchAgents and LaunchDaemons used in persistence mechanisms.
# From engine/ingestor.py
def _parse_single_plist(file_path: str) -> pl.DataFrame:
    import plistlib
    with open(file_path, 'rb') as fp:
        plist_data = plistlib.load(fp)
    
    # A LaunchAgent/LaunchDaemon plist is usually a single dict;
    # wrap it so the comprehension below always iterates rows
    if isinstance(plist_data, dict):
        plist_data = [plist_data]
    
    # Sanitize values (bytes → hex, nested → str)
    sanitized = [{k: _sanitize_plist_val(v) 
                  for k, v in row.items()} for row in plist_data]
    
    return pl.DataFrame(sanitized, strict=False)
Extracted metadata:
  • Launch path and binaries
  • RunAtLoad / KeepAlive flags
  • Program arguments and signatures
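To illustrate, the persistence-relevant keys can be pulled from a LaunchAgent plist with the standard library alone (a sketch; extract_persistence_flags is a hypothetical helper, not part of engine/ingestor.py):

```python
import plistlib

def extract_persistence_flags(plist_bytes: bytes) -> dict:
    """Pull persistence-relevant keys from a LaunchAgent/LaunchDaemon plist."""
    data = plistlib.loads(plist_bytes)
    args = data.get("ProgramArguments") or []
    return {
        "Label": data.get("Label"),
        # Program may be absent; the binary is then the first argv entry
        "Program": data.get("Program") or (args[0] if args else None),
        "ProgramArguments": args,
        "RunAtLoad": bool(data.get("RunAtLoad", False)),
        "KeepAlive": bool(data.get("KeepAlive", False)),
    }

# Usage with an in-memory sample plist
sample = plistlib.dumps({
    "Label": "com.example.persist",
    "ProgramArguments": ["/usr/local/bin/agent", "--daemon"],
    "RunAtLoad": True,
})
flags = extract_persistence_flags(sample)
```

RunAtLoad set to true is the classic persistence indicator: the binary launches at every login (agents) or boot (daemons).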

Generic Report Formats

  • CSV / TSV - Reports from Plaso, KAPE, AutoMacTC
  • Excel (.xlsx) - Spreadsheet exports from DFIR tools
  • JSON / JSONL / NDJSON - Modern structured event logs
  • Parquet - Columnar format for massive Big Data hunting datasets
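JSONL/NDJSON, for example, stores one JSON object per line. A minimal stdlib reader (illustrative only; the actual engine uses Polars) looks like:

```python
import io
import json
from typing import IO

def read_ndjson(stream: IO[str]) -> list[dict]:
    """Parse newline-delimited JSON (one object per line), skipping blanks."""
    return [json.loads(line) for line in stream if line.strip()]

# Usage with an in-memory stream of two events
rows = read_ndjson(io.StringIO('{"EventID": 4624}\n\n{"EventID": 4625}\n'))
```

Because each line is independent, NDJSON can be streamed without loading the whole file, which is why it suits large EDR exports.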

Drag & Drop Interface

The ingestion interface provides a seamless upload experience:
// Frontend automatically detects file types
const validExtensions = [
  '.csv', '.xlsx', '.tsv', '.parquet',
  '.json', '.jsonl', '.ndjson',
  '.evtx', '.mft', '.plist',
  '.db', '.sqlite', '.sqlite3',
  '.pslist', '.txt', '.log', '.zip'
];
ZIP Support: ZIP files are treated as bundles of macOS Plist files (LaunchAgents/LaunchDaemons). Each Plist is extracted and parsed individually.
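On the backend, a corresponding dispatch might map extensions to parser families. The mapping below is an illustrative sketch, not the exact routing in engine/ingestor.py:

```python
from pathlib import Path

# Hypothetical extension → parser-family table mirroring the frontend list
PARSER_FAMILIES = {
    ".evtx": "evtx_engine",
    ".mft": "mft_engine",
    ".plist": "plist", ".zip": "plist_bundle",
    ".csv": "tabular", ".tsv": "tabular",
    ".xlsx": "tabular", ".parquet": "tabular",
    ".json": "json", ".jsonl": "json", ".ndjson": "json",
    ".db": "sqlite", ".sqlite": "sqlite", ".sqlite3": "sqlite",
    ".pslist": "text", ".txt": "text", ".log": "text",
}

def route(filename: str) -> str:
    """Pick a parser family by extension, case-insensitively."""
    return PARSER_FAMILIES.get(Path(filename).suffix.lower(), "unsupported")
```

Lower-casing the suffix matters in practice: collected evidence frequently arrives with upper-case extensions like Security.EVTX.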

Streaming Upload for Large Files

To handle evidence files of 6GB and beyond without saturating memory, Chronos streams uploads directly to disk:
# From app.py:166-186
@app.post("/upload")
async def process_file(file: UploadFile = File(...)):
    file_path = os.path.join(UPLOAD_DIR, file.filename)
    
    # STREAMING UPLOAD: Stream directly to disk (6GB+ files)
    # Chain of Custody: compute SHA256 hash during upload (zero extra I/O)
    sha256 = hashlib.sha256()
    with open(file_path, "wb") as buffer:
        while chunk := file.file.read(8192):  # 8KB chunks
            sha256.update(chunk)
            buffer.write(chunk)
    
    file_hash = sha256.hexdigest()
    file_size = os.path.getsize(file_path)
    
    logger.info(f"Chain of Custody — {file.filename}: "
                f"SHA256={file_hash}, Size={file_size}")
Zero Overhead: SHA256 hash is computed during the streaming upload with no additional file read operations, maintaining performance even for multi-gigabyte evidence files.
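The stored hash can later be re-verified with the same chunked pattern. This is a sketch of a verification step, not code from app.py:

```python
import hashlib

def verify_sha256(path: str, expected: str, chunk_size: int = 8192) -> bool:
    """Re-hash an evidence file in chunks and compare to the recorded digest."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            sha256.update(chunk)
    return sha256.hexdigest() == expected.lower()
```

Running this before and after analysis demonstrates the evidence file was not altered while in your custody.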

Chain of Custody

Every file upload includes cryptographic verification:
{
  "filename": "Security.evtx",
  "parsed_file": "import_Security_1710234567.csv",
  "row_count": 38464,
  "chain_of_custody": {
    "sha256": "a3f5d8c9e2b1f4a7c6d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0",
    "size_bytes": 125829120,
    "upload_timestamp": "2026-03-09T14:22:47.123456Z"
  },
  "artifact_type": "EVTX",
  "file_category": "Windows/EventLog"
}
Forensic Standard: The SHA256 hash and file size are computed at ingestion time and should be recorded in your investigation documentation to maintain evidence integrity.

Automatic Normalization

All ingested files undergo normalization to create a unified timeline:

1. Column Detection

Automatic standardization of Time, EventID, Level, IP addresses, and user fields regardless of source format.
# From engine/ingestor.py:146-180
rename_mapping = {}
for col in cols:
    col_lower = col.strip().lower()
    if col_lower == '_time':
        final_col = 'Time'
    elif col_lower == '_id':
        final_col = 'Original_Id'
    elif col.isdigit():
        final_col = f'Field_{col}'
    else:
        # Capitalize first letter (col[:1] avoids IndexError on empty names)
        final_col = col[:1].upper() + col[1:]
    rename_mapping[col] = final_col

2. Row Indexing

Stable _id column added to every row for tracking selections and filtering.
lf = lf.with_row_index(name="_id", offset=1)

3. Streaming Sink

For large files, normalized data is written using Polars’ streaming sink to avoid memory overflow:
if lf is not None:  # LazyFrame (large file)
    lf.sink_csv(dest_path)  # Streaming write
else:  # DataFrame (small file)
    df_eager.write_csv(dest_path)

Performance Characteristics

File Size   | Processing Time | Memory Usage | Method
------------|-----------------|--------------|-------------------
< 100MB     | < 5 seconds     | ~50MB        | Eager DataFrame
100MB - 1GB | 10-30 seconds   | ~100MB       | LazyFrame scan
1GB - 6GB   | 1-3 minutes     | ~200MB       | Streaming sink
6GB+        | 3-10 minutes    | ~200MB       | Chunked streaming
Hardware Optimized: Chronos is optimized for Apple Silicon M4 with ARM NEON vectorization and unified memory architecture. All Polars operations leverage SIMD acceleration.

Technical Stack

  • Backend: Python 3.12+, FastAPI, uvicorn (async-first)
  • Data Engine: Polars (vectorized, zero-pandas)
  • File I/O: Streaming with scan_* / sink_* for datasets > 50MB
  • Parsers: evtx_engine.py, mft_engine.py, engine/ingestor.py

Next Steps

Timeline Analysis

Analyze ingested data with interactive timelines

Threat Detection

Apply Sigma and YARA rules for detection
