
POST /upload

Upload and process forensic artifacts using streaming I/O, which supports files of 6 GB and larger. The endpoint routes each file to a specialized parser based on its file extension and artifact type.

Request Format

Content-Type: multipart/form-data
file
file
required
Forensic artifact file. Supports: EVTX, MFT, CSV, XLSX, JSON, Parquet, SQLite, Plist, ZIP, PSList, TXT, LOG
artifact_type
string
required
Type of forensic artifact. Used for specialized parsing. Supported values:
  • evtx: Windows Event Logs
  • mft: NTFS Master File Table
  • csv: Generic CSV/TSV
  • xlsx: Excel spreadsheet
  • json: JSON/JSONL/NDJSON
  • sqlite: SQLite database
  • plist: macOS Property List
  • generic: Auto-detect format
case_id
string
Case identifier for multi-file investigations. Registers file in case database.
phase_id
string
Investigation phase ID (e.g., “initial_triage”, “deep_dive”). Groups files by analysis stage.
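
As a rough illustration of the request fields above, the following sketch assembles the non-file form fields on the client side and rejects an artifact_type that is not in the documented list. The helper name build_form_fields is hypothetical, and checking the value before sending is an assumption about good client practice, not a description of what the server does.

```python
from typing import Optional

# Values copied from the artifact_type list above.
SUPPORTED_ARTIFACT_TYPES = {
    "evtx", "mft", "csv", "xlsx", "json", "sqlite", "plist", "generic",
}


def build_form_fields(
    artifact_type: str,
    case_id: Optional[str] = None,
    phase_id: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build the text fields for POST /upload."""
    if artifact_type not in SUPPORTED_ARTIFACT_TYPES:
        raise ValueError(f"unsupported artifact_type: {artifact_type!r}")
    fields = {"artifact_type": artifact_type}
    # case_id and phase_id are optional, so only send them when provided.
    if case_id is not None:
        fields["case_id"] = case_id
    if phase_id is not None:
        fields["phase_id"] = phase_id
    return fields
```

The returned dictionary can be passed as the form data of any multipart-capable HTTP client, alongside the file part.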

Response Format

status
string
required
Always "success" on successful upload
message
string
Human-readable status message
data_url
string
API endpoint to query processed data: /api/data/{csv_filename}
csv_filename
string
Internal CSV filename for normalized timeline data
xlsx_filename
string
Excel export filename (only for forensic artifacts like EVTX/MFT)
processed_records
integer
Number of timeline events extracted
file_category
string
Classification: "forensic" (EVTX/MFT) or "generic" (all other supported formats, such as CSV/JSON)
original_filename
string
Original uploaded filename
file_id
string
Database ID if case_id was provided
chain_of_custody
object
Cryptographic verification metadata
sha256
string
SHA256 hash computed during streaming upload (zero extra I/O)
file_size_bytes
integer
Exact file size in bytes
original_filename
string
Original filename for audit trail

Streaming Upload Architecture

Chronos-DFIR streams uploads in fixed-size chunks to handle large files without memory exhaustion:
  1. Chunked upload: File read in 8KB chunks
  2. Simultaneous hashing: SHA256 computed during upload (no extra read)
  3. Disk write: Direct write to chronos_uploads/ directory
  4. Lazy parsing: Files are scanned (not loaded) using Polars scan_csv()
A 6GB EVTX file is processed with ~200MB peak RAM usage.

Examples

curl -X POST http://localhost:8000/upload \
  -F "file=@Security.evtx" \
  -F "artifact_type=evtx" \
  -F "case_id=CASE-2024-001" \
  -F "phase_id=initial_triage"

Response Examples

{
  "status": "success",
  "message": "File processed successfully",
  "data_url": "/api/data/Security_evtx_1704067200.csv",
  "processed_records": 42084,
  "csv_filename": "Security_evtx_1704067200.csv",
  "xlsx_filename": "Security_evtx_1704067200.xlsx",
  "original_filename": "Security.evtx",
  "file_category": "forensic",
  "file_id": "file_abc123",
  "chain_of_custody": {
    "sha256": "a3f5b2c1d4e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2",
    "file_size_bytes": 6442450944,
    "original_filename": "Security.evtx"
  }
}

Artifact Type Routing

Generic Artifacts

Files with these extensions are processed as generic tabular data:
  • .csv, .tsv → Polars scan_csv
  • .xlsx → Polars read_excel
  • .json, .jsonl, .ndjson → Polars read_json
  • .parquet → Polars scan_parquet
  • .db, .sqlite, .sqlite3 → SQLite cursor + Polars DataFrame
  • .plist → plistlib + Polars DataFrame
  • .pslist, .txt, .log → Whitespace regex parser
  • .zip → Automatic extraction
Processing:
  1. Ingest using format-specific parser
  2. Normalize column names (remove special characters)
  3. Detect time columns (hierarchy: Time > timestamp > datetime)
  4. Save to chronos_output/import_{filename}_{timestamp}.csv
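
Steps 2 and 3 above can be sketched as follows. This is only an illustration under stated assumptions: the exact normalization rule is not documented, so the sketch assumes that spaces become underscores and every other character outside letters, digits, and underscores is dropped, and it assumes the time-column hierarchy is matched case-insensitively. The function names are hypothetical.

```python
import re
from typing import List, Optional

# Priority order taken from the hierarchy above: Time > timestamp > datetime.
TIME_COLUMN_PRIORITY = ["Time", "timestamp", "datetime"]


def normalize_column(name: str) -> str:
    """Assumed rule: spaces to underscores, then drop special characters."""
    return re.sub(r"[^0-9A-Za-z_]", "", name.strip().replace(" ", "_"))


def detect_time_column(columns: List[str]) -> Optional[str]:
    """Return the highest-priority time column present, or None."""
    by_lower = {column.lower(): column for column in columns}
    for candidate in TIME_COLUMN_PRIORITY:
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None
```

With these assumptions, a file that has both a timestamp and a datetime column would use timestamp as its time axis.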

Forensic Artifacts

Files with specialized forensic formats:
  • .evtx → Windows Event Logs (uses evtx_dump + timeline_skill)
  • .mft → NTFS Master File Table (binary parser with $STANDARD_INFORMATION)
Processing:
  1. Call generate_unified_timeline() from timeline_skill.py
  2. Parse binary structures (EVTX XML, MFT records)
  3. Extract timestamps (Creation, Modification, Access, Entry Modified)
  4. Generate CSV + XLSX outputs
  5. Return forensic-grade metadata
MFT parsing uses the real FILETIME values from the $STANDARD_INFORMATION attribute and never fabricates timestamps.
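
For readers unfamiliar with FILETIME, the sketch below shows the standard conversion: a FILETIME is a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC. This is not the parser's actual code, and the function name filetime_to_datetime is a hypothetical one chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a raw FILETIME integer to a timezone-aware UTC datetime."""
    # datetime has microsecond resolution, so the final 100 ns digit is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
```

For example, the FILETIME value 116444736000000000 corresponds to 1970-01-01T00:00:00+00:00, the Unix epoch.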

Chain of Custody

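The upload endpoint computes the SHA256 hash while the upload streams to disk, so the file is never read a second time:
import hashlib

# Assumes `file` is the incoming upload object and `file_path` the destination path.
sha256 = hashlib.sha256()
with open(file_path, "wb") as buffer:
    while chunk := file.file.read(8192):  # Read in 8KB chunks
        sha256.update(chunk)  # Hash during upload
        buffer.write(chunk)   # Write to disk
file_hash = sha256.hexdigest()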
Benefits:
  • No extra read pass (hashing and writing share a single pass)
  • Forensic integrity (tamper detection)
  • Audit trail (original filename + size + hash)
  • Court admissibility (cryptographic verification)
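
To illustrate the tamper-detection benefit, a client could later re-hash its own copy of the artifact and compare it with the chain_of_custody object from the upload response. The sketch below assumes the custody dictionary has the sha256 and file_size_bytes keys shown in the response format; the function names are hypothetical.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 8192) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_custody(path: str, custody: dict) -> bool:
    """Return True if the local file matches the recorded size and hash."""
    size_ok = os.path.getsize(path) == custody["file_size_bytes"]
    hash_ok = sha256_of_file(path) == custody["sha256"].lower()
    return size_ok and hash_ok
```

A mismatch in either field indicates that the local copy differs from the file that was uploaded.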

Case Management

When case_id is provided, the file is registered in the case database (DuckDB):
INSERT INTO case_files (
  case_id, phase_id, original_filename, processed_filename,
  sha256, file_size, file_category, row_count, upload_timestamp
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
Schema:
  • case_id: Investigation identifier
  • phase_id: Analysis phase (triage, deep_dive, reporting)
  • sha256: Chain of custody hash
  • file_category: forensic or generic
  • row_count: Number of timeline events
Multi-file correlation (cross-source pivoting) will use case_id in future releases.
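
The registration step can be tried out locally with the sketch below. It uses Python's built-in sqlite3 module as a stand-in for DuckDB, purely so the example runs without extra dependencies. The column types, the ISO 8601 timestamp format, and the helper names register_file and files_for_case are assumptions; only the column list and the INSERT statement come from the text above.

```python
import sqlite3
from datetime import datetime, timezone

# Column names follow the INSERT statement above; the types are assumed.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS case_files (
  case_id TEXT, phase_id TEXT, original_filename TEXT, processed_filename TEXT,
  sha256 TEXT, file_size INTEGER, file_category TEXT, row_count INTEGER,
  upload_timestamp TEXT
)"""

INSERT_SQL = """
INSERT INTO case_files (
  case_id, phase_id, original_filename, processed_filename,
  sha256, file_size, file_category, row_count, upload_timestamp
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"""


def register_file(conn, case_id, phase_id, original_filename,
                  processed_filename, sha256, file_size, file_category,
                  row_count):
    """Hypothetical helper: record one uploaded file for a case."""
    uploaded_at = datetime.now(timezone.utc).isoformat()
    conn.execute(INSERT_SQL, (
        case_id, phase_id, original_filename, processed_filename,
        sha256, file_size, file_category, row_count, uploaded_at,
    ))
    conn.commit()


def files_for_case(conn, case_id):
    """List (original_filename, sha256) pairs registered under a case."""
    cursor = conn.execute(
        "SELECT original_filename, sha256 FROM case_files "
        "WHERE case_id = ? ORDER BY upload_timestamp",
        (case_id,),
    )
    return cursor.fetchall()
```

Parameterized placeholders keep filenames and case identifiers from being interpreted as SQL, which matters when filenames come from untrusted evidence sources.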

Error Handling

Memory Exhaustion

If normalization fails due to OOM:
  1. Raw file is copied to output directory
  2. Response includes processed_records: "Unknown"
  3. File is still queryable (lazy evaluation)
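
Because processed_records is normally an integer but becomes the string "Unknown" in this fallback case, clients may want to handle both forms. A minimal sketch, with the hypothetical helper name record_count:

```python
from typing import Optional


def record_count(response: dict) -> Optional[int]:
    """Return processed_records as an int, or None when it is not numeric."""
    value = response.get("processed_records")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None
```

Returning None instead of raising lets the caller continue to query the file through data_url even when the count is unavailable.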

Parsing Failures

If artifact parsing fails:
  1. Error logged to console
  2. Raw file copied as fallback
  3. HTTP 500 returned with error details

Unsupported Formats

If file extension is not recognized:
{
  "error": "Unsupported file format: .docx"
}
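
The extension-based routing described under Artifact Type Routing, including this error case, can be sketched as a simple lookup. The extension sets are copied from the lists above; the function name classify_upload and the case-insensitive matching are assumptions, and the real endpoint may also take artifact_type into account.

```python
import os

# Extension lists copied from the Generic and Forensic Artifacts sections.
FORENSIC_EXTENSIONS = {".evtx", ".mft"}
GENERIC_EXTENSIONS = {
    ".csv", ".tsv", ".xlsx", ".json", ".jsonl", ".ndjson", ".parquet",
    ".db", ".sqlite", ".sqlite3", ".plist", ".pslist", ".txt", ".log", ".zip",
}


def classify_upload(filename: str) -> dict:
    """Hypothetical router: map a filename to a category or an error."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in FORENSIC_EXTENSIONS:
        return {"file_category": "forensic"}
    if extension in GENERIC_EXTENSIONS:
        return {"file_category": "generic"}
    return {"error": f"Unsupported file format: {extension}"}
```

Checking the extension on the client before uploading a multi-gigabyte file avoids a long transfer that would only end in this error.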

Performance Benchmarks

File Size | Format | Upload Time | Peak RAM | Processed Records
6.0 GB    | EVTX   | 4m 12s      | 210 MB   | 850,000
2.5 GB    | MFT    | 1m 45s      | 180 MB   | 1,200,000
500 MB    | CSV    | 22s         | 90 MB    | 2,000,000
1.2 GB    | JSON   | 38s         | 150 MB   | 450,000
Tested on Apple M4 Pro with 48GB RAM

Next Steps

After uploading a file:
  1. Query data: Use /api/data/{csv_filename} endpoint
  2. Generate histogram: Call /api/histogram/{csv_filename}
  3. Run forensic report: POST to /api/forensic_report
  4. Export results: Use /api/export_filtered for CSV/XLSX/JSON

