POST /api/data/upload
Upload CSV
curl --request POST \
  --url https://api.example.com/api/data/upload \
  --form 'file=@/path/to/your-file.csv'
{
  "upload_id": "<string>",
  "row_count": 123,
  "headers": [
    {}
  ],
  "sample_rows": [
    {}
  ],
  "detected_dataset": "<string>",
  "suggested_mapping": {},
  "mapping_confidence": {},
  "temporal_scale": 123,
  "clarification_questions": [
    {}
  ],
  "metadata": {},
  "error": "<string>",
  "message": "<string>",
  "details": [
    {}
  ]
}
This endpoint accepts a CSV file upload, parses the data, detects the dataset type, and suggests mappings from the CSV columns to standard fields. It recognizes the IBM AML and PaySim datasets automatically and falls back to AI-suggested mappings for generic compliance datasets.

Request

file
file
required
CSV file to upload. Must be a valid CSV with headers in the first row.

Content Type

multipart/form-data

Example Request

curl -X POST https://your-domain.com/api/data/upload \
  -F "file=@/path/to/your-file.csv"
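
The same request can be built without curl. The following is a minimal Python sketch using only the standard library; the helper name `build_upload_request` and the file name `transactions.csv` are hypothetical, and the only details taken from this page are the URL, the `multipart/form-data` content type, and the form field name `file`.

```python
import urllib.request
import uuid


def build_upload_request(url: str, filename: str, csv_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one form field named "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: text/csv\r\n\r\n",
        csv_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_upload_request(
    "https://your-domain.com/api/data/upload",
    "transactions.csv",  # hypothetical file name
    b"orig_acct,bene_acct,base_amt\nACCT_001,ACCT_245,5000\n",
)
# To send it: response = urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to inspect the headers and body before any network call is made.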

Response

upload_id
string
required
Unique identifier for the uploaded dataset. Use this ID in subsequent API calls.
row_count
number
required
Total number of data rows in the uploaded CSV (excluding header).
headers
array
required
Array of column names extracted from the CSV header row.
sample_rows
array
required
First 5 rows of the dataset for preview purposes. Each row is an object with column names as keys.
detected_dataset
string
required
Detected dataset type. Possible values:
  • IBM_AML - IBM AML transaction dataset
  • PAYSIM - PaySim financial simulation dataset
  • GENERIC - Generic compliance dataset
suggested_mapping
object
required
Suggested column mappings from CSV headers to standard fields. Keys are standard field names:
  • account - Primary entity identifier (sender, account_id, user_id)
  • recipient - Secondary entity identifier (receiver, destination)
  • amount - Monetary value or transaction amount
  • step - Timestamp or time-step column
  • type - Category or classification column
Values are the corresponding CSV column names.
mapping_confidence
object
required
Confidence scores (0-100) for each suggested mapping. Keys match suggested_mapping keys.
  • Known datasets (IBM_AML, PAYSIM): 100% confidence
  • Generic datasets: AI-generated confidence based on column analysis
temporal_scale
number
required
Scaling factor for temporal data normalization:
  • 24.0 - IBM AML (converts days to hours)
  • 1.0 - PaySim and Generic (already in hours)
clarification_questions
array
required
Advisory clarification questions intended to improve mapping accuracy. Always an empty array in the current MVP.
metadata
object
Dataset statistics for internal verification:
  • totalRows - Total row count
  • columnStats - Per-column statistics (type, cardinality, min/max/mean for numeric columns)
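
Because `mapping_confidence` uses the same keys as `suggested_mapping`, a client can flag the mappings that deserve a manual check. The sketch below is one way to do that; the function name, the threshold of 80, and the scores in the example are illustrative assumptions, not values defined by the API.

```python
def low_confidence_fields(mapping_confidence: dict, threshold: float = 80) -> list:
    """Return the standard field names whose confidence score is below the threshold."""
    return sorted(
        field for field, score in mapping_confidence.items() if score < threshold
    )


# Illustrative scores, as a GENERIC dataset might produce (made up for this example).
confidence = {"account": 95, "recipient": 60, "amount": 88, "step": 40, "type": 90}
needs_review = low_confidence_fields(confidence)
```

For IBM_AML and PAYSIM uploads every score is 100, so the list would be empty.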

Example Response

{
  "upload_id": "a3f12b45-8c7d-4e9f-b1a2-3c4d5e6f7g8h",
  "row_count": 1250,
  "headers": ["orig_acct", "bene_acct", "base_amt", "tran_timestamp", "tx_type"],
  "sample_rows": [
    {
      "orig_acct": "ACCT_001",
      "bene_acct": "ACCT_245",
      "base_amt": 5000.00,
      "tran_timestamp": 1609459200,
      "tx_type": "WIRE"
    },
    {
      "orig_acct": "ACCT_002",
      "bene_acct": "ACCT_356",
      "base_amt": 12500.50,
      "tran_timestamp": 1609545600,
      "tx_type": "ACH"
    }
  ],
  "detected_dataset": "IBM_AML",
  "suggested_mapping": {
    "account": "orig_acct",
    "recipient": "bene_acct",
    "amount": "base_amt",
    "step": "tran_timestamp",
    "type": "tx_type"
  },
  "mapping_confidence": {
    "account": 100,
    "recipient": 100,
    "amount": 100,
    "step": 100,
    "type": 100
  },
  "temporal_scale": 24.0,
  "clarification_questions": [],
  "metadata": {
    "totalRows": 1250,
    "columnStats": {
      "base_amt": {
        "type": "numeric",
        "cardinality": 892,
        "min": 100.00,
        "max": 50000.00,
        "mean": 8547.32
      },
      "tx_type": {
        "type": "categorical",
        "cardinality": 4
      }
    }
  }
}
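
A client would typically use `suggested_mapping` to rename columns to the standard fields and `temporal_scale` to normalize the time column. The sketch below assumes the scale is applied by multiplication (24.0 converting days to hours, as described for IBM AML); the function name and the sample row with a time value of 2 days are hypothetical.

```python
def normalize_row(row: dict, suggested_mapping: dict, temporal_scale: float) -> dict:
    """Rename CSV columns to standard fields and scale the step value."""
    normalized = {
        field: row.get(column) for field, column in suggested_mapping.items()
    }
    if normalized.get("step") is not None:
        # Assumption: the scale is a multiplier, e.g. days * 24.0 = hours.
        normalized["step"] = normalized["step"] * temporal_scale
    return normalized


mapping = {
    "account": "orig_acct",
    "recipient": "bene_acct",
    "amount": "base_amt",
    "step": "tran_timestamp",
    "type": "tx_type",
}
# Hypothetical row whose time column holds a value in days.
row = {"orig_acct": "ACCT_001", "bene_acct": "ACCT_245", "base_amt": 5000.0,
       "tran_timestamp": 2, "tx_type": "WIRE"}
normalized = normalize_row(row, mapping, 24.0)
```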

Error Responses

error
string
Error code:
  • VALIDATION_ERROR - No file provided or CSV parsing failed
  • INTERNAL_ERROR - Unexpected server error
message
string
Human-readable error description.
details
array
Additional error details (e.g., CSV parsing errors).

Example Error Response

{
  "error": "VALIDATION_ERROR",
  "message": "Failed to parse CSV",
  "details": [
    {
      "type": "Delimiter",
      "code": "UndetectableDelimiter",
      "message": "Unable to auto-detect delimiter",
      "row": 0
    }
  ]
}
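
An error payload like the one above can be turned into a readable message on the client side. This is a minimal sketch; the function name is hypothetical, and it assumes each entry in `details` carries the `code`, `message`, and `row` keys shown in the example, which may not hold for every error.

```python
def describe_error(payload: dict) -> str:
    """Format an error response as a multi-line, human-readable string."""
    lines = [f'{payload.get("error", "UNKNOWN")}: {payload.get("message", "")}']
    for detail in payload.get("details") or []:
        lines.append(
            f'  row {detail.get("row")}: [{detail.get("code")}] {detail.get("message")}'
        )
    return "\n".join(lines)


error_payload = {
    "error": "VALIDATION_ERROR",
    "message": "Failed to parse CSV",
    "details": [
        {
            "type": "Delimiter",
            "code": "UndetectableDelimiter",
            "message": "Unable to auto-detect delimiter",
            "row": 0,
        }
    ],
}
summary = describe_error(error_payload)
```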

Dataset Detection Logic

IBM AML Detection

Triggered when at least 3 of these columns are present:
  • orig_acct
  • bene_acct
  • base_amt
  • tran_timestamp
  • tx_type

PaySim Detection

Triggered when at least 3 of these columns are present:
  • nameOrig
  • nameDest
  • step
  • isFraud

Generic Fallback

For unrecognized datasets, the system uses AI to suggest mappings based on:
  • Column name patterns
  • Sample data analysis
  • Common compliance data structures
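
The detection rules above can be sketched as a simple header check. This is not the server's implementation: the function name is hypothetical, and the sketch assumes exact, case-sensitive column matching and that IBM AML is checked before PaySim. The AI-based mapping for the generic case is not modeled here.

```python
IBM_AML_COLUMNS = {"orig_acct", "bene_acct", "base_amt", "tran_timestamp", "tx_type"}
PAYSIM_COLUMNS = {"nameOrig", "nameDest", "step", "isFraud"}
MIN_MATCHES = 3  # "at least 3 of these columns"


def detect_dataset(headers: list) -> str:
    """Classify a CSV by how many known column names appear in its header row."""
    present = set(headers)
    if len(present & IBM_AML_COLUMNS) >= MIN_MATCHES:
        return "IBM_AML"
    if len(present & PAYSIM_COLUMNS) >= MIN_MATCHES:
        return "PAYSIM"
    return "GENERIC"  # unrecognized: the service falls back to AI-suggested mappings


ibm = detect_dataset(["orig_acct", "bene_acct", "base_amt", "notes"])
paysim = detect_dataset(["step", "type", "amount", "nameOrig", "nameDest"])
generic = detect_dataset(["sender", "receiver", "value"])
```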

Notes

  • CSV files are parsed with the first row treated as headers and with dynamic typing enabled, so numeric-looking values are returned as numbers rather than strings
  • Empty lines are skipped automatically
  • Uploaded data is held in memory for a time-limited session
  • AI analysis samples at most the first 3 rows
  • For generic datasets, AI mapping uses Gemini and degrades gracefully
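
To get a feel for what the parsing notes mean for a given file, the behavior can be approximated locally. The sketch below uses Python's `csv` module rather than the server's parser, so edge cases may differ; the `coerce` and `parse_csv` names are hypothetical, and the type conversion covers only integers and floats.

```python
import csv
import io


def coerce(value):
    """Approximate dynamic typing: numeric-looking strings become numbers."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            continue
    return value


def parse_csv(text: str) -> list:
    """Parse CSV text using the first row as headers; blank lines are skipped."""
    reader = csv.DictReader(io.StringIO(text))  # DictReader skips empty rows itself
    return [{key: coerce(value) for key, value in row.items()} for row in reader]


rows = parse_csv("orig_acct,base_amt\nACCT_001,5000\n\nACCT_002,12500.50\n")
ai_sample = rows[:3]  # at most the first 3 rows, as used for AI analysis
```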
