Data Ingestion & Cleaning

The first step of the analysis pipeline prepares raw surgical trajectory data for numerical analysis by converting it into a structured pandas DataFrame with computed time deltas.

Function Signature

def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    """
    Converts raw trajectory JSON into a clean pandas DataFrame.
    Adds relative timestamps and time deltas.
    """

Input Format

The function expects a dictionary with a movements key containing a list of movement objects:

{
  "movements": [
    {
      "coordinates": [10.5, 20.3, 15.7],
      "event": "START",
      "timestamp": 1710523456789
    },
    {
      "coordinates": [10.6, 20.4, 15.8],
      "event": "NONE",
      "timestamp": 1710523456889
    }
  ]
}
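The shape above can be written down as type hints, which makes the expected keys explicit for editors and type checkers. This is only a sketch: the names Movement and TrajectoryData are hypothetical and do not appear in the pipeline, which annotates its input as a plain Dict.

```python
from typing import List, TypedDict


class Movement(TypedDict):
    """One trajectory sample (hypothetical type name)."""
    coordinates: List[float]  # [x, y] or [x, y, z]
    event: str                # e.g. "START", "NONE", "FINISH"
    timestamp: int            # Unix time in milliseconds


class TrajectoryData(TypedDict):
    """Top-level payload (hypothetical type name)."""
    movements: List[Movement]


# The sample payload from this section, annotated with the sketch types.
sample: TrajectoryData = {
    "movements": [
        {"coordinates": [10.5, 20.3, 15.7], "event": "START", "timestamp": 1710523456789},
        {"coordinates": [10.6, 20.4, 15.8], "event": "NONE", "timestamp": 1710523456889},
    ]
}
```

TypedDict adds no runtime checks; it only documents the structure for static analysis.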

Implementation

analysis_pipeline.py

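from typing import Dict

import pandas as pd


def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    movements = trajectory_data["movements"]
    # One row per movement; coordinates are unpacked into x, y, z
    df = pd.DataFrame([{
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        # 2D input: z falls back to 0
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
        "event": m["event"],
        "timestamp": m["timestamp"]
    } for m in movements])

    # Sort chronologically so diff() compares consecutive movements
    df = df.sort_values("timestamp").reset_index(drop=True)
    # Relative time in seconds (timestamps are in milliseconds)
    df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
    # Time elapsed since the previous movement; 0 for the first row
    df["dt"] = df["t"].diff().fillna(0)

    return df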

Processing Steps

Extract Coordinates

Each movement’s coordinates are unpacked into separate x, y, z columns.

If a movement has only 2D coordinates, z defaults to 0 for 3D consistency.
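A minimal sketch of the 2D fallback in isolation, using made-up movements that carry only a coordinates key. It mirrors the unpacking expression from the implementation rather than calling the pipeline function.

```python
import pandas as pd

# Hypothetical sample: the second movement carries only x and y.
movements = [
    {"coordinates": [10.5, 20.3, 15.7]},
    {"coordinates": [10.6, 20.4]},
]

rows = [
    {
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        # Fall back to 0 when no third component is present.
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
    }
    for m in movements
]
coords = pd.DataFrame(rows)
print(coords)
```

Because the column mixes a float and the integer 0, pandas stores z as a float column, so later arithmetic sees 0.0 for the 2D row.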

Sort by Timestamp

Movements are sorted chronologically so that each time delta compares a movement with the one immediately before it.

df = df.sort_values("timestamp").reset_index(drop=True)

Calculate Relative Time

Absolute timestamps are converted to relative seconds from procedure start.

# Convert milliseconds to seconds, relative to first timestamp
df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0

Compute Time Deltas

The time difference (dt) between consecutive movements is calculated.

# dt = time elapsed since previous movement
df["dt"] = df["t"].diff().fillna(0)

The first movement has dt=0 since there’s no previous movement.
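The sort, relative-time, and delta steps can be tried together on a small frame. The timestamps below are illustrative values, deliberately out of order to show why the sort comes first.

```python
import pandas as pd

# Illustrative millisecond timestamps, deliberately out of order.
df = pd.DataFrame({"timestamp": [1710523456889, 1710523456789, 1710523457039]})

# 1. Chronological order with a clean 0..n-1 index.
df = df.sort_values("timestamp").reset_index(drop=True)

# 2. Seconds elapsed since the first (earliest) movement.
df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0

# 3. Seconds since the previous movement; diff() leaves NaN in row 0,
#    which fillna(0) replaces.
df["dt"] = df["t"].diff().fillna(0)

print(df)
```

Without the sort, the first row would not be the earliest sample, and both t and dt could come out negative.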

Output DataFrame Structure

The resulting DataFrame has this structure:

Column	Type	Description
`x`	float	X coordinate in 3D space
`y`	float	Y coordinate in 3D space
`z`	float	Z coordinate (0 if not provided)
`event`	string	Surgical event type (START, NONE, TUMOR_TOUCH, HEMORRHAGE, FINISH)
`timestamp`	int	Original Unix timestamp in milliseconds
`t`	float	Relative time in seconds from start
`dt`	float	Time delta in seconds since previous movement (0 for the first row)

Example DataFrame

   x     y     z    event        timestamp     t      dt
10.5  20.3  15.7  START       1710523456789  0.000  0.000
10.6  20.4  15.8  NONE        1710523456889  0.100  0.100
10.8  20.6  15.9  NONE        1710523457039  0.250  0.150
11.2  21.0  16.1  TUMOR_TOUCH 1710523457239  0.450  0.200
11.5  21.3  16.2  FINISH      1710523457489  0.700  0.250

Why This Step Matters

Enables Time-Based Calculations

The dt column is essential for calculating velocity (distance/time) and acceleration (change in velocity/time).
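As an illustration of how dt feeds a velocity calculation, the sketch below computes speed for the first three rows of the example DataFrame. It is not the Step 2 implementation, which this page does not show. The column names dist and speed, and the decision to report 0 for rows where dt is 0, are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# First three rows of the example DataFrame on this page.
df = pd.DataFrame({
    "x": [10.5, 10.6, 10.8],
    "y": [20.3, 20.4, 20.6],
    "z": [15.7, 15.8, 15.9],
    "dt": [0.0, 0.1, 0.15],
})

# Euclidean distance travelled since the previous row (0 for the first row).
step = df[["x", "y", "z"]].diff().fillna(0)
df["dist"] = np.sqrt((step ** 2).sum(axis=1))

# Speed = distance / dt. Rows with dt == 0 are masked to NaN before the
# division and then reported as 0 (an assumption of this sketch).
dt_safe = df["dt"].where(df["dt"] > 0)
df["speed"] = (df["dist"] / dt_safe).fillna(0)

print(df[["dist", "dt", "speed"]])
```

Masking dt before dividing matters because the first row always has dt = 0, and dividing by it directly would produce inf or NaN.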

Ensures Chronological Order

Sorting guarantees that subsequent steps process movements in the correct sequence.

Standardizes Data Format

Converting to a DataFrame allows use of pandas operations such as vectorized math and grouping.

Handles Missing Data

The z-coordinate default and fillna(0) for dt prevent NaN propagation in later calculations. The first row's dt of 0 still needs guarding wherever a later step divides by dt.

Edge Cases

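If the movements array is empty, the pipeline fails. This step itself raises a KeyError, because an empty DataFrame has no timestamp column to sort by. With a single movement, the only dt is 0, so there is no interval to compute from. Always validate that trajectory data contains at least 2 movements before calling the pipeline.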

if len(trajectory_data["movements"]) < 2:
    raise ValueError("Trajectory must contain at least 2 movements")
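The check can be wrapped in a small helper that also covers a missing or malformed movements key, which the bare len() call would surface as a KeyError or TypeError. The function name validate_trajectory is hypothetical and not part of the documented pipeline.

```python
from typing import Any, Dict, List


def validate_trajectory(trajectory_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the movements list, or raise ValueError if it is unusable.

    Hypothetical helper; not part of the documented pipeline.
    """
    movements = trajectory_data.get("movements")
    if not isinstance(movements, list):
        raise ValueError("Trajectory must contain a 'movements' list")
    if len(movements) < 2:
        raise ValueError("Trajectory must contain at least 2 movements")
    return movements
```

Calling this before the pipeline turns every malformed input into a single exception type, which keeps error handling at the call site simple.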

Performance

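This step keeps per-row Python work to a minimum:

List comprehension: rows are built in one pass and the DataFrame is constructed once, which is faster than appending rows iteratively
Vectorized operations: diff() and the division operate on entire columns at once
Single sort: O(n log n) complexity, performed only once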

Typical performance: ~10ms for 5000 movements
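The figure above depends on hardware and pandas version, so it is worth measuring locally. The sketch below times the same operations on a synthetic trajectory; the generated coordinates and the 10 ms spacing are made-up values, and the steps are inlined rather than imported from analysis_pipeline.py.

```python
import time

import pandas as pd

# Synthetic trajectory: 5000 movements spaced 10 ms apart (made-up values).
n = 5000
movements = [
    {
        "coordinates": [i * 0.1, i * 0.2, i * 0.3],
        "event": "NONE",
        "timestamp": 1710523456789 + 10 * i,
    }
    for i in range(n)
]

start = time.perf_counter()

# Same operations as the ingestion step, inlined for timing.
df = pd.DataFrame([{
    "x": m["coordinates"][0],
    "y": m["coordinates"][1],
    "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
    "event": m["event"],
    "timestamp": m["timestamp"],
} for m in movements])
df = df.sort_values("timestamp").reset_index(drop=True)
df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
df["dt"] = df["t"].diff().fillna(0)

elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{n} movements processed in {elapsed_ms:.1f} ms")
```

A single run is noisy; repeating the measurement several times and taking the median gives a more stable number.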

Next Step

Once the data is cleaned and structured, it proceeds to Step 2:

Dexterity Metrics

Calculate velocity, acceleration, jerk, and economy of movement
