The first step of the analysis pipeline prepares raw surgical trajectory data for numerical analysis by converting it into a structured pandas DataFrame with computed time deltas.

Function Signature

def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    """
    Converts raw trajectory JSON into a clean pandas DataFrame.
    Adds relative timestamps and time deltas.
    """

Input Format

The function expects a dictionary with a movements key containing a list of movement objects:
{
  "movements": [
    {
      "coordinates": [10.5, 20.3, 15.7],
      "event": "START",
      "timestamp": 1710523456789
    },
    {
      "coordinates": [10.6, 20.4, 15.8],
      "event": "NONE",
      "timestamp": 1710523456889
    }
  ]
}
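The shape above can also be written down as type annotations. The sketch below is illustrative only: the names `Movement` and `TrajectoryData` are hypothetical and do not appear in the pipeline, and the annotations document the expected shape without enforcing it at runtime.

```python
import json
from typing import List, TypedDict


class Movement(TypedDict):
    """One trajectory sample (hypothetical type name)."""
    coordinates: List[float]  # [x, y] or [x, y, z]
    event: str                # e.g. "START", "NONE", "FINISH"
    timestamp: int            # Unix time in milliseconds


class TrajectoryData(TypedDict):
    """Top-level payload (hypothetical type name)."""
    movements: List[Movement]


raw = """
{
  "movements": [
    {"coordinates": [10.5, 20.3, 15.7], "event": "START", "timestamp": 1710523456789},
    {"coordinates": [10.6, 20.4, 15.8], "event": "NONE", "timestamp": 1710523456889}
  ]
}
"""

# json.loads returns plain dicts and lists; the annotation only documents the shape.
trajectory_data: TrajectoryData = json.loads(raw)
first = trajectory_data["movements"][0]
print(first["event"], first["coordinates"])
```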

Implementation

analysis_pipeline.py
from typing import Dict

import pandas as pd


def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    movements = trajectory_data["movements"]
    # One row per movement; 2D coordinates get z = 0
    df = pd.DataFrame([{
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
        "event": m["event"],
        "timestamp": m["timestamp"]
    } for m in movements])

    # Sort chronologically so diff() compares consecutive samples
    df = df.sort_values("timestamp").reset_index(drop=True)
    # Relative time in seconds (timestamps are in milliseconds)
    df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
    # Time elapsed since the previous movement; 0 for the first row
    df["dt"] = df["t"].diff().fillna(0)

    return df
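As a quick check of the behaviour, the sketch below runs the function on a small made-up sample. The function body is repeated from the listing above so that the example runs on its own; the sample data is invented, with the entries deliberately out of order and the last one given in 2D.

```python
from typing import Dict

import pandas as pd


def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    # Repeated from the listing above so this example is self-contained.
    movements = trajectory_data["movements"]
    df = pd.DataFrame([{
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
        "event": m["event"],
        "timestamp": m["timestamp"],
    } for m in movements])
    df = df.sort_values("timestamp").reset_index(drop=True)
    df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
    df["dt"] = df["t"].diff().fillna(0)
    return df


# Invented sample: out of chronological order, last entry has no z value.
sample = {
    "movements": [
        {"coordinates": [10.6, 20.4, 15.8], "event": "NONE", "timestamp": 1710523456889},
        {"coordinates": [10.5, 20.3, 15.7], "event": "START", "timestamp": 1710523456789},
        {"coordinates": [10.8, 20.6], "event": "FINISH", "timestamp": 1710523457039},
    ]
}

df = _paso1_ingestar_y_limpiar(sample)
print(df[["event", "z", "t", "dt"]])
```

After the call, the rows are in chronological order (START, NONE, FINISH), the 2D entry has z equal to 0, and the first row has dt equal to 0.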

Processing Steps

1. Extract Coordinates

Each movement’s coordinates are unpacked into separate x, y, z columns.
If a movement has only 2D coordinates, z defaults to 0 for 3D consistency.

2. Sort by Timestamp

Movements are sorted chronologically to ensure proper time-series analysis.
df = df.sort_values("timestamp").reset_index(drop=True)

3. Calculate Relative Time

Absolute timestamps are converted to relative seconds from procedure start.
# Convert milliseconds to seconds, relative to first timestamp
df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0

4. Compute Time Deltas

The time difference (dt) between consecutive movements is calculated.
# dt = time elapsed since previous movement
df["dt"] = df["t"].diff().fillna(0)
The first movement has dt=0 since there’s no previous movement.
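To make steps 3 and 4 concrete, here is a minimal worked example on three timestamps, showing the intermediate NaN that `diff()` produces before `fillna(0)` replaces it. The variable names (`ts`, `raw_dt`) are chosen for illustration and are not part of the pipeline.

```python
import pandas as pd

# Three timestamps in milliseconds, already sorted.
ts = pd.Series([1710523456789, 1710523456889, 1710523457039])

# Step 3: seconds elapsed since the first sample.
t = (ts - ts.iloc[0]) / 1000.0

# Step 4: diff() leaves NaN in the first position because nothing precedes it...
raw_dt = t.diff()
# ...and fillna(0) turns that NaN into 0.
dt = raw_dt.fillna(0)

print(pd.DataFrame({"t": t, "raw_dt": raw_dt, "dt": dt}))
```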

Output DataFrame Structure

The resulting DataFrame has this structure:
Column     Type    Description
x          float   X coordinate in 3D space
y          float   Y coordinate in 3D space
z          float   Z coordinate (0 if not provided)
event      string  Surgical event type (START, NONE, TUMOR_TOUCH, HEMORRHAGE, FINISH)
timestamp  int     Original Unix timestamp in milliseconds
t          float   Relative time in seconds from start
dt         float   Time delta in seconds since previous movement

Example DataFrame

      x     y     z        event      timestamp      t     dt
0  10.5  20.3  15.7        START  1710523456789  0.000  0.000
1  10.6  20.4  15.8         NONE  1710523456889  0.100  0.100
2  10.8  20.6  15.9         NONE  1710523457039  0.250  0.150
3  11.2  21.0  16.1  TUMOR_TOUCH  1710523457239  0.450  0.200
4  11.5  21.3  16.2       FINISH  1710523457489  0.700  0.250

Why This Step Matters

The dt column is essential for calculating velocity (distance/time) and acceleration (change in velocity/time).
Sorting guarantees that subsequent steps process movements in the correct sequence.
Converting to DataFrame allows use of powerful pandas operations like vectorized math and grouping.
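The z-coordinate default and fillna(0) for dt keep NaN values out of this step's output. Because the first row has dt = 0, any later calculation that divides by dt must handle that row explicitly.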
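As an illustration of how dt feeds a velocity calculation, the sketch below computes a per-sample speed from a small hand-made DataFrame. It is an assumption about how such a metric could be derived, not necessarily how Step 2 does it; the `speed` column name and the choice to report 0 for the first row are illustrative.

```python
import numpy as np
import pandas as pd

# Hand-made data with the same column names this step produces.
df = pd.DataFrame({
    "x": [0.0, 3.0, 3.0],
    "y": [0.0, 4.0, 4.0],
    "z": [0.0, 0.0, 12.0],
    "dt": [0.0, 0.5, 2.0],
})

# Euclidean distance travelled since the previous sample (0 for the first row).
step = np.sqrt(
    df["x"].diff() ** 2 + df["y"].diff() ** 2 + df["z"].diff() ** 2
).fillna(0)

# dt is 0 on the first row, so mask it before dividing to avoid 0/0.
df["speed"] = (step / df["dt"].where(df["dt"] > 0)).fillna(0)

print(df)
```

Masking rows where dt is 0 before dividing is one way to keep NaN and infinite values out of the result; dropping the first row instead would work equally well.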

Edge Cases

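If the movements array is empty, this step itself fails: the resulting DataFrame has no columns, so sort_values("timestamp") raises a KeyError. A single movement yields a one-row DataFrame with dt = 0, which gives later steps nothing to differentiate. Always validate that trajectory data contains at least 2 movements before calling the pipeline.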
if len(trajectory_data["movements"]) < 2:
    raise ValueError("Trajectory must contain at least 2 movements")
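The check above can be extended into a small pre-flight helper. The sketch below is one possible shape for it; `validate_trajectory` is a hypothetical name that does not exist in the pipeline, and the extra checks (required keys, at least two coordinates) are assumptions based on the fields the function reads.

```python
from typing import Dict


def validate_trajectory(trajectory_data: Dict) -> None:
    """Hypothetical pre-flight check; raises ValueError on unusable input."""
    movements = trajectory_data.get("movements")
    if not isinstance(movements, list) or len(movements) < 2:
        raise ValueError("Trajectory must contain at least 2 movements")

    required = {"coordinates", "event", "timestamp"}
    for i, m in enumerate(movements):
        # Every field the ingest step indexes must be present.
        missing = required - set(m)
        if missing:
            raise ValueError(f"Movement {i} is missing keys: {sorted(missing)}")
        # x and y are always read; z is optional.
        if len(m["coordinates"]) < 2:
            raise ValueError(f"Movement {i} needs at least x and y coordinates")
```

Calling the helper before the pipeline turns a KeyError or IndexError deep inside the ingest step into a ValueError that names the offending movement.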

Performance

This step is inexpensive for three reasons:
  • List comprehension: The DataFrame is built in one call from a list of dicts, which is faster than appending rows one at a time
  • Vectorized operations: diff() and division operate on entire columns at once
  • Single sort: O(n log n) complexity, performed only once
Typical performance: ~10 ms for 5000 movements
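Timings depend on hardware and pandas version, so it can be worth measuring on your own machine. The sketch below times the same operations on 5000 synthetic movements using `time.perf_counter`; the `step1` name and the generated data are illustrative, and no particular result is implied.

```python
import time
from typing import Dict

import pandas as pd


def step1(trajectory_data: Dict) -> pd.DataFrame:
    # Same operations as _paso1_ingestar_y_limpiar, repeated so the sketch runs alone.
    movements = trajectory_data["movements"]
    df = pd.DataFrame([{
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
        "event": m["event"],
        "timestamp": m["timestamp"],
    } for m in movements])
    df = df.sort_values("timestamp").reset_index(drop=True)
    df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
    df["dt"] = df["t"].diff().fillna(0)
    return df


# Synthetic data: 5000 samples spaced 10 ms apart (coordinate values are arbitrary).
n = 5000
data = {"movements": [
    {
        "coordinates": [i * 0.1, i * 0.2, i * 0.3],
        "event": "NONE",
        "timestamp": 1710523456789 + 10 * i,
    }
    for i in range(n)
]}

start = time.perf_counter()
df = step1(data)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{n} movements processed in {elapsed_ms:.1f} ms")
```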

Next Step

Once the data is cleaned and structured, it proceeds to Step 2:

Dexterity Metrics

Calculate velocity, acceleration, jerk, and economy of movement
