The first step of the analysis pipeline prepares raw surgical trajectory data for numerical analysis by converting it into a structured pandas DataFrame with computed time deltas.
Function Signature
def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    """
    Converts raw trajectory JSON into a clean pandas DataFrame.
    Adds relative timestamps and time deltas.
    """
The function expects a dictionary with a movements key containing a list of movement objects:
{
  "movements": [
    {
      "coordinates": [10.5, 20.3, 15.7],
      "event": "START",
      "timestamp": 1710523456789
    },
    {
      "coordinates": [10.6, 20.4, 15.8],
      "event": "NONE",
      "timestamp": 1710523456889
    }
  ]
}
Implementation
from typing import Dict

import pandas as pd


def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    movements = trajectory_data["movements"]
    # One row per movement; z falls back to 0 for 2D coordinates
    df = pd.DataFrame([{
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
        "event": m["event"],
        "timestamp": m["timestamp"]
    } for m in movements])
    # Order chronologically before computing time differences
    df = df.sort_values("timestamp").reset_index(drop=True)
    # Relative time in seconds
    df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
    # Time elapsed since the previous movement (0 for the first row)
    df["dt"] = df["t"].diff().fillna(0)
    return df
Processing Steps
Extract Coordinates
Each movement’s coordinates are unpacked into separate x, y, z columns. If a movement has only 2D coordinates, z defaults to 0 for 3D consistency.
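The 2D fallback can be seen in isolation with a small sketch. The sample movements below are made up for illustration (the second one deliberately omits the third coordinate); only the fallback expression comes from the implementation above.

```python
import pandas as pd

# Hypothetical sample: the second movement carries only 2D coordinates.
movements = [
    {"coordinates": [10.5, 20.3, 15.7], "event": "START", "timestamp": 1710523456789},
    {"coordinates": [10.6, 20.4], "event": "NONE", "timestamp": 1710523456889},
]

rows = []
for m in movements:
    coords = m["coordinates"]
    rows.append({
        "x": coords[0],
        "y": coords[1],
        # Fall back to 0 when no third coordinate is present.
        "z": coords[2] if len(coords) > 2 else 0,
    })

df = pd.DataFrame(rows)
print(df)
```

Because the column mixes a float with the integer default, pandas stores z as a float column, so the 2D row ends up with z = 0.0.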
Sort by Timestamp
Movements are sorted chronologically to ensure proper time-series analysis.
df = df.sort_values("timestamp").reset_index(drop=True)
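One detail the sort leaves open is what happens when two movements share a timestamp. The sketch below is an optional variation, not part of the original function: it assumes that ties should keep their arrival order and requests a stable algorithm through the `kind` parameter of `sort_values`. The sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows: two movements share the same timestamp.
df = pd.DataFrame({
    "event": ["NONE", "START", "TUMOR_TOUCH"],
    "timestamp": [1710523456889, 1710523456789, 1710523456889],
})

# mergesort is stable, so tied rows keep their original relative order.
ordered = df.sort_values("timestamp", kind="mergesort").reset_index(drop=True)
print(ordered)
```

Whether arrival order is the right tie-breaker depends on how the recording device emits events, so treat this as a design choice rather than a requirement.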
Calculate Relative Time
Absolute timestamps are converted to relative seconds from procedure start.
# Convert milliseconds to seconds, relative to first timestamp
df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
Compute Time Deltas
The time difference (dt) between consecutive movements is calculated.
# dt = time elapsed since previous movement
df["dt"] = df["t"].diff().fillna(0)
The first movement has dt=0 since there’s no previous movement.
Output DataFrame Structure
The resulting DataFrame has this structure:
Column      Type     Description
x           float    X coordinate in 3D space
y           float    Y coordinate in 3D space
z           float    Z coordinate (0 if not provided)
event       string   Surgical event type (START, NONE, TUMOR_TOUCH, HEMORRHAGE, FINISH)
timestamp   int      Original Unix timestamp in milliseconds
t           float    Relative time in seconds from start
dt          float    Time delta in seconds since previous movement
Example DataFrame
      x     y     z        event      timestamp      t     dt
0  10.5  20.3  15.7        START  1710523456789  0.000  0.000
1  10.6  20.4  15.8         NONE  1710523456889  0.100  0.100
2  10.8  20.6  15.9         NONE  1710523457039  0.250  0.150
3  11.2  21.0  16.1  TUMOR_TOUCH  1710523457239  0.450  0.200
4  11.5  21.3  16.2       FINISH  1710523457489  0.700  0.250
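To check the table by hand, the following sketch feeds the same five movements through the function. The function body is copied from the implementation shown earlier; the input is deliberately shuffled here (an assumption made for the demonstration) so that the effect of the sort is visible.

```python
from typing import Dict

import pandas as pd


def _paso1_ingestar_y_limpiar(trajectory_data: Dict) -> pd.DataFrame:
    movements = trajectory_data["movements"]
    df = pd.DataFrame([{
        "x": m["coordinates"][0],
        "y": m["coordinates"][1],
        "z": m["coordinates"][2] if len(m["coordinates"]) > 2 else 0,
        "event": m["event"],
        "timestamp": m["timestamp"],
    } for m in movements])
    df = df.sort_values("timestamp").reset_index(drop=True)
    df["t"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 1000.0
    df["dt"] = df["t"].diff().fillna(0)
    return df


# Movements from the example table, supplied out of chronological order.
trajectory_data = {"movements": [
    {"coordinates": [11.5, 21.3, 16.2], "event": "FINISH", "timestamp": 1710523457489},
    {"coordinates": [10.5, 20.3, 15.7], "event": "START", "timestamp": 1710523456789},
    {"coordinates": [11.2, 21.0, 16.1], "event": "TUMOR_TOUCH", "timestamp": 1710523457239},
    {"coordinates": [10.6, 20.4, 15.8], "event": "NONE", "timestamp": 1710523456889},
    {"coordinates": [10.8, 20.6, 15.9], "event": "NONE", "timestamp": 1710523457039},
]}

df = _paso1_ingestar_y_limpiar(trajectory_data)
# Round only for display; dt values come from float subtraction.
print(df.round(3))
```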
Why This Step Matters
Enables Time-Based Calculations
The dt column is essential for calculating velocity (distance/time) and acceleration (change in velocity/time).
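As a rough illustration of how dt feeds a velocity calculation, the sketch below computes a per-row speed from the first three rows of the example table. This is not the actual Step 2 code: the `speed` column name and the choice to report 0 where dt is 0 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the step 1 output (values from the example table).
df = pd.DataFrame({
    "x": [10.5, 10.6, 10.8],
    "y": [20.3, 20.4, 20.6],
    "z": [15.7, 15.8, 15.9],
    "dt": [0.0, 0.1, 0.15],
})

# Euclidean distance travelled since the previous sample.
# The first row has no predecessor; its NaN differences sum to 0.
step = np.sqrt((df[["x", "y", "z"]].diff() ** 2).sum(axis=1))

# Mask rows where dt is 0 (the first row, or duplicate timestamps)
# so the division cannot produce inf; those rows get speed 0.
dt = df["dt"].where(df["dt"] > 0)
df["speed"] = (step / dt).fillna(0.0)
print(df)
```

The speed is expressed in coordinate units per second. Guarding the division matters because the first row always has dt = 0.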
Ensures Chronological Order
Sorting guarantees that subsequent steps process movements in the correct sequence.
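Converting to a DataFrame lets later steps use vectorized pandas operations such as column-wise math and grouping.
The z-coordinate default and fillna(0) for dt keep NaN values out of the table. The first row's dt of 0 still has to be guarded against wherever a later calculation divides by dt.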
Edge Cases
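If the movements array is empty, the resulting DataFrame has no timestamp column, so the sort in this step raises a KeyError before any later step runs. A single movement passes this step but leaves no time interval to derive velocity from. Always validate that trajectory data contains at least 2 movements before calling the pipeline.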
if len(trajectory_data["movements"]) < 2:
    raise ValueError("Trajectory must contain at least 2 movements")
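The check above can be extended into a small pre-flight validator. The helper name `validate_trajectory` and the per-movement checks are hypothetical additions for illustration; only the minimum of 2 movements comes from the guidance above.

```python
from typing import Dict

# Keys that the ingestion step reads from every movement.
REQUIRED_KEYS = {"coordinates", "event", "timestamp"}


def validate_trajectory(trajectory_data: Dict) -> None:
    """Raise ValueError if the input cannot be ingested (hypothetical helper)."""
    movements = trajectory_data.get("movements")
    if not isinstance(movements, list) or len(movements) < 2:
        raise ValueError("Trajectory must contain at least 2 movements")
    for i, m in enumerate(movements):
        missing = REQUIRED_KEYS - m.keys()
        if missing:
            raise ValueError(f"Movement {i} is missing keys: {sorted(missing)}")
        # The ingestion step indexes coordinates[0] and coordinates[1].
        if len(m["coordinates"]) < 2:
            raise ValueError(f"Movement {i} needs at least 2 coordinates")


good = {"coordinates": [10.5, 20.3], "event": "NONE", "timestamp": 1710523456789}
validate_trajectory({"movements": [good, good]})  # passes silently
```

Failing early with a descriptive message is usually easier to debug than a KeyError or IndexError raised from inside the DataFrame construction.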
This step is highly efficient:
List comprehension: faster than iterative append
Vectorized operations: diff() and division operate on entire columns at once
Single sort: O(n log n) complexity, performed only once
Typical performance: ~10 ms for 5,000 movements
Next Step
Once the data is cleaned and structured, it proceeds to Step 2:
Dexterity Metrics: calculate velocity, acceleration, jerk, and economy of movement.