Performance Scoring

Overview

Justina’s performance scoring system provides objective, quantifiable measurements of surgical skill. The algorithm scores each session from 0 to 100 using error counts and movement efficiency, and reports precision and smoothness metrics alongside the score as actionable feedback.

The scoring system is penalty-based, starting from a perfect 100 and deducting points for errors and inefficiencies.

Scoring Algorithm

The core scoring logic balances multiple performance factors:

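from typing import Dict, Tuple

def _paso5_generar_feedback(m: Dict, b: Dict, r: Dict) -> Tuple[float, str]:
    score = 100.0                    # start from a perfect score
    score -= r["touches"] * 8        # -8 per tumor touch
    score -= r["hemorrhages"] * 15   # -15 per hemorrhage
    if m["economia"] > 1.5:          # flat penalty for an indirect path
        score -= 10
    score = max(0, min(100, score))  # clamp to the 0-100 range
    # Excerpt: tier selection and feedback assembly follow.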

Base Score

Start: 100 points (perfect performance)

Error Penalties

Deductions for surgical mistakes

Efficiency Penalty

An efficient path (economy of movement at or below 1.5x) avoids the flat 10-point deduction; no bonus points are awarded

Penalty Structure

Critical Error Penalties

Hemorrhages: -15 points each

Most severe penalty - indicates vascular damage

score -= r["hemorrhages"] * 15

Triggered when scalpel cuts arterial structures
Each hemorrhage represents a life-threatening error
Multiple hemorrhages can drastically reduce score

Example Impact:

1 hemorrhage: 100 → 85 points
2 hemorrhages: 100 → 70 points
3 hemorrhages: 100 → 55 points (DEFICIENTE tier)

Tumor Touches: -8 points each

Moderate penalty - indicates imprecise cutting

score -= r["touches"] * 8

Counts contacts with tumor tissue during removal
More touches = less efficient tumor extraction
Ideal performance minimizes tissue manipulation

Example Impact:

3 touches: 100 → 76 points
5 touches: 100 → 60 points (MEJORABLE tier)
10 touches: 100 → 20 points

Poor Economy of Movement: -10 points

Efficiency penalty - triggered when path is excessively indirect

if m["economia"] > 1.5: score -= 10

Economy > 1.5x means the instrument traveled more than 50% farther than the straight-line distance from start to end
Indicates poor surgical planning and fatigue risk
Applied once as a flat penalty; it does not grow with the size of the excess

Threshold Rationale:

< 1.2x: Excellent efficiency
1.2x - 1.5x: Acceptable with room for improvement
> 1.5x: Unacceptable inefficiency (-10 penalty)
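
The penalty rules above can be collected into one function. This is a minimal sketch rather than the project's actual code: the name `penalty_score` and its flat argument list are assumptions made for illustration, while the weights (8, 15, 10) and the 1.5x threshold come from the snippets above.

```python
def penalty_score(touches: int, hemorrhages: int, economia: float) -> float:
    """Apply the documented penalties and clamp the result to [0, 100]."""
    score = 100.0
    score -= touches * 8        # moderate penalty per tumor touch
    score -= hemorrhages * 15   # critical penalty per hemorrhage
    if economia > 1.5:          # flat efficiency penalty, applied once
        score -= 10
    return max(0.0, min(100.0, score))


print(penalty_score(touches=0, hemorrhages=3, economia=1.0))   # 55.0
print(penalty_score(touches=5, hemorrhages=0, economia=1.0))   # 60.0
print(penalty_score(touches=3, hemorrhages=0, economia=1.82))  # 66.0
```

The clamp matters only at the low end: 13 or more touches alone would push the raw value below zero, and the function then returns 0.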

Performance Tiers

Scores are categorized into four performance levels:

status = (
    "🌟 EXCELENTE" if score >= 90 else
    "✅ BUENO" if score >= 75 else
    "⚠️ MEJORABLE" if score >= 60 else
    "❌ DEFICIENTE"
)

🌟 EXCELENTE (90-100)

Expert-level performance with minimal errors and optimal efficiency

✅ BUENO (75-89)

Proficient skill with acceptable precision and few critical errors

⚠️ MEJORABLE (60-74)

Needs improvement - multiple errors or poor efficiency

❌ DEFICIENTE (0-59)

Requires significant practice - critical errors or severe inefficiency
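
Because the comparisons use `>=` and are evaluated from the top tier down, a score that lands exactly on a boundary belongs to the higher tier. The sketch below restates the mapping as a standalone function; the name `classify_score` is hypothetical and the emoji prefixes are omitted for brevity.

```python
def classify_score(score: float) -> str:
    """Map a 0-100 score to its tier label (boundaries are inclusive)."""
    if score >= 90:
        return "EXCELENTE"
    if score >= 75:
        return "BUENO"
    if score >= 60:
        return "MEJORABLE"
    return "DEFICIENTE"


for value in (90, 89.9, 75, 60, 59.9):
    print(value, classify_score(value))
```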

Factors Affecting Score

1. Hemorrhages (Critical)

Detected when the scalpel intersects arterial meshes:

if (!arteryCut && cutter.intersectsMesh(arteryMesh, true)) {
  arteryCut = true;
  enviarEvento(
    scalpelMesh?.position.x,
    scalpelMesh?.position.y,
    scalpelMesh?.position.z,
    "HEMORRHAGE"
  );
}

Risk Analysis:

hemorrhages = (df["event"] == "HEMORRHAGE").sum()

A single hemorrhage drops a perfect score to 85, so any hemorrhage rules out the “EXCELENTE” tier (90 or above).

2. Tumor Touches (Moderate)

Counted when instrument contacts tumor fragments:

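// Iterate backwards so that splice() does not shift an unvisited
// fragment into an index that has already been checked.
for (let i = tumorFragments.length - 1; i >= 0; i--) {
  const fragment = tumorFragments[i];
  if (cutter.intersectsMesh(fragment, true)) {
    fragment.dispose();
    tumorFragments.splice(i, 1);
    enviarEvento(
      scalpelMesh.position.x,
      scalpelMesh.position.y,
      scalpelMesh.position.z,
      "TUMOR_TOUCH"
    );
  }
}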

Counting Logic:

tumor_touches = (df["event"] == "TUMOR_TOUCH").sum()
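
Both counts are plain tallies over the event column. The sketch below shows the same idea without pandas, using a list of dictionaries. The record layout, the helper name `count_errors`, and the `MOVE` event are assumptions for illustration; only `HEMORRHAGE` and `TUMOR_TOUCH` appear in the source.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_errors(records: Iterable[Mapping[str, object]]) -> Dict[str, int]:
    """Tally the two event types that feed the score."""
    counts = Counter(rec.get("event") for rec in records)
    # Counter returns 0 for event types that never occurred.
    return {
        "touches": counts["TUMOR_TOUCH"],
        "hemorrhages": counts["HEMORRHAGE"],
    }


session = [
    {"event": "MOVE"},          # hypothetical non-error event
    {"event": "TUMOR_TOUCH"},
    {"event": "TUMOR_TOUCH"},
    {"event": "HEMORRHAGE"},
]
print(count_errors(session))  # {'touches': 2, 'hemorrhages': 1}
```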

3. Economy of Movement (Efficiency)

Ratio of total path length to direct distance:

total_dist = dist.sum()
p1 = np.array([df["x"].iloc[0], df["y"].iloc[0], df["z"].iloc[0]])
p2 = np.array([df["x"].iloc[-1], df["y"].iloc[-1], df["z"].iloc[-1]])
direct_dist = np.linalg.norm(p2 - p1)
economia = total_dist / direct_dist if direct_dist > 0 else 1.0

Interpretation:

1.0x: Perfect efficiency (straight line)
1.2x: 20% longer than ideal (acceptable)
1.5x: 50% longer (penalty threshold)
2.0x: 100% longer (severe inefficiency)

Economy of movement is not penalized continuously; the 10-point deduction applies only when the ratio exceeds the 1.5x threshold.
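
To make the ratio concrete, here is a minimal pure-Python sketch of the same calculation on a list of 3D points. The function name `economy_of_movement` and the tuple-based path are assumptions; the source works on DataFrame columns with NumPy.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def economy_of_movement(path: Sequence[Point]) -> float:
    """Total path length divided by the start-to-end distance."""
    if len(path) < 2:
        return 1.0
    total = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    direct = math.dist(path[0], path[-1])
    # Same fallback as the source: a zero direct distance yields 1.0.
    return total / direct if direct > 0 else 1.0


detour = [(0, 0, 0), (0, 3, 0), (4, 3, 0)]  # 3 + 4 units travelled, 5 direct
print(economy_of_movement(detour))
```

One consequence of the fallback is worth keeping in mind: a path that ends exactly where it started reports 1.0 regardless of how far the instrument travelled.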

Scoring Examples

Example 1: Near-Perfect Performance

m = {"economia": 1.15, ...}  # Excellent efficiency
r = {"touches": 2, "hemorrhages": 0}  # Minimal errors

score = 100.0
score -= 2 * 8  # -16 for touches
score -= 0 * 15  # No hemorrhages
# economia < 1.5, no penalty

Final Score: 84 (✅ BUENO)

Example 2: Critical Errors

m = {"economia": 1.3, ...}
r = {"touches": 5, "hemorrhages": 2}

score = 100.0
score -= 5 * 8   # -40 for touches
score -= 2 * 15  # -30 for hemorrhages
# economia < 1.5, no penalty

Final Score: 30 (❌ DEFICIENTE)

Example 3: Poor Efficiency

m = {"economia": 1.82, ...}  # Inefficient path
r = {"touches": 3, "hemorrhages": 0}

score = 100.0
score -= 3 * 8   # -24 for touches
score -= 0 * 15  # No hemorrhages
score -= 10      # -10 for economia > 1.5

Final Score: 66 (⚠️ MEJORABLE)

Supporting Metrics

While not directly affecting the score, these metrics appear in feedback:

Precision vs Ideal

Percentage score based on deviation from straight-line path

Smoothness (Jerk)

Rate of acceleration change - indicator of hand steadiness

Duration

Total simulation time - context for efficiency

Average Velocity

Mean instrument speed - indicates confidence level

Precision Calculation

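from typing import Dict

import numpy as np
import pandas as pd

def _paso3_benchmarking(df: pd.DataFrame) -> Dict:
    # Ideal path: the straight line from the first to the last sample.
    p_start = np.array([df["x"].iloc[0], df["y"].iloc[0], df["z"].iloc[0]])
    p_end = np.array([df["x"].iloc[-1], df["y"].iloc[-1], df["z"].iloc[-1]])

    def dist_to_line(p, a, b):
        # Perpendicular distance from p to the line through a and b.
        # Undefined (division by zero) when a and b coincide.
        return np.linalg.norm(np.cross(b-a, a-p)) / np.linalg.norm(b-a)

    # Deviation of every sample from the ideal line.
    desviaciones = [dist_to_line(
        np.array([r.x, r.y, r.z]), p_start, p_end
    ) for r in df.itertuples()]

    return {
        # Each unit of mean deviation costs 10 percentage points.
        "precision": max(0, 100 - np.mean(desviaciones) * 10)
    }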

Precision < 70% triggers a specific recommendation in the feedback, even though it doesn’t directly affect the score.
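
The following sketch works through the precision formula on a three-point path without NumPy. The helper names (`_cross`, `precision_vs_ideal`) are hypothetical, and raising `ValueError` for a path whose start and end coincide is an assumption of this sketch, not behavior documented in the source.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _cross(u: Point, v: Point) -> Point:
    """3D cross product of u and v."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def precision_vs_ideal(path: Sequence[Point]) -> float:
    """100 minus ten times the mean distance to the start-end line."""
    a, b = path[0], path[-1]
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    length = math.hypot(*ab)
    if length == 0:
        # Assumption: reject degenerate paths instead of dividing by zero.
        raise ValueError("start and end points coincide")
    deviations = []
    for p in path:
        pa = (a[0] - p[0], a[1] - p[1], a[2] - p[2])
        deviations.append(math.hypot(*_cross(ab, pa)) / length)
    return max(0.0, 100.0 - (sum(deviations) / len(deviations)) * 10)


# The middle sample sits 2 units off the ideal line; the endpoints sit on it.
# Mean deviation is 2/3, so precision is 100 - 20/3, roughly 93.3.
print(round(precision_vs_ideal([(0, 0, 0), (1, 2, 0), (2, 0, 0)]), 1))
```

Because the penalty is ten points per unit of mean deviation, the result depends on the scale of the scene's coordinate units.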

Feedback Generation

The scoring system generates structured markdown feedback:

feedback = f"""### {status} - Score: {score:.1f}/100

#### 🚨 ALERTAS CRÍTICAS
- Hemorragias: {r["hemorrhages"]} {"(REVISAR TÉCNICA)" if r["hemorrhages"] > 0 else "(Ninguna)"}
- Contactos Tumor: {r["touches"]}
- Cuadrantes de Riesgo: {", ".join(r["cuadrantes"]) if r["cuadrantes"] else "Ninguno"}

#### 📊 MÉTRICAS DE DESTREZA
- **Economía de Movimiento:** {m["economia"]:.2f}x (Ideal < 1.2x)
- **Fluidez (Jerk Promedio):** {m["j_avg"]:.2f} 
- **Precisión vs Patrón Oro:** {b["precision"]:.1f}%

#### 📈 ESTADÍSTICAS
- **Duración Total:** {m["duration"]:.1f}s
- **Distancia Recorrida:** {m["total_dist"]:.2f} unidades
- **Velocidad Promedio:** {m["v_avg"]:.2f} u/s

#### 💡 RECOMENDACIONES
"""

Conditional Recommendations

Recommendations are added based on specific performance gaps:

if r["hemorrhages"] > 0:
    feedback += "- Priorizar control vascular en cuadrantes críticos.\n"
if m["economia"] > 1.8:
    feedback += "- Planificar trayectorias más directas para reducir fatiga.\n"
if b["precision"] < 70:
    feedback += "- Mantener mayor estabilidad en la ejecución del path ideal.\n"
if score < 80:
    feedback += "- Incrementar práctica en simulador para mejorar coordinación motora.\n"

Recommendation Triggers

Hemorrhages > 0: Vascular control advice
Economy > 1.8x: Path planning guidance
Precision < 70%: Stability improvement suggestion
Score < 80: General practice recommendation
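
The four triggers are independent, so any combination of recommendations can appear. The sketch below restates them as a function that returns short keys instead of the Spanish feedback lines; the function name and the key strings are hypothetical, the thresholds are the ones listed above.

```python
from typing import List


def recommendation_keys(hemorrhages: int, economia: float,
                        precision: float, score: float) -> List[str]:
    """Return one key per recommendation that would be triggered."""
    keys: List[str] = []
    if hemorrhages > 0:
        keys.append("vascular_control")
    if economia > 1.8:
        keys.append("path_planning")
    if precision < 70:
        keys.append("stability")
    if score < 80:
        keys.append("general_practice")
    return keys


# Economy of 1.6x is above the 1.5x penalty threshold but below the
# 1.8x recommendation threshold, so no path-planning advice is issued.
print(recommendation_keys(hemorrhages=1, economia=1.6, precision=80.0, score=75.0))
```

Note the gap between the two economy thresholds: a ratio between 1.5x and 1.8x costs 10 points without producing the path-planning recommendation.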

Spatial Risk Analysis

The system identifies problematic surgical quadrants:

mid_x = (df["x"].max() + df["x"].min()) / 2
mid_y = (df["y"].max() + df["y"].min()) / 2

problemas = df[df["event"].isin(["TUMOR_TOUCH", "HEMORRHAGE"])]
cuadrantes_criticos = []
if not problemas.empty:
    for p in problemas.itertuples():
        pos = ""
        pos += "Sup" if p.y > mid_y else "Inf"
        pos += "-Der" if p.x > mid_x else "-Izq"
        if pos not in cuadrantes_criticos:
            cuadrantes_criticos.append(pos)

Quadrant Labels:

Sup-Der: Superior-Right
Sup-Izq: Superior-Left
Inf-Der: Inferior-Right
Inf-Izq: Inferior-Left

Quadrant analysis helps surgeons understand spatial patterns in their errors for targeted improvement.
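
As an illustration of the quadrant logic, the sketch below applies the same midpoint split to a short list of `(x, y, event)` tuples. The tuple layout, the function name `risk_quadrants`, and the `MOVE` event are assumptions; the source operates on a pandas DataFrame.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, str]  # (x, y, event)


def risk_quadrants(samples: Sequence[Sample]) -> List[str]:
    """Label the quadrants that contain at least one error event."""
    if not samples:
        return []
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    # Midpoints of the bounding box of the whole trajectory.
    mid_x = (max(xs) + min(xs)) / 2
    mid_y = (max(ys) + min(ys)) / 2
    quadrants: List[str] = []
    for x, y, event in samples:
        if event not in ("TUMOR_TOUCH", "HEMORRHAGE"):
            continue
        label = ("Sup" if y > mid_y else "Inf") + ("-Der" if x > mid_x else "-Izq")
        if label not in quadrants:  # keep first-seen order, no duplicates
            quadrants.append(label)
    return quadrants


trajectory = [
    (0, 0, "MOVE"),           # hypothetical non-error event
    (10, 10, "MOVE"),
    (8, 9, "HEMORRHAGE"),
    (2, 1, "TUMOR_TOUCH"),
    (9, 8, "TUMOR_TOUCH"),
]
print(risk_quadrants(trajectory))  # ['Sup-Der', 'Inf-Izq']
```

The midpoints come from the extent of the recorded trajectory, not from fixed anatomy, so the same physical location can fall in different quadrants in different sessions.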

Score Interpretation Guidelines

What does my score mean?

90-100 (EXCELENTE)

At most 1 tumor touch (a second touch lowers the score to 84)
Zero hemorrhages
Excellent path efficiency
Ready for advanced procedures

75-89 (BUENO)

Up to 3 tumor touches with no hemorrhage
At most 1 hemorrhage, with no more than 1 tumor touch
Good efficiency with minor deviations
Competent for standard procedures

60-74 (MEJORABLE)

4-5 tumor touches
1-2 hemorrhages
Noticeable inefficiency
Requires focused practice

0-59 (DEFICIENTE)

Excessive errors (6+ touches or 3+ hemorrhages)
Poor efficiency (>1.5x) compounding other errors; the efficiency penalty alone costs only 10 points
Fundamental skill gaps
Needs extensive training
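
The error counts compatible with each tier follow directly from the penalty weights (8 per touch, 15 per hemorrhage, 10 for economy above 1.5x). The sketch below derives the largest number of tumor touches that still reaches a given tier threshold; the function name `max_touches_for` is hypothetical.

```python
from typing import Optional


def max_touches_for(threshold: int, hemorrhages: int = 0,
                    inefficient: bool = False) -> Optional[int]:
    """Largest touch count that keeps the score at or above threshold.

    Returns None when the threshold is unreachable even with zero touches.
    """
    budget = 100 - threshold - 15 * hemorrhages - (10 if inefficient else 0)
    if budget < 0:
        return None
    return budget // 8  # each touch spends 8 points of the budget


print(max_touches_for(90))                 # 1
print(max_touches_for(75))                 # 3
print(max_touches_for(60))                 # 5
print(max_touches_for(90, hemorrhages=1))  # None
```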

Algorithmic Fairness

The scoring system is designed to be:

Objective

Based purely on measurable physical metrics and event counts

Reproducible

Same movements always produce same score

Transparent

All penalty calculations visible in code

Calibrated

Penalty weights tuned to surgical severity

Future Enhancements

Planned improvements to the scoring algorithm:

Weighted Penalties: Different penalties based on tumor size/location
Time Bonuses: Reward optimal completion times
Difficulty Scaling: Adjust scores based on procedure complexity
Percentile Rankings: Compare against cohort of similar experience levels
Trend Analysis: Track improvement over multiple sessions
