OutlierDetector

Z-score based anomaly detector that identifies outliers using standardized distances from the mean.

Constructor

OutlierDetector(random_state: int = 42)
random_state
int
default:"42"
Random seed for reproducibility (currently unused but provided for consistency)

Attributes

center
np.ndarray | None
Mean values for each feature, computed during fit
scale
np.ndarray | None
Standard deviations for each feature (with 1e-6 added for numerical stability), computed during fit

Methods

fit

Fit the detector by computing feature means and standard deviations.
fit(X: pd.DataFrame) -> OutlierDetector
X
pd.DataFrame
required
Training data with samples as rows and features as columns
return
OutlierDetector
Returns self for method chaining
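The fitting step can be sketched in a few lines. This is an illustrative reconstruction of the behavior described above, not the library's actual implementation; in particular, whether the standard deviation uses ddof=0 or ddof=1 is not documented, so ddof=0 is an assumption here.

```python
import numpy as np
import pandas as pd

# Illustrative sketch of fit(): store per-feature means and standard
# deviations, adding 1e-6 to the scale for numerical stability
# (ddof=0 is an assumption; the docs do not specify).
X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 10.0, 10.0]})

center = X.mean().to_numpy()              # per-feature mean
scale = X.std(ddof=0).to_numpy() + 1e-6   # per-feature std, stabilized
```

A constant feature (column "b") would otherwise have zero scale; the 1e-6 offset keeps the later z-score division well defined.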

score_samples

Compute anomaly scores for samples.
score_samples(X: pd.DataFrame) -> pd.Series
X
pd.DataFrame
required
Data to score. Must have same features as training data.
return
pd.Series
Series of anomaly scores (mean absolute z-score across features). Higher scores indicate more anomalous samples.
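The score described above (mean absolute z-score across features) can be sketched as follows. The center/scale computation mirrors the fit sketch and is an assumption, as is ddof=0.

```python
import numpy as np
import pandas as pd

# Illustrative sketch of score_samples(): mean absolute z-score
# across features, using the fitted center and scale.
X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
center = X_train.mean().to_numpy()
scale = X_train.std(ddof=0).to_numpy() + 1e-6

X_new = pd.DataFrame({"a": [2.0, 100.0], "b": [5.0, 5.0]})
z = (X_new.to_numpy() - center) / scale          # standardized distances
scores = pd.Series(np.abs(z).mean(axis=1), index=X_new.index)
```

The second row sits far from the training mean on feature "a", so its score is much higher than the first row's.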

detect

Detect anomalies based on quantile threshold.
detect(X: pd.DataFrame, threshold_quantile: float = 0.9) -> pd.DataFrame
X
pd.DataFrame
required
Data to analyze for anomalies
threshold_quantile
float
default:"0.9"
Quantile threshold for anomaly detection. Scores at or above this quantile are flagged as anomalies.
return
pd.DataFrame
DataFrame with columns:
  • anomaly_score: Numeric anomaly scores (float)
  • is_anomaly: Boolean flag, True when the score is at or above the quantile threshold (bool)
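A minimal sketch of the quantile-based flagging described above, assuming the threshold is the empirical quantile of the scores themselves (pandas' default linear interpolation):

```python
import pandas as pd

# Illustrative sketch of detect(): flag scores at or above the
# threshold_quantile of the score distribution.
scores = pd.Series([0.1, 0.2, 0.3, 5.0])
threshold = scores.quantile(0.9)

result = pd.DataFrame({
    "anomaly_score": scores,
    "is_anomaly": scores >= threshold,
})
```

With threshold_quantile=0.9 on four samples, only the clear outlier (5.0) lands at or above the threshold.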

simulate_early_warning

Simulate an early warning system by counting alerts and measuring latency.
simulate_early_warning(
    scores: pd.Series,
    timestamps: pd.DatetimeIndex,
    threshold: float
) -> dict[str, float]
scores
pd.Series
required
Time-ordered anomaly scores
timestamps
pd.DatetimeIndex
required
Timestamps corresponding to each score (must be same length as scores)
threshold
float
required
Score threshold for triggering alerts
return
dict[str, float]
Dictionary containing:
  • alert_count: Total number of alerts triggered (float)
  • first_alert_latency_s: Seconds from start to first alert (float, inf if no alerts)
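The alert simulation can be sketched as below. The alert rule (score >= threshold) is an assumption; the docs say only that the threshold triggers alerts.

```python
import pandas as pd

# Illustrative sketch of simulate_early_warning(): count threshold
# crossings and measure seconds from the start to the first alert.
scores = pd.Series([0.1, 0.2, 0.9, 0.95])
timestamps = pd.date_range("2024-01-01", periods=4, freq="min")
threshold = 0.8

alerts = scores.to_numpy() >= threshold
alert_count = float(alerts.sum())
if alerts.any():
    first_idx = int(alerts.argmax())  # index of first True
    first_alert_latency_s = (timestamps[first_idx] - timestamps[0]).total_seconds()
else:
    first_alert_latency_s = float("inf")  # no alerts fired

result = {"alert_count": alert_count,
          "first_alert_latency_s": first_alert_latency_s}
```

Here two scores cross the threshold, and the first crossing occurs two minutes (120 s) after the start of the series.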

evaluate_detection_latency

Evaluate detection latency by measuring time from ground truth event to first alert.
evaluate_detection_latency(
    scores: pd.Series,
    ground_truth_events: pd.Series,
    timestamps: pd.DatetimeIndex
) -> float
scores
pd.Series
required
Anomaly scores for each time point
ground_truth_events
pd.Series
required
Binary series (0 or 1) indicating when true events occurred
timestamps
pd.DatetimeIndex
required
Timestamps for each observation (must be same length as scores and events)
return
float
Latency in seconds from first ground truth event to first alert. Returns:
  • Positive float: Seconds between first event and first subsequent alert
  • nan: No ground truth events found
  • inf: No alerts triggered after the first event
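The latency evaluation can be sketched as follows, covering the three documented return cases. The alert rule (score >= threshold) and the threshold value are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative sketch of evaluate_detection_latency(): seconds from
# the first ground-truth event to the first alert at or after it.
scores = pd.Series([0.1, 0.2, 0.3, 0.9])
events = pd.Series([0, 1, 0, 0])
timestamps = pd.date_range("2024-01-01", periods=4, freq="min")
threshold = 0.8

event_idx = np.flatnonzero(events.to_numpy() == 1)
if event_idx.size == 0:
    latency = float("nan")  # no ground truth events
else:
    first_event = int(event_idx[0])
    after = np.flatnonzero(scores.to_numpy()[first_event:] >= threshold)
    if after.size == 0:
        latency = float("inf")  # no alert after the first event
    else:
        first_alert = first_event + int(after[0])
        latency = (timestamps[first_alert] - timestamps[first_event]).total_seconds()
```

The event occurs at the second timestamp and the first subsequent alert at the fourth, two minutes (120 s) later.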