A typical mid-to-large bank has 150-250 KPIs to watch: call-centre response time, fraud notification rate, card activation speed, credit approval latency, mobile-app error rate, ATM cash-fill ratio, and so on. Every one of those KPIs needs a "down", "spike" or "drift" alert delivered to the right person at the right time. Wrong alerts exhaust the team; late alerts cost money.
Why the two pure approaches fail
Pure threshold: alert when KPI < X or > Y. Fast and explainable, but blind to seasonality and trend: the pre-holiday card-usage spike fires an "anomaly" every year at the same hour.
Pure ML: time-series anomaly detection (Prophet, isolation forest, LSTM). Captures seasonality but is hard to explain, suffers a cold start on new KPIs, and the model itself needs maintenance.
CentraQL combines both.
The hybrid model
```python
for kpi in active_kpis:
    val = current_value(kpi)
    if hard_threshold_violated(kpi, val):
        emit(severity="critical", reason="hard threshold")
        continue
    z = rolling_z_score(kpi, window="28d")
    if abs(z) > kpi.z_threshold:
        baseline_pred = seasonal_baseline(kpi)
        if outside_band(val, baseline_pred, kpi.band):
            emit(severity="warning", reason="z-score+seasonal")
```
Three layers: hard threshold, z-score, seasonal baseline.
1. Hard threshold (rule)
Contractual limits live here. Examples: credit-approval p95 latency > 5 s; hourly fraud-notification rate > 0.8%. The analyst writes the rule, the owner approves, an SLA is attached.
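The two contractual limits above can be sketched as data plus one predicate. This is a minimal illustration, not CentraQL's actual rule schema: the `HardRule` type, its field names, and the owner strings are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class HardRule:
    """Illustrative hard-threshold rule; fields are assumptions, not CentraQL's schema."""
    kpi: str
    op: str        # "gt" or "lt"
    limit: float
    owner: str     # the approver the SLA is attached to

def violated(rule: HardRule, value: float) -> bool:
    # A hard rule is a plain comparison -- fast and fully explainable.
    return value > rule.limit if rule.op == "gt" else value < rule.limit

# The two contractual limits from the text:
rules = [
    HardRule("credit_approval_p95_latency_s", "gt", 5.0, "lending-ops"),
    HardRule("fraud_notification_rate_hourly", "gt", 0.008, "fraud-team"),
]

print(violated(rules[0], 6.2))  # True: a 6.2 s p95 latency breaches the 5 s limit
```

Because the rule is data, the analyst-writes / owner-approves workflow reduces to reviewing a small record rather than code.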
2. Z-score (statistical)
Mean and std are computed over a 28-day rolling window. If |z| of the current value exceeds the threshold (typically 3.0), a signal fires. Z-score is explainable: "3.4 sigma away from the 28-day mean" is a sentence every CFO understands.
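The z-score layer can be sketched in a few lines of stdlib Python. This is an illustration of the statistic itself, assuming the window is passed in as the last 28 daily values; the function name and calling convention are not CentraQL's API.

```python
import statistics

def rolling_z_score(window: list[float], current: float) -> float:
    """z of `current` against a rolling window (here: the last 28 daily values)."""
    mean = statistics.fmean(window)
    std = statistics.stdev(window)   # sample std over the window
    return (current - mean) / std

# 28 days of a stable KPI hovering around 50:
window = [50, 52, 49, 51, 50, 53, 48, 52, 51, 50, 49, 51, 52, 50,
          51, 49, 50, 52, 51, 50, 48, 53, 50, 51, 52, 49, 50, 51]

z = rolling_z_score(window, 38)
print(abs(z) > 3.0)  # True -- a drop to 38 is many sigmas off and fires a signal
```

The explainability claim holds mechanically: the output is a single number with a one-sentence reading ("N sigma away from the 28-day mean").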
3. Seasonal baseline (CentraQL specific)
Z-score alone misses holidays, weekends and the intraday rhythm. The seasonal baseline produces an 8-week mean band for the same hour-of-week. A signal fires only when z-score AND seasonal are both off; values that pass either test alone do not raise alarms. This typically cuts false positives by 3-5×.
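The AND gate between the two layers can be sketched as follows. The mean ± 2·std band width is an assumption for illustration (the text specifies only "an 8-week mean band"), as are all function names; the history values are chosen to match the worked ATM example later in the article (52% ± 4).

```python
import statistics

def seasonal_band(same_hour_history: list[float], k: float = 2.0) -> tuple[float, float]:
    """Band from the last 8 weeks of the same hour-of-week.
    The k*std width is an assumed band rule, not CentraQL's documented one."""
    mean = statistics.fmean(same_hour_history)
    std = statistics.stdev(same_hour_history)
    return mean - k * std, mean + k * std

def fires(value: float, z: float, z_threshold: float, band: tuple[float, float]) -> bool:
    lo, hi = band
    # Both layers must be off: |z| above threshold AND value outside the seasonal band.
    return abs(z) > z_threshold and not (lo <= value <= hi)

# ATM cash-fill ratio at the same hour over the last 8 weeks (percent):
history = [52, 50, 54, 51, 49, 53, 52, 55]   # mean 52, std 2 -> band (48, 56)
band = seasonal_band(history)

print(fires(38, z=-3.6, z_threshold=3.0, band=band))  # True: both layers off -> alert
print(fires(50, z=-3.2, z_threshold=3.0, band=band))  # False: inside the band -> suppressed
```

The second call is exactly the false-positive class the hybrid removes: a statistically "anomalous" value that is normal for this hour-of-week stays quiet.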
Threshold configuration
Thresholds are layered, with KPI-specific, profile-specific and domain-pack-specific overrides:
- System default: |z| > 3.0
- KPI override for fraud_rate_hourly: |z| > 2.5 (more sensitive)
- ComplianceProfile RegulatedFinance tightens: |z| > 2.0, with the seasonal AND condition required
This turns threshold debates into policy and replaces per-KPI fights with a single layered configuration.
Anomaly explanation
When CentraQL fires, the Copilot pipeline produces the explanation via the narrator LLM: "At 14:00 the ATM cash-fill ratio was 38%; the last 8 weeks at this hour averaged 52% ± 4. Signal z=-3.6, outside the seasonal band. Last Tuesday at 14:00 the value was 50%." The explanation is written to PromptAuditLog; the team does not have to chase the cause of the alert.
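One way to picture what the narrator LLM receives is a structured fact record rather than raw prose. This is speculative: the field names, the payload shape, and the prompt wording are all assumptions about the Copilot pipeline, matched to the worked ATM example above.

```python
import json

# Hypothetical structured facts handed to the narrator LLM (field names assumed):
anomaly_facts = {
    "kpi": "atm_cash_fill_ratio",
    "hour": "14:00",
    "value_pct": 38,
    "seasonal_mean_pct": 52,
    "seasonal_band_pct": 4,       # the "± 4" from the 8-week same-hour window
    "z_score": -3.6,
    "last_week_same_hour_pct": 50,
}

prompt = ("Explain this KPI anomaly in two sentences, citing the numbers:\n"
          + json.dumps(anomaly_facts, indent=2))
```

Feeding the model only pre-computed numbers keeps the narration grounded: the LLM phrases the explanation, it does not compute the statistics, and the same record is what lands in PromptAuditLog.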
Operational result
In a bank pilot covering 180 KPIs over a month:
- Pure threshold: 240 alerts, 72% false positives.
- Pure z-score: 380 alerts, 58% false positives.
- CentraQL hybrid: 110 alerts, 18% false positives.
Fewer false positives = less alert fatigue = ~4× faster reaction time on real incidents.
Conclusion
KPI anomaly detection is neither a pure-rules problem nor a pure-ML problem. Hard thresholds enforce the contract, z-score catches deviation, the seasonal baseline catches rhythm. CentraQL fuses the three, has the Copilot narrate every alert, and writes the result to audit. Standing up a bank's 200-KPI watchlist takes roughly 1 day to add the domain pack and 3 days of owner tuning; after that it runs automatically.
