
Real-Time Fraud Scoring in Banking: Latency Budget and Architecture

Before a card transaction is approved, the fraud score must come back within 100 milliseconds. Here is how that budget is spent and how the architecture is built.

BIART Team
[Figure: real-time fraud-scoring architecture]

When a card is swiped at a POS terminal, the customer waits a second or two at most. Within that window the transaction passes through dozens of steps — the network, the card scheme, core banking and fraud control. The budget left for the fraud-scoring service is usually under 100 milliseconds. This article covers how those 100 ms are spent and how the architecture is built.

The latency budget

A typical budget breakdown for real-time fraud scoring:

  • Network + deserialization: 10 ms
  • Feature lookup (feature store): 20-30 ms
  • Model inference: 20-40 ms
  • Rule engine + decision: 10 ms
  • Logging + response: 10 ms

The target is ~80 ms total, and p99 must not exceed 100 ms. If scoring overruns the budget, the transaction can no longer be declined synchronously and falls back to asynchronous, after-the-fact monitoring, which lowers the fraud-catch rate.
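The arithmetic behind the budget can be checked with a small script. This is a minimal sketch: the stage names are hypothetical labels for the breakdown above, and it uses the nearest-rank definition of p99, which is one of several percentile conventions.

```python
# Per-stage budget in ms as (low, high), copied from the breakdown above.
BUDGET_MS: dict[str, tuple[int, int]] = {
    "network_deserialization": (10, 10),
    "feature_lookup": (20, 30),
    "model_inference": (20, 40),
    "rules_decision": (10, 10),
    "logging_response": (10, 10),
}
TARGET_MS = 80       # typical end-to-end target
P99_LIMIT_MS = 100   # hard ceiling for the 99th percentile


def budget_range(budget: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Best-case and worst-case end-to-end latency if every stage hits its bound."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of observed end-to-end latencies."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def meets_slo(samples_ms: list[float]) -> bool:
    """True if the measured p99 stays within the 100 ms ceiling."""
    return p99(samples_ms) <= P99_LIMIT_MS


print(budget_range(BUDGET_MS))  # (70, 100)
```

If every stage hits its upper bound at once, the total lands exactly on the 100 ms p99 ceiling with no headroom. That is why the working target sits nearer 80 ms.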

A two-tier decision: rules + model

A mature fraud system is not ML alone:

  1. Rule engine (deterministic): blacklists, country/limit rules and velocity checks (e.g. 5 transactions in the last minute). Fast, explainable and evaluated in microseconds.
  2. ML model (probabilistic): gradient boosting (XGBoost/LightGBM) or a compact neural net. Learns from past behaviour and catches what rules miss.

The decision combines both tiers: if a rule returns a hard decline, the model never runs; otherwise the model's score is compared against a threshold to approve or decline.
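The two-tier flow can be sketched as follows. The blacklist, blocked-country set, velocity limit and 0.8 threshold are all hypothetical values chosen for illustration, and `score_fn` stands in for whatever model is deployed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Txn:
    card_id: str
    amount: float
    country: str
    txns_last_minute: int  # hot feature read from the online store


# Hypothetical rule inputs and threshold, for illustration only.
BLACKLISTED_CARDS = {"card-666"}
BLOCKED_COUNTRIES = {"XX"}
VELOCITY_LIMIT = 5        # e.g. 5 transactions in the last minute
MODEL_THRESHOLD = 0.8     # decline at or above this score


def hard_decline_reason(txn: Txn) -> Optional[str]:
    """Tier 1: deterministic rules. Returns a reason code, or None if no rule fires."""
    if txn.card_id in BLACKLISTED_CARDS:
        return "blacklisted_card"
    if txn.country in BLOCKED_COUNTRIES:
        return "blocked_country"
    if txn.txns_last_minute >= VELOCITY_LIMIT:
        return "velocity"
    return None


def decide(txn: Txn, score_fn: Callable[[Txn], float]) -> Tuple[str, str]:
    """Tier 2 (the model) runs only when tier 1 does not hard-decline."""
    reason = hard_decline_reason(txn)
    if reason is not None:
        return "DECLINE", f"rule:{reason}"  # the model is never called
    score = score_fn(txn)
    verdict = "DECLINE" if score >= MODEL_THRESHOLD else "APPROVE"
    return verdict, f"model:{score:.2f}"
```

Returning a reason code alongside the verdict keeps rule-driven and model-driven declines distinguishable in the audit trail.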

Feature store: hot and cold

The features a fraud model needs arrive at two speeds:

  • Hot (online): real-time counters like 'transactions in the last minute' or 'total amount in the last 5 minutes'. Held in Redis or a similar low-latency store.
  • Cold (offline): batch-computed values like 'the customer's 90-day average spend' or 'typical transaction hour'. Refreshed daily/hourly.
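The hot features above are sliding-window aggregates. The sketch below computes "transactions in the last minute" and "total amount in the last 5 minutes" per card. It is an in-process stand-in under stated assumptions: in production these windows would live in Redis or a similar store, events are assumed to arrive in timestamp order, and amounts are kept in integer minor units (e.g. cents) so sums stay exact. `SlidingWindow` and `HotFeatures` are hypothetical names.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SlidingWindow:
    """Count and sum of events whose timestamp lies in (now - window_s, now]."""

    def __init__(self, window_s: int) -> None:
        self.window_s = window_s
        self.events: Deque[Tuple[int, int]] = deque()  # (timestamp_s, amount_minor)
        self.total = 0

    def _evict(self, now: int) -> None:
        # Drop events that have slid out of the window (timestamps assumed in order).
        while self.events and self.events[0][0] <= now - self.window_s:
            _, amount = self.events.popleft()
            self.total -= amount

    def add(self, ts: int, amount_minor: int) -> None:
        self.events.append((ts, amount_minor))
        self.total += amount_minor
        self._evict(ts)

    def count(self, now: int) -> int:
        self._evict(now)
        return len(self.events)

    def total_amount(self, now: int) -> int:
        self._evict(now)
        return self.total


class HotFeatures:
    """Per-card hot features: transactions in the last minute, amount in the last 5 minutes."""

    def __init__(self) -> None:
        self._windows: Dict[str, Tuple[SlidingWindow, SlidingWindow]] = defaultdict(
            lambda: (SlidingWindow(60), SlidingWindow(300))
        )

    def update(self, card_id: str, ts: int, amount_minor: int) -> None:
        for window in self._windows[card_id]:
            window.add(ts, amount_minor)

    def get(self, card_id: str, now: int) -> Dict[str, int]:
        one_min, five_min = self._windows[card_id]
        return {"txns_last_minute": one_min.count(now), "amount_last_5m": five_min.total_amount(now)}


features = HotFeatures()
features.update("card-1", ts=0, amount_minor=1000)
features.update("card-1", ts=30, amount_minor=2000)
features.update("card-1", ts=90, amount_minor=500)
print(features.get("card-1", now=90))  # {'txns_last_minute': 1, 'amount_last_5m': 3500}
```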

Feature stores such as Feast serve both tiers from a single feature definition, so training and serving use the same feature logic and train-serve skew is avoided.
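The principle of "one definition, two paths" can be shown without any feature-store library. This is not Feast's API. It is a hypothetical sketch in which the training path and the serving path both call the same feature functions, so a change to a feature's logic reaches both at once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FeatureInputs:
    amount: float
    avg_spend_90d: float   # cold feature, batch-computed
    txns_last_minute: int  # hot feature, stream-computed


def amount_ratio(x: FeatureInputs) -> float:
    """Transaction amount relative to the customer's 90-day average spend."""
    return x.amount / x.avg_spend_90d if x.avg_spend_90d > 0 else 0.0


# The single feature definition that both the training and serving paths call.
FEATURES: Dict[str, Callable[[FeatureInputs], float]] = {
    "amount_ratio": amount_ratio,
    "txns_last_minute": lambda x: float(x.txns_last_minute),
}


def build_vector(x: FeatureInputs) -> List[float]:
    return [fn(x) for fn in FEATURES.values()]


def training_matrix(history: List[FeatureInputs]) -> List[List[float]]:
    """Offline path: build vectors over historical, point-in-time inputs."""
    return [build_vector(row) for row in history]


def serving_vector(request: FeatureInputs) -> List[float]:
    """Online path: build the vector for one live transaction."""
    return build_vector(request)
```

A real feature store adds what this sketch omits: point-in-time correct joins for training data and materialization of batch values into the online store.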

The stream backbone

Transaction events flow over Kafka (or similar). A typical pipeline:

  1. The transaction event lands on a Kafka topic.
  2. A stream processor (Flink/Kafka Streams) updates the hot features.
  3. On the synchronous authorization call, the scoring service reads the online features, runs the model and returns a score.
  4. The decision and score are written to an audit topic; downstream reporting and model retraining feed from there.
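The four steps can be traced end to end in a single process. In this hypothetical simulation, plain lists stand in for Kafka topics, a dict stands in for the online feature store, and `toy_model` stands in for the real model. No Kafka or Flink client API is implied.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-memory stand-ins: lists play the role of Kafka topics,
# a dict plays the role of the online (hot) feature store.
topics: Dict[str, List[dict]] = defaultdict(list)
online_store: Dict[str, int] = defaultdict(int)  # card_id -> transactions seen


def produce(topic: str, event: dict) -> None:
    """Step 1: append an event to a topic."""
    topics[topic].append(event)


def stream_processor(offset: int) -> int:
    """Step 2: consume new transaction events and update hot features; returns the next offset."""
    new_events = topics["transactions"][offset:]
    for event in new_events:
        online_store[event["card_id"]] += 1
    return offset + len(new_events)


def toy_model(features: Dict[str, float]) -> float:
    """Hypothetical stand-in for the real model."""
    return 0.9 if features["txn_count"] >= 5 else 0.1


def score(txn: dict, threshold: float = 0.8) -> str:
    """Steps 3 and 4: synchronous scoring, then write the decision to the audit topic."""
    features = {"txn_count": float(online_store[txn["card_id"]]), "amount": txn["amount"]}
    s = toy_model(features)
    decision = "DECLINE" if s >= threshold else "APPROVE"
    produce("audit", {"txn_id": txn["txn_id"], "score": s, "decision": decision})
    return decision


for i in range(5):
    produce("transactions", {"txn_id": f"t{i}", "card_id": "card-1", "amount": 20.0})
offset = stream_processor(0)
print(score({"txn_id": "t5", "card_id": "card-1", "amount": 20.0}))  # DECLINE
```

Because the audit topic holds every score and decision, reporting and retraining can replay it without touching the synchronous path.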

The model must stay fresh

Fraud patterns shift within weeks. The architecture needs two loops:

  • Online: every transaction is scored synchronously, within the latency budget.
  • Offline: labelled outcomes (chargebacks, manual reviews) are collected; the model is retrained weekly/monthly; a champion-challenger setup shadow-tests a new model and promotes it if it wins.
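A champion-challenger shadow test can be sketched as below. Both models score every transaction, but only the champion's score is acted on. Once labels arrive from chargebacks and manual reviews, the two are compared. The promotion rule (the challenger catches more fraud without a higher false-positive rate) and the 0.8 threshold are hypothetical choices.

```python
from typing import Callable, Dict, List, Tuple

ShadowLog = List[Tuple[str, float, float]]  # (txn_id, champion_score, challenger_score)


def shadow_score(txn: dict, champion: Callable[[dict], float],
                 challenger: Callable[[dict], float], log: ShadowLog) -> float:
    """Score with both models; only the champion's score is returned and acted on."""
    champ = champion(txn)
    log.append((txn["txn_id"], champ, challenger(txn)))
    return champ


def evaluate(pairs: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """(capture_rate, false_positive_rate) for (score, is_fraud) pairs at a threshold."""
    fraud = [s for s, y in pairs if y]
    genuine = [s for s, y in pairs if not y]
    capture = sum(s >= threshold for s in fraud) / len(fraud) if fraud else 0.0
    fpr = sum(s >= threshold for s in genuine) / len(genuine) if genuine else 0.0
    return capture, fpr


def should_promote(log: ShadowLog, labels: Dict[str, bool], threshold: float = 0.8) -> bool:
    """Hypothetical rule: promote if the challenger catches more fraud without a
    higher false-positive rate, judged only on transactions that already have labels."""
    labelled = [(t, c, h) for t, c, h in log if t in labels]
    champ_cap, champ_fpr = evaluate([(c, labels[t]) for t, c, _ in labelled], threshold)
    chall_cap, chall_fpr = evaluate([(h, labels[t]) for t, _, h in labelled], threshold)
    return chall_cap > champ_cap and chall_fpr <= champ_fpr
```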

Explainability and compliance

BDDK and the EU AI Act push toward explainability for automated decisions that directly affect customers, such as fraud declines. For each decline, the model's top 3-5 features and their SHAP values should be logged. A rationale such as 'amount is 8× higher than normal and from an unusual country' then serves both the audit trail and any customer dispute.
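Selecting and logging the top features can look like this. The sketch assumes per-feature SHAP values have already been computed upstream (for example with the `shap` package) and that a positive value pushes the score toward fraud. It keeps only positive contributions because those explain a decline. The feature names and the JSON record layout are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def top_reasons(shap_values: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """The k features that pushed the score hardest toward fraud (largest positive SHAP)."""
    toward_fraud = [(name, v) for name, v in shap_values.items() if v > 0]
    toward_fraud.sort(key=lambda item: item[1], reverse=True)
    return toward_fraud[:k]


def decline_audit_record(txn_id: str, score: float,
                         shap_values: Dict[str, float], k: int = 3) -> str:
    """One JSON line for the audit log of a declined transaction."""
    return json.dumps({
        "txn_id": txn_id,
        "score": score,
        "top_features": [
            {"feature": name, "shap": round(value, 4)}
            for name, value in top_reasons(shap_values, k)
        ],
    })
```

Mapping feature names such as `amount_ratio` to human-readable reason text is a separate, presentation-level step.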

Typical outcome

In field deployments, a rule+model hybrid typically raises fraud capture by 30-45% compared with a rules-only system, and proper threshold calibration cuts false positives (genuine customers declined) by 20-30%. Every false positive means an unhappy customer and lost revenue, so threshold calibration matters as much as the model itself.
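One way to calibrate the threshold is to maximise fraud capture subject to a cap on the false-positive rate, measured on labelled historical scores. This is a sketch under that assumption. Real calibrations often weight outcomes by monetary cost instead, and the cap value is a business choice.

```python
from typing import List, Tuple


def rates(scores: List[float], labels: List[bool], threshold: float) -> Tuple[float, float]:
    """(capture_rate, false_positive_rate) when declining every score >= threshold."""
    fraud = [s for s, y in zip(scores, labels) if y]
    genuine = [s for s, y in zip(scores, labels) if not y]
    if not fraud or not genuine:
        raise ValueError("need both fraud and genuine examples")
    capture = sum(s >= threshold for s in fraud) / len(fraud)
    fpr = sum(s >= threshold for s in genuine) / len(genuine)
    return capture, fpr


def calibrate_threshold(scores: List[float], labels: List[bool],
                        max_fpr: float) -> Tuple[float, float, float]:
    """Highest-capture threshold whose false-positive rate stays within max_fpr.

    Returns (threshold, capture_rate, false_positive_rate).
    """
    best = (float("inf"), 0.0, 0.0)  # declining nothing is always feasible
    # Lowering the threshold can only raise both capture and FPR, so scan from the top.
    for t in sorted(set(scores), reverse=True):
        capture, fpr = rates(scores, labels, t)
        if fpr > max_fpr:
            break
        if capture > best[1]:
            best = (t, capture, fpr)
    return best
```

Scanning from the highest score downward means that among thresholds with equal capture, the higher one (with the lower or equal false-positive rate) is kept.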

Conclusion

Real-time fraud scoring is not a model problem; it is a latency-budget problem. Feature lookup, model inference and the rule decision must fit inside 100 ms; the model must stay fresh; every decision must be explainable. Built correctly, both fraud loss and false declines fall.
