Real-Time Fraud Scoring in Banking — Architecture

When a card is swiped at a POS terminal, the customer waits a second or two at most. Within that window the transaction passes through dozens of steps across the network, the card scheme, core banking and fraud control. The budget left for the fraud-scoring service is usually under 100 milliseconds. This article covers how those 100 ms are spent and how the architecture around them is built.

The latency budget

A typical budget breakdown for real-time fraud scoring:

Network + deserialize: 10 ms
Feature lookup (feature store): 20-30 ms
Model inference: 20-40 ms
Rule engine + decision: 10 ms
Logging + response: 10 ms

The target is ~80 ms total; p99 must not exceed 100 ms. If scoring overruns the budget, the transaction can no longer be declined synchronously and falls back to asynchronous monitoring, which lowers the fraud-catch rate.
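
As a quick sanity check on the breakdown above, the stage ranges can be summed to see how much headroom the plan leaves. This is a minimal sketch: the stage names and the `fits_budget` helper are illustrative assumptions, and only the millisecond figures come from the breakdown.

```python
# Stage budgets in milliseconds as (low, high) ranges, copied from the breakdown.
# The dictionary keys are illustrative names, not part of any real system.
BUDGET_MS = {
    "network_deserialize": (10, 10),
    "feature_lookup": (20, 30),
    "model_inference": (20, 40),
    "rules_decision": (10, 10),
    "logging_response": (10, 10),
}

P99_LIMIT_MS = 100  # ceiling stated in the text


def budget_range(budget):
    """Return (best_case, worst_case) planned latency in milliseconds."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def fits_budget(stage_timings_ms, limit_ms=P99_LIMIT_MS):
    """True if the measured per-stage timings of one request stay within the limit."""
    return sum(stage_timings_ms.values()) <= limit_ms


low, high = budget_range(BUDGET_MS)
print(f"planned total: {low}-{high} ms (limit {P99_LIMIT_MS} ms)")
```

The worst case of every stage added together already lands exactly on the 100 ms ceiling, so any single stage running slow leaves no slack for the others.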

A two-tier decision: rules + model

A mature fraud system is not ML alone:

Rule engine (deterministic): blacklists, country/limit rules, velocity checks (e.g. more than 5 transactions in the last minute). These are fast, explainable checks that run in microseconds.
ML model (probabilistic): gradient boosting (XGBoost/LightGBM) or a compact neural net. Learns from past behaviour and catches what rules miss.

The decision combines the two: if a rule returns a hard decline, the model never runs; otherwise the model score is compared against a threshold.
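
The short-circuit described above might look like the following sketch. The blacklist contents, the velocity limit, the 0.8 threshold and every name in the block are hypothetical and chosen for illustration only; `model_score_fn` stands in for whatever model is deployed.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card_id: str
    amount: float
    country: str
    tx_last_minute: int  # velocity counter, assumed to come from the feature store


# Hypothetical rule data and thresholds.
BLOCKED_CARDS = {"card-blocked-1"}
BLOCKED_COUNTRIES = {"XX"}
MAX_TX_PER_MINUTE = 5
SCORE_THRESHOLD = 0.8


def hard_decline_reason(tx):
    """Return the name of the first rule that fires, or None if no rule fires."""
    if tx.card_id in BLOCKED_CARDS:
        return "blacklisted_card"
    if tx.country in BLOCKED_COUNTRIES:
        return "blocked_country"
    if tx.tx_last_minute > MAX_TX_PER_MINUTE:
        return "velocity"
    return None


def decide(tx, model_score_fn):
    """Rules first; the model is only called when no rule hard-declines."""
    reason = hard_decline_reason(tx)
    if reason is not None:
        return {"decision": "decline", "reason": reason, "score": None}
    score = model_score_fn(tx)
    decision = "decline" if score >= SCORE_THRESHOLD else "approve"
    return {"decision": decision, "reason": "model_score", "score": score}
```

Running the rules first keeps the cheap, deterministic path off the model's latency budget and gives every hard decline a named reason.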

Feature store: hot and cold

The features a fraud model needs arrive at two speeds:

Hot (online): real-time counters like 'transactions in the last minute' or 'total amount in the last 5 minutes'. Held in Redis or a similar low-latency store.
Cold (offline): batch-computed values like 'the customer's 90-day average spend' or 'typical transaction hour'. Refreshed daily/hourly.

Feature stores like Feast generate both tiers from one definition, so training and serving use the same feature logic (avoiding train-serve skew).
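
To make the two tiers concrete, here is a minimal in-memory sketch of a hot sliding-window counter merged with cold batch values into one feature vector. It is a stand-in for Redis and a batch table, not Feast's API; the class, the function and the feature names are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Hot feature: count and total amount of events inside a trailing window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, amount) pairs, oldest first

    def add(self, timestamp, amount):
        self._events.append((timestamp, amount))

    def snapshot(self, now):
        """Drop events that left the window, then return (count, amount_sum)."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return len(self._events), sum(amount for _, amount in self._events)


def build_feature_vector(hot_1m, hot_5m, cold, now):
    """Merge hot counters with cold (batch-computed) values for one customer."""
    count_1m, _ = hot_1m.snapshot(now)
    _, amount_5m = hot_5m.snapshot(now)
    return {
        "tx_count_1m": count_1m,
        "amount_sum_5m": amount_5m,
        "avg_spend_90d": cold["avg_spend_90d"],
        "typical_hour": cold["typical_hour"],
    }
```

In production the hot counters would live in a shared low-latency store rather than process memory, but the merge step at scoring time has the same shape.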

The stream backbone

Transaction events flow over Kafka (or similar). A typical pipeline:

The transaction event lands on a Kafka topic.
A stream processor (Flink/Kafka Streams) updates the hot features.
The scoring service, on the synchronous call, reads online features, runs the model and returns a score.
The decision and score are written to an audit topic; downstream reporting and model retraining feed from there.
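
The four steps above can be traced end to end with plain Python lists standing in for the Kafka topics. This sketch deliberately uses no Kafka or Flink client: the topic lists, the event fields, the 0.8 threshold and the function names are all assumptions, and windowing of the hot counters is left out for brevity.

```python
import json
from collections import defaultdict

transactions_topic = []  # stand-in for the transaction topic
audit_topic = []         # stand-in for the audit topic
hot_features = defaultdict(lambda: {"tx_count": 0, "amount_sum": 0.0})


def publish_transaction(event):
    """Step 1: the transaction event lands on the topic."""
    transactions_topic.append(json.dumps(event))


def stream_processor_step(raw_event):
    """Step 2: the stream processor updates the hot features for the card."""
    event = json.loads(raw_event)
    state = hot_features[event["card_id"]]
    state["tx_count"] += 1
    state["amount_sum"] += event["amount"]


def score_transaction(event, model, threshold=0.8):
    """Steps 3 and 4: read online features, score, then write the audit record."""
    features = dict(hot_features[event["card_id"]])
    score = model(features)
    record = {
        "tx_id": event["tx_id"],
        "score": score,
        "decision": "decline" if score >= threshold else "approve",
        "features": features,
    }
    audit_topic.append(json.dumps(record))
    return record
```

Writing the features alongside the score into the audit record means the retraining job can later rebuild exactly what the model saw at decision time.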

The model must stay fresh

Fraud patterns shift within weeks. The architecture needs two loops:

Online: the score is returned synchronously, inside the latency budget.
Offline: labelled outcomes (chargebacks, manual reviews) are collected; the model is retrained weekly/monthly; a champion-challenger setup shadow-tests a new model and promotes it if it wins.
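
One way to sketch the champion-challenger comparison: score the same labelled transactions with both models and promote the challenger only if it catches more fraud without raising false positives. The promotion rule, the 0.8 threshold and the minimum gain are illustrative assumptions rather than an established standard.

```python
def capture_and_false_positive_rate(scores, labels, threshold):
    """labels: 1 = confirmed fraud, 0 = genuine. Returns (capture_rate, fpr)."""
    flagged = [score >= threshold for score in scores]
    fraud_total = sum(labels)
    genuine_total = len(labels) - fraud_total
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    capture = caught / fraud_total if fraud_total else 0.0
    fpr = false_pos / genuine_total if genuine_total else 0.0
    return capture, fpr


def should_promote(champion_scores, challenger_scores, labels,
                   threshold=0.8, min_capture_gain=0.02):
    """Promote only if the challenger catches more fraud and is no worse on FPR."""
    champ_capture, champ_fpr = capture_and_false_positive_rate(
        champion_scores, labels, threshold)
    chall_capture, chall_fpr = capture_and_false_positive_rate(
        challenger_scores, labels, threshold)
    return (chall_capture >= champ_capture + min_capture_gain
            and chall_fpr <= champ_fpr)
```

Because the challenger runs in shadow mode, its scores are only logged and never affect live decisions until this comparison passes.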

Explainability and compliance

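Regulators such as BDDK (Turkey's Banking Regulation and Supervision Agency) and frameworks such as the EU AI Act expect explainability for high-impact automated decisions, and fraud declines are commonly treated that way. For each decline, the model's top 3-5 features (with their SHAP values) should be logged. A rationale like 'amount is 8× higher than normal and from an unusual country' serves both the audit and the customer dispute.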
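
A sketch of the logging step, assuming per-feature contributions (for example SHAP values) have already been computed elsewhere. The record layout, the function names and the feature names are hypothetical.

```python
import json


def top_reasons(contributions, k=3):
    """Pick the k features with the largest absolute contribution to the score.

    contributions: feature name -> signed contribution, e.g. a SHAP value
    computed by a separate explainer (not shown here).
    """
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [{"feature": name, "contribution": value} for name, value in ranked[:k]]


def decline_audit_record(tx_id, score, contributions, k=3):
    """Serialize one decline together with its top-k reasons for the audit log."""
    return json.dumps(
        {
            "tx_id": tx_id,
            "decision": "decline",
            "score": score,
            "top_features": top_reasons(contributions, k),
        },
        sort_keys=True,
    )
```

Ranking by absolute value keeps features that pushed the score down in the picture too, which can matter when a customer disputes the decline.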

Typical outcome

In the field, a rule+model hybrid typically raises fraud capture by 30-45% versus a rules-only system, while false positives (declining a genuine customer) drop by 20-30% with proper threshold calibration. Every false positive translates directly into customer dissatisfaction and lost revenue, so threshold calibration matters as much as the model.
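
Threshold calibration can be sketched as a search over labelled historical scores: take the lowest threshold (which catches the most fraud) whose false-positive rate still stays under a cap the business accepts. The function name and the idea of a fixed cap are assumptions made for illustration.

```python
def calibrate_threshold(scores, labels, max_false_positive_rate):
    """Return the lowest threshold whose false-positive rate stays within the cap.

    labels: 1 = confirmed fraud, 0 = genuine. A transaction is flagged when
    score >= threshold. Returns None if no candidate threshold meets the cap.
    """
    genuine_total = sum(1 for y in labels if y == 0)
    best = None
    # Walk candidate thresholds from strictest to loosest; the false-positive
    # rate can only grow as the threshold drops, so stop at the first violation.
    for threshold in sorted(set(scores), reverse=True):
        false_pos = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and y == 0
        )
        fpr = false_pos / genuine_total if genuine_total else 0.0
        if fpr > max_false_positive_rate:
            break
        best = threshold
    return best
```

Re-running this calibration whenever the model is retrained keeps the false-decline rate steady even as the score distribution shifts.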

Conclusion

Real-time fraud scoring is not a model problem; it is a latency-budget problem. Feature lookup, model inference and the rule decision must fit inside 100 ms; the model must stay fresh; every decision must be explainable. Built correctly, both fraud loss and false declines fall.
