When a card is swiped at a POS terminal, the customer waits a second or two at most. Within that window the transaction passes through dozens of steps — the network, the card scheme, core banking and fraud control. The budget left for the fraud-scoring service is usually under 100 milliseconds. This article covers how those 100 ms are spent and how the architecture is built.
The latency budget
A typical budget breakdown for real-time fraud scoring:
- Network + deserialize: 10 ms
- Feature lookup (feature store): 20-30 ms
- Model inference: 20-40 ms
- Rule engine + decision: 10 ms
- Logging + response: 10 ms
The target is about 80 ms in total, and p99 must not exceed 100 ms. When the budget is overrun, the transaction can no longer be declined synchronously and falls back to asynchronous monitoring, which lowers the fraud-catch rate.
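The budget above can be expressed as a simple deadline check. What follows is a minimal sketch rather than a production implementation: the names `STAGE_BUDGET_MS`, `Deadline` and `score_with_deadline` are hypothetical, each stage budget takes the upper bound of the ranges listed above, and the deadline is only checked between stages (a running stage is not interrupted).

```python
import time

# Per-stage budget in milliseconds, using the upper bound of each range
# in the breakdown (assumption: worst case per stage).
STAGE_BUDGET_MS = {
    "network_deserialize": 10,
    "feature_lookup": 30,
    "model_inference": 40,
    "rules_decision": 10,
    "logging_response": 10,
}
P99_LIMIT_MS = 100


class Deadline:
    """Tracks the time left for one scoring request."""

    def __init__(self, limit_ms, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._limit_ms = limit_ms

    def elapsed_ms(self):
        return (self._clock() - self._start) * 1000.0

    def expired(self):
        return self.elapsed_ms() >= self._limit_ms


def score_with_deadline(stages, deadline):
    """Run (name, callable) stages in order.

    If the deadline has passed before a stage starts, stop and hand the
    transaction to asynchronous monitoring. Stages that exceeded their own
    budget are reported so they can be investigated.
    """
    over_budget = []
    for name, stage in stages:
        if deadline.expired():
            return {
                "path": "async_monitoring",
                "stopped_before": name,
                "over_budget": over_budget,
            }
        before = deadline.elapsed_ms()
        stage()
        if deadline.elapsed_ms() - before > STAGE_BUDGET_MS[name]:
            over_budget.append(name)
    return {"path": "sync_decision", "over_budget": over_budget}
```

Injecting the clock keeps the check testable without real sleeps. Note that the upper bounds add up to exactly 100 ms, so every stage running at its worst case leaves no headroom under the p99 limit.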
A two-tier decision: rules + model
A mature fraud system is not ML alone:
- Rule engine (deterministic): blacklists, country/limit rules, velocity checks (for example, 5 transactions in the last minute). These checks are fast and explainable, and typically evaluate in microseconds.
- ML model (probabilistic): gradient boosting (XGBoost/LightGBM) or a compact neural net. Learns from past behaviour and catches what rules miss.
The final decision combines the two: if a rule returns a hard decline, the model never runs; otherwise the model score is compared against a decision threshold.
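The two-tier flow can be sketched in a few lines. Everything named here is an illustrative assumption: the `Transaction` fields, the blocklists, the velocity limit of 5 per minute and the `DECLINE_THRESHOLD` of 0.8 are placeholders, and `model_score_fn` stands in for whatever model is deployed.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card_id: str
    country: str
    amount: float
    txn_count_last_minute: int


# Hypothetical rule configuration.
BLOCKED_CARDS = {"card-stolen-001"}
BLOCKED_COUNTRIES = {"XX"}
MAX_TXN_PER_MINUTE = 5
DECLINE_THRESHOLD = 0.8  # assumed value; set by calibration in practice


def hard_decline_reason(txn):
    """Tier 1: deterministic rules. Returns a reason string or None."""
    if txn.card_id in BLOCKED_CARDS:
        return "blacklisted_card"
    if txn.country in BLOCKED_COUNTRIES:
        return "blocked_country"
    if txn.txn_count_last_minute >= MAX_TXN_PER_MINUTE:
        return "velocity"
    return None


def decide(txn, model_score_fn):
    """Tier 2 runs only when no rule produced a hard decline."""
    reason = hard_decline_reason(txn)
    if reason is not None:
        return {"decision": "decline", "source": "rule", "reason": reason, "score": None}
    score = model_score_fn(txn)
    decision = "decline" if score >= DECLINE_THRESHOLD else "approve"
    return {"decision": decision, "source": "model", "reason": None, "score": score}
```

Returning the rule name alongside the decision keeps the deterministic path explainable without any model introspection.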
Feature store: hot and cold
The features a fraud model needs arrive at two speeds:
- Hot (online): real-time counters like 'transactions in the last minute' or 'total amount in the last 5 minutes'. Held in Redis or a similar low-latency store.
- Cold (offline): batch-computed values like 'the customer's 90-day average spend' or 'typical transaction hour'. Refreshed daily/hourly.
Feature stores like Feast generate both tiers from one definition, so training and serving use the same feature logic (avoiding train-serve skew).
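To make the hot/cold split concrete, here is a small in-memory sketch. It deliberately uses neither the Redis nor the Feast API: `HotFeatureStore` and `build_feature_vector` are hypothetical names, the feature names are invented for illustration, and timestamps are assumed to arrive in order.

```python
from collections import deque


class HotFeatureStore:
    """In-memory stand-in for an online store such as Redis (assumption)."""

    def __init__(self, max_window_s=300):
        self._max_window_s = max_window_s
        self._events = {}  # card_id -> deque of (timestamp_s, amount)

    def record(self, card_id, ts, amount):
        """Append one transaction; timestamps are assumed to be non-decreasing."""
        q = self._events.setdefault(card_id, deque())
        q.append((ts, amount))
        self._evict(q, ts)

    def _evict(self, q, now):
        # Drop events that have fallen out of the longest window (5 minutes).
        while q and q[0][0] <= now - self._max_window_s:
            q.popleft()

    def features(self, card_id, now):
        """Real-time counters over the last minute and the last 5 minutes."""
        q = self._events.get(card_id, deque())
        self._evict(q, now)
        return {
            "txn_count_1m": sum(1 for ts, _ in q if ts > now - 60),
            "amount_sum_5m": sum(amount for _, amount in q),
        }


def build_feature_vector(card_id, now, hot_store, cold_features):
    """Merge batch-computed (cold) values with hot counters for one card."""
    vector = dict(cold_features.get(card_id, {}))
    vector.update(hot_store.features(card_id, now))
    return vector
```

A usage example: after recording transactions at 100 s, 290 s and 295 s, calling `build_feature_vector("c1", 300, hot, cold)` yields the card's cold values plus a one-minute count of 2 and a five-minute sum over all three amounts.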
The stream backbone
Transaction events flow over Kafka (or similar). A typical pipeline:
- The transaction event lands on a Kafka topic.
- A stream processor (Flink/Kafka Streams) updates the hot features.
- The scoring service, on the synchronous call, reads online features, runs the model and returns a score.
- The decision and score are written to an audit topic; downstream reporting and model retraining feed from there.
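The four steps above can be mimicked with plain Python objects to show how the pieces connect. This is only an illustrative sketch: the deque and list are in-memory stand-ins for Kafka topics, `hot_txn_count` stands in for the online feature store, and `toy_model` is a made-up scoring function, not a trained model.

```python
from collections import Counter, deque

# In-memory stand-ins (assumptions): a real system would use Kafka topics
# and an online feature store instead of these objects.
transactions_topic = deque()
audit_topic = []
hot_txn_count = Counter()  # card_id -> number of events processed so far


def publish(event):
    """Step 1: the transaction event lands on the topic."""
    transactions_topic.append(event)


def run_stream_processor():
    """Step 2: drain the topic and update the hot features."""
    while transactions_topic:
        event = transactions_topic.popleft()
        hot_txn_count[event["card_id"]] += 1


def score(event, model):
    """Steps 3 and 4: read online features, run the model, write the audit record."""
    features = {
        "txn_count": hot_txn_count[event["card_id"]],
        "amount": event["amount"],
    }
    result = model(features)
    audit_topic.append({"txn_id": event["txn_id"], "features": features, "score": result})
    return result


def toy_model(features):
    """Hypothetical model: more activity and larger amounts score higher."""
    return min(1.0, 0.1 * features["txn_count"] + features["amount"] / 10_000)
```

Writing the features next to the score in the audit record means retraining can later reuse exactly the values the model saw at decision time.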
The model must stay fresh
Fraud patterns shift within weeks. The architecture needs two loops:
- Online: the score returns instantly.
- Offline: labelled outcomes (chargebacks, manual reviews) are collected; the model is retrained weekly/monthly; a champion-challenger setup shadow-tests a new model and promotes it if it wins.
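A champion-challenger comparison on shadow scores might look like the sketch below. The promotion rule used here (strictly higher recall with a false-positive rate that is no worse) is an assumption chosen for illustration; real promotion criteria vary, and the function names are hypothetical.

```python
def recall_and_fpr(scores, labels, threshold):
    """Recall and false-positive rate when flagging scores >= threshold.

    `labels` holds 1 for confirmed fraud (e.g. chargeback) and 0 otherwise.
    """
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


def should_promote(champion_scores, challenger_scores, labels, threshold=0.5):
    """Promote only if the challenger catches more fraud without more false declines.

    Both score lists come from the same transactions: the champion's score
    drove the live decision, the challenger's was only logged (shadow mode).
    """
    champ_recall, champ_fpr = recall_and_fpr(champion_scores, labels, threshold)
    chall_recall, chall_fpr = recall_and_fpr(challenger_scores, labels, threshold)
    return chall_recall > champ_recall and chall_fpr <= champ_fpr
```

Because both models are scored on identical traffic, the comparison is not distorted by differences in transaction mix.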
Explainability and compliance
Regulators such as BDDK (Turkey's Banking Regulation and Supervision Agency) and regulations such as the EU AI Act expect explainability for high-impact automated decisions such as fraud declines. For each decline, the model's top 3-5 features (with their SHAP values) should be logged. A rationale like 'amount is 8× higher than normal and from an unusual country' serves both the audit and the customer dispute.
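Logging the top contributing features per decline can be as simple as the sketch below. It does not compute SHAP values itself: `contributions` is assumed to be a mapping from feature name to a per-transaction SHAP value produced elsewhere, and the function names, record fields and feature names are hypothetical.

```python
import json


def top_contributions(contributions, k=3):
    """Return the k features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def build_decline_record(txn_id, score, contributions, k=3):
    """Audit record for one declined transaction."""
    return {
        "txn_id": txn_id,
        "decision": "decline",
        "score": score,
        "top_features": [
            {"feature": name, "contribution": value}
            for name, value in top_contributions(contributions, k)
        ],
    }


def to_audit_line(record):
    """Serialize the record as one JSON line for the audit log."""
    return json.dumps(record, sort_keys=True)
```

Ranking by absolute value keeps strongly negative contributions (features that argued against fraud) visible as well, which can matter when a customer disputes a decline.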
Typical outcome
In the field, a rule+model hybrid typically raises fraud capture by 30-45% versus a rules-only system, while false positives (declining a genuine customer) drop by 20-30% with proper threshold calibration. False positives translate directly into customer dissatisfaction and lost revenue, so threshold calibration matters as much as the model itself.
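One simple way to calibrate the threshold is to cap the false-positive rate and pick the threshold with the best recall under that cap. The sketch below assumes a labelled validation set; `pick_threshold` and the `max_fpr` parameter are hypothetical names, and other calibration objectives (for example cost-weighted ones) are equally valid.

```python
def recall_and_fpr(scores, labels, threshold):
    """Recall and false-positive rate when declining scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    recall = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return recall, fpr


def pick_threshold(scores, labels, max_fpr):
    """Return (threshold, recall, fpr) with the highest recall under the FPR cap.

    Candidate thresholds are the observed scores, tried from lowest to
    highest; ties in recall keep the lower threshold. Returns None if no
    candidate satisfies the cap.
    """
    best = None
    for threshold in sorted(set(scores)):
        recall, fpr = recall_and_fpr(scores, labels, threshold)
        if fpr <= max_fpr and (best is None or recall > best[1]):
            best = (threshold, recall, fpr)
    return best
```

Tightening `max_fpr` trades fraud capture for fewer declined genuine customers, which is the trade-off the calibration step makes explicit.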
Conclusion
Real-time fraud scoring is not a model problem; it is a latency-budget problem. Feature lookup, model inference and the rule decision must fit inside 100 ms; the model must stay fresh; every decision must be explainable. Built correctly, both fraud loss and false declines fall.
