Detecting fraud rings by modeling 590K transactions as a heterogeneous graph.
GNN + LightGBM stacking ensemble with Louvain community detection.
Only 3.5% of transactions are fraudulent. A naive classifier predicting "legit" for everything achieves 96.5% accuracy — making standard accuracy a useless metric. We need PR-AUC and F1 at an optimised threshold.
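A minimal sketch of the metric gap, on synthetic labels drawn at the same ~3.5% fraud rate (toy data, not the real dataset): accuracy rewards the do-nothing classifier, while PR-AUC collapses to the base rate.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.035).astype(int)  # ~3.5% positives, like the dataset
naive = np.zeros_like(y_true)                       # predict "legit" for everything

acc = accuracy_score(y_true, naive)                 # ~0.965: looks great, means nothing
pr_auc = average_precision_score(y_true, naive)     # collapses to the ~0.035 base rate
```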
Fraudsters reuse devices, email domains, and billing addresses across multiple stolen cards. Traditional tabular models treat each transaction independently — they miss the network structure that exposes organised rings.
Random train/test splits cause data leakage — future transaction patterns inform past predictions. Real fraud detection must use strict temporal splits: train on early transactions, test on later ones.
Real-world transaction data provided by Vesta Corporation for the IEEE Computational Intelligence Society (CIS) Fraud Detection competition. One of the most feature-rich public fraud datasets available.
Sorted by TransactionDT before splitting. Earlier transactions train the model; later ones test it — mirroring real deployment where the model never sees the future.
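The split can be sketched in a few lines of pandas (toy frame standing in for the real table; `TransactionDT` is the dataset's time-offset column, the 70/15/15 proportions follow the setup described here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"TransactionDT": rng.integers(0, 1_000_000, 1_000), "isFraud": 0})
df = df.sort_values("TransactionDT").reset_index(drop=True)

# integer arithmetic avoids float rounding at the split boundaries
n = len(df)
i1, i2 = n * 70 // 100, n * 85 // 100
train_df, val_df, test_df = df.iloc[:i1], df.iloc[i1:i2], df.iloc[i2:]

# every training timestamp is no later than every test timestamp
assert train_df["TransactionDT"].max() <= test_df["TransactionDT"].min()
```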
Fraudsters don't act alone — they reuse infrastructure. Two stolen cards sharing the same device fingerprint or email domain is a signal invisible to per-transaction models but obvious in a graph.
We build a graph where transaction nodes connect to entity nodes (cards, addresses, email domains, devices) via typed edges. Two cards sharing a device become graph neighbors — the GNN propagates fraud signals across this structure.
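A stripped-down sketch of the bipartite construction, with hypothetical IDs and only card/device entity nodes (the real graph also has address and email-domain nodes): two cards that never co-occur in a row still become two-hop neighbors through a shared entity.

```python
from collections import defaultdict

# toy transactions: (txn, card, device)
txns = [
    ("t1", "card_A", "dev_X"),
    ("t2", "card_B", "dev_X"),   # shares dev_X with t1
    ("t3", "card_C", "dev_Y"),
]

adj = defaultdict(set)
for txn, card, dev in txns:
    for entity in (f"card:{card}", f"device:{dev}"):
        adj[txn].add(entity)     # typed edge: transaction -> entity
        adj[entity].add(txn)     # kept undirected for message passing

# two hops from t1 reach t2 through the shared device node; t3 stays unreachable
two_hop = {n2 for n1 in adj["t1"] for n2 in adj[n1]} - {"t1"}
```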
GAT learns attention weights per neighbor — not all connections are equally suspicious. A card sharing an email with 2 other cards is more suspicious than sharing with 2,000. Attention captures this asymmetry automatically.
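A one-head, one-layer numpy sketch of the GAT scoring rule (random untrained weights, so it only illustrates the mechanism, not the learned behavior): because attention is a softmax over a node's neighborhood, per-neighbor influence is automatically diluted as degree grows, which is exactly the 2-vs-2,000 asymmetry; training then learns which neighbors deserve the mass.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # shared linear transform
a = rng.normal(size=2 * d)    # attention vector

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_weights(h_center, h_neighbors):
    """Softmax over LeakyReLU(a . [W h_i || W h_j]): the GAT edge score."""
    z_c = W @ h_center
    scores = np.array([float(leaky_relu(a @ np.concatenate([z_c, W @ h])))
                       for h in h_neighbors])
    e = np.exp(scores - scores.max())
    return e / e.sum()

h = rng.normal(size=d)
w2 = attention_weights(h, [h] * 2)        # identical neighbors: each gets 1/2
w2000 = attention_weights(h, [h] * 2000)  # same signal spread over 2,000: each gets 1/2000
```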
GBDT excels at tabular data with mixed feature types. We use it as a strong standalone model and as a stacking input to the ensemble — it captures feature-level patterns the GNN may miss from the 428-feature engineered set.
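A plausible parameter sketch for this setup (parameter names are from the LightGBM API; the values here are illustrative, not the tuned configuration from this project):

```python
# hedged sketch: illustrative LightGBM config for a 3.5%-positive binary task
lgbm_params = {
    "objective": "binary",
    "metric": ["auc", "average_precision"],  # track both ROC-AUC and PR-AUC
    "is_unbalance": True,       # reweight classes for the skewed label
    "learning_rate": 0.05,
    "num_leaves": 255,
    "feature_fraction": 0.8,    # column subsampling over the 428 engineered features
}
```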
GNN scores (graph structure) + LightGBM scores (feature patterns) → Logistic Regression meta-learner. Each model brings orthogonal signal — the ensemble consistently outperforms either alone on imbalanced fraud data.
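The stacking step itself is small; a sketch with synthetic scores standing in for the held-out base-model predictions (in practice the meta-learner must be fit on out-of-fold scores, never on scores from data the base models trained on):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(42)
n = 5_000
y = (rng.random(n) < 0.035).astype(int)

# stand-ins for held-out base-model scores: each noisy but partly informative
gnn_score = np.clip(0.3 * y + 0.7 * rng.random(n), 0, 1)
lgbm_score = np.clip(0.3 * y + 0.7 * rng.random(n), 0, 1)

X_meta = np.column_stack([gnn_score, lgbm_score])
meta = LogisticRegression().fit(X_meta, y)   # meta-learner over base scores
p_ens = meta.predict_proba(X_meta)[:, 1]

ap_gnn = average_precision_score(y, gnn_score)
ap_ens = average_precision_score(y, p_ens)   # combining the two signals helps
```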
LightGBM is trained with is_unbalance=True to reweight the 3.5% minority class.
Split by TransactionDT: 70% train → 15% val → 15% test, strictly ordered. Card-level aggregations are computed on the train set only and applied to val/test — no future information bleeds backward.
Two metrics matter most for imbalanced fraud detection: ROC-AUC (overall discrimination) and PR-AUC (precision-recall trade-off on the minority class — stricter and more informative at a 3.5% fraud rate).
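The leak-free aggregation step can be sketched like so (toy pandas frames with the dataset's `card1`/`TransactionAmt` column names; the real pipeline computes many such card-level statistics):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
trn = pd.DataFrame({"card1": rng.integers(0, 5, 200),
                    "TransactionAmt": rng.random(200) * 100})
tst = pd.DataFrame({"card1": rng.integers(0, 5, 50),
                    "TransactionAmt": rng.random(50) * 100})

# card-level mean computed on the train split only, then mapped onto test:
# the test period contributes nothing to the statistic
card_mean = trn.groupby("card1")["TransactionAmt"].mean()
trn["card_amt_mean"] = trn["card1"].map(card_mean)
tst["card_amt_mean"] = tst["card1"].map(card_mean)
```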
Threshold optimised by F1 score on test set (not default 0.5 — default is wrong for imbalanced data).
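A simple grid sweep shows the idea (synthetic scores at the same class balance; the real pipeline sweeps the ensemble's scores):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(7)
y = (rng.random(20_000) < 0.035).astype(int)
scores = np.clip(0.5 * y + rng.normal(0.2, 0.15, y.size), 0, 1)  # toy model scores

thresholds = np.arange(1, 100) / 100   # 0.01 ... 0.99
f1s = np.array([f1_score(y, (scores >= t).astype(int), zero_division=0)
                for t in thresholds])
best_t = thresholds[f1s.argmax()]
# at a 3.5% positive rate the F1-optimal threshold is generally not the default 0.5
```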
SHAP (SHapley Additive exPlanations) reveals why the model predicts fraud for a specific transaction. Critical for regulatory compliance in financial services.
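What SHAP approximates is the exact Shapley attribution; for intuition, here it is computed exactly on a tiny hand-built value function (feature names and values are hypothetical, and real SHAP explainers avoid this exponential enumeration):

```python
from itertools import combinations
from math import factorial

features = ["shared_device_count", "txn_amount", "email_domain_risk"]

def v(S):
    """Toy model output with only features in S 'present': additive + one interaction."""
    base = {"shared_device_count": 0.30, "txn_amount": 0.05, "email_domain_risk": 0.15}
    out = sum(base[f] for f in S)
    if "shared_device_count" in S and "email_domain_risk" in S:
        out += 0.10  # interaction term, split equally by the Shapley rule
    return out

n = len(features)
phi = {}
for f in features:
    others = [g for g in features if g != f]
    total = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += w * (v(set(S) | {f}) - v(set(S)))
    phi[f] = total

# efficiency property: attributions sum to prediction minus the empty baseline
assert abs(sum(phi.values()) - v(set(features))) < 1e-9
```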
After training the GNN, we extract node embeddings and run Louvain community detection on the card co-occurrence graph. Communities with fraud rates significantly above the 3.5% baseline are flagged as fraud rings.
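The flagging logic, sketched with connected components standing in for Louvain (toy card graph, hypothetical labels; Louvain additionally splits dense graphs into finer communities by modularity):

```python
from collections import defaultdict

# toy card co-occurrence edges and per-card fraud labels
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
fraud = {"c1": 1, "c2": 1, "c3": 0, "c4": 0, "c5": 0}

# union-find to group cards into connected components
parent = {c: c for c in fraud}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

for a, b in edges:
    parent[find(a)] = find(b)

groups = defaultdict(list)
for c in fraud:
    groups[find(c)].append(c)

# flag communities whose fraud rate sits far above the 3.5% baseline
BASELINE = 0.035
rings = [cards for cards in groups.values()
         if sum(fraud[c] for c in cards) / len(cards) > BASELINE]
```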
Each node = a card. Edges = shared email domain, address, or device type. Node size scales with fraud rate. Hover for details.
Submit a transaction to the trained ensemble endpoint. The model returns fraud probability from both LightGBM and GNN components.
Submit a transaction to see the fraud probability score.
Fraud patterns evolve — fraudsters adapt when a technique gets blocked. Adding ADWIN/DDM drift monitoring would flag when model performance degrades on recent transactions, triggering retraining.
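A deliberately simplified stand-in for ADWIN/DDM shows the monitoring loop (fixed window and margin are toy choices; the real detectors use adaptive windows and statistical bounds):

```python
from collections import deque

class SimpleDriftMonitor:
    """Toy drift check: flag when the recent-window error rate exceeds a
    frozen reference error rate by a fixed margin."""
    def __init__(self, window=100, margin=0.1):
        self.window = deque(maxlen=window)
        self.ref_error = None
        self.margin = margin

    def update(self, is_error: bool) -> bool:
        self.window.append(is_error)
        if len(self.window) < self.window.maxlen:
            return False                       # still warming up
        rate = sum(self.window) / len(self.window)
        if self.ref_error is None:
            self.ref_error = rate              # freeze the reference
            return False
        return rate > self.ref_error + self.margin

monitor = SimpleDriftMonitor()
drift_at = None
errors = [0] * 300 + [1] * 150   # error rate jumps: fraudsters adapted
for i, e in enumerate(errors):
    if monitor.update(bool(e)):
        drift_at = i              # retraining would be triggered here
        break
```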
Current GAT treats the graph as static. A temporal variant would weight recent connections more heavily, capturing that two cards sharing a device yesterday is more suspicious than sharing one last year.
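One simple way to realize that weighting, sketched as an exponential time decay on edge weights (the 30-day half-life is a hypothetical choice):

```python
import math

def edge_weight(days_since_last_shared, half_life_days=30.0):
    """Exponential decay: halves every half_life_days, so yesterday's shared
    device outweighs last year's."""
    return math.exp(-math.log(2) * days_since_last_shared / half_life_days)

recent = edge_weight(1)    # close to 1: strong edge
stale = edge_weight(365)   # near 0: barely contributes to message passing
```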
Currently using 4 entity types (card, address, email, device). Adding merchant IDs, IP addresses, and transaction amounts as edge weights would significantly enrich the graph signal for ring detection.