PyTorch Geometric · LightGBM · Graph Attention Networks · IEEE-CIS Dataset · SHAP

Graph Neural Network
Credit Card Fraud Detection

Detecting fraud rings by modeling 590K transactions as a heterogeneous graph.
GNN + LightGBM stacking ensemble with Louvain community detection.

[Headline stats: Ensemble AUC · PR-AUC · 590K transactions · 393 features · fraud rings]

Why Credit Card Fraud Is a Hard ML Problem

⚖️
Severe Class Imbalance

Only 3.5% of transactions are fraudulent. A naive classifier predicting "legit" for everything achieves 96.5% accuracy — making standard accuracy a useless metric. We need PR-AUC and F1 at an optimised threshold.
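A quick numeric check of that claim, using synthetic labels at the dataset's 3.5% fraud rate:

```python
# Synthetic labels at the dataset's 3.5% fraud rate (1 = fraud).
labels = [1] * 35 + [0] * 965

# A naive "model" that predicts legit for everything.
preds = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(f"accuracy = {accuracy:.3f}")   # 0.965 despite catching zero fraud

# Recall on the fraud class collapses to zero.
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
recall = tp / sum(labels)
print(f"fraud recall = {recall:.3f}")  # 0.000
```

This is why the metrics below centre on PR-AUC and F1 rather than raw accuracy.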

🕸️
Fraud Rings Are Relational

Fraudsters reuse devices, email domains, and billing addresses across multiple stolen cards. Traditional tabular models treat each transaction independently — they miss the network structure that exposes organised rings.

⏱️
Temporal Data Leakage

Random train/test splits cause data leakage — future transaction patterns inform past predictions. Real fraud detection must use strict temporal splits: train on early transactions, test on later ones.

Real-world impact: Card fraud cost the global economy over $32 billion in 2023. Payment networks like Visa, Mastercard, and Amex score every transaction in <100ms — the model must be fast and explainable for compliance.

IEEE-CIS Fraud Detection Dataset

Real-world transaction data provided by Vesta Corporation for the IEEE Computational Intelligence Society (CIS) Fraud Detection competition. One of the most feature-rich public fraud datasets available.

590,540 Total transactions
3.5% Fraud rate
393 Transaction features
41 Identity features
Feature Groups
V1–V339 — Vesta engineered features (anonymised risk signals)
card1–card6 — Card type, bank, category (graph linkage nodes)
addr1, addr2 — Billing/shipping address codes (graph linkage)
P_emaildomain, R_emaildomain — Purchaser/recipient email (graph linkage)
DeviceType, DeviceInfo, id_* — Device fingerprint (identity features)
TransactionAmt, ProductCD, dist* — Transaction metadata
Train / Val / Test Split (Temporal — No Leakage)

Sorted by TransactionDT before splitting. Earlier transactions train the model; later ones test it — mirroring real deployment where the model never sees the future.
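A minimal pandas sketch of this split on a synthetic frame (the 70/15/15 fractions follow the engineering-decisions section; column names match the dataset):

```python
import pandas as pd

# Synthetic stand-in for the real table: TransactionDT is a time offset,
# so sorting by it gives strict temporal order.
df = pd.DataFrame({
    "TransactionDT": range(100_000, 100_000 + 1000),
    "isFraud": [0] * 965 + [1] * 35,
}).sample(frac=1.0, random_state=0)  # shuffle to mimic unsorted input

df = df.sort_values("TransactionDT").reset_index(drop=True)

n = len(df)
train = df.iloc[: round(0.70 * n)]
val = df.iloc[round(0.70 * n): round(0.85 * n)]
test = df.iloc[round(0.85 * n):]

# Every training transaction precedes every validation/test transaction.
assert train["TransactionDT"].max() < val["TransactionDT"].min()
assert val["TransactionDT"].max() < test["TransactionDT"].min()
```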

Why Graph Neural Networks for Fraud?

Fraudsters don't act alone — they reuse infrastructure. Two stolen cards sharing the same device fingerprint or email domain is a signal invisible to per-transaction models but obvious in a graph.

01
Heterogeneous Transaction Graph

We build a graph where transaction nodes connect to entity nodes (cards, addresses, email domains, devices) via typed edges. Two cards sharing a device become graph neighbors — the GNN propagates fraud signals across this structure.

Transaction → Card | Address | Email | Device
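One way to materialise the typed transaction-to-entity edges is a sketch like this (pandas; the `entity_edges` helper is hypothetical, the column names come from the dataset):

```python
import pandas as pd

def entity_edges(df: pd.DataFrame, col: str) -> list:
    """Connect each transaction row to the entity value it references.

    Returns (transaction_index, entity_id) pairs for one entity column,
    e.g. card1 or P_emaildomain; missing values produce no edge.
    """
    sub = df[col].dropna()
    return [(idx, f"{col}:{val}") for idx, val in sub.items()]

df = pd.DataFrame({
    "card1": [1111, 1111, 2222],
    "P_emaildomain": ["a.com", None, "a.com"],
})

edges = entity_edges(df, "card1") + entity_edges(df, "P_emaildomain")
# Transactions 0 and 1 share card1:1111, and 0 and 2 share the email
# domain: they become two-hop neighbours through the shared entity node,
# which is exactly the structure the GNN propagates fraud signal across.
```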
02
Graph Attention Network (GAT)

GAT learns attention weights per neighbor — not all connections are equally suspicious. A card sharing an email with 2 other cards is more suspicious than sharing with 2,000. Attention captures this asymmetry automatically.

3-layer HeteroConv · 4 attention heads · 128 hidden dims · Focal loss (α=0.965, γ=2.0)
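The focal loss quoted above can be written out directly. This is a NumPy sketch of the standard binary focal loss (Lin et al.) with the α and γ values from the spec:

```python
import numpy as np

def focal_loss(p: np.ndarray, y: np.ndarray,
               alpha: float = 0.965, gamma: float = 2.0) -> float:
    """Binary focal loss: down-weights easy examples so the gradient
    concentrates on hard, misclassified fraud cases."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pos = -alpha * y * (1 - p) ** gamma * np.log(p)
    neg = -(1 - alpha) * (1 - y) * p ** gamma * np.log(1 - p)
    return float(np.mean(pos + neg))

y = np.array([1.0, 0.0])
easy = focal_loss(np.array([0.95, 0.05]), y)  # confident, correct
hard = focal_loss(np.array([0.10, 0.90]), y)  # confident, wrong
assert hard > easy  # the (1-p)^gamma factor keeps hard cases dominant
```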
03
LightGBM Tabular Baseline

GBDT excels at tabular data with mixed feature types. We use it both as a strong standalone model and as a stacking input to the ensemble: it captures feature-level patterns in the 428-feature engineered set that the GNN may miss.

640 trees · num_leaves=63 · is_unbalance=True · Early stopping on val AUC
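The configuration above corresponds to a parameter dict like this (a sketch: only the parameters named in the text are grounded, the rest are assumptions marked as such):

```python
# LightGBM parameters from the spec above; objective, metric, and
# learning_rate are assumptions, not stated in the text.
lgb_params = {
    "objective": "binary",
    "metric": "auc",           # early stopping monitors validation AUC
    "num_leaves": 63,          # from the spec above
    "is_unbalance": True,      # reweights the 3.5% fraud class
    "learning_rate": 0.05,     # assumption
}
# Hedged usage sketch:
# model = lightgbm.train(lgb_params, train_set, num_boost_round=640,
#                        valid_sets=[val_set],
#                        callbacks=[lightgbm.early_stopping(50)])
```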
04
Stacking Ensemble

GNN scores (graph structure) + LightGBM scores (feature patterns) → Logistic Regression meta-learner. Each model brings orthogonal signal — the ensemble consistently outperforms either alone on imbalanced fraud data.

Meta-learner trained on val set predictions · Threshold optimised by F1
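The stacking step might look like this scikit-learn sketch, where synthetic base-model scores stand in for the real validation predictions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y_val = (rng.random(n) < 0.035).astype(int)

# Stand-ins for validation-set scores from the two base models: each is
# a noisy view of the label, mimicking partially correlated signal.
gnn_score = np.clip(0.6 * y_val + 0.2 * rng.random(n), 0, 1)
lgb_score = np.clip(0.6 * y_val + 0.2 * rng.random(n), 0, 1)

# Meta-learner: logistic regression over the two base-model scores,
# fit on validation predictions only.
X_val = np.column_stack([gnn_score, lgb_score])
meta = LogisticRegression().fit(X_val, y_val)

# At inference, the two scores for a new transaction are stacked the
# same way and the meta-learner emits the final fraud probability.
p = meta.predict_proba(X_val)[:, 1]
```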

Key Engineering Decisions

Imbalance Handling
Fraud nodes oversampled 27× in NeighborLoader seed indices. Focal Loss with α=0.965 (tuned to 27:1 class ratio) focuses gradient on hard-to-classify fraud cases. LightGBM uses is_unbalance=True.
No Data Leakage
Temporal split by TransactionDT: 70% train → 15% val → 15% test, strictly ordered. Card-level aggregations computed on train set only and applied to val/test — no future information bleeds backward.
Graph Sparsity Control
Entity groups with >20 cards (e.g. "gmail.com" shared by thousands) are skipped — they produce O(n²) edges and are not indicative of fraud rings. Small tight groups are the signal.
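A plain-Python sketch of that cap (the 20-card threshold is the one stated above; the `ring_edges` helper itself is hypothetical):

```python
from itertools import combinations

def ring_edges(groups: dict, max_size: int = 20):
    """Yield card-card co-occurrence edges, skipping huge entity groups.

    groups maps an entity value (e.g. an email domain) to the cards that
    used it. Groups above max_size (like "gmail.com") would create
    O(n^2) edges and carry no ring signal, so they are dropped entirely.
    """
    for entity, cards in groups.items():
        if len(cards) > max_size:
            continue  # popular entity: not evidence of a ring
        yield from combinations(sorted(set(cards)), 2)

groups = {
    "shady-domain.example": [101, 102, 103],   # small, tight group
    "gmail.com": list(range(5000)),            # huge group, skipped
}
edges = list(ring_edges(groups))
# Only the 3 pairwise edges from the small group survive.
```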

Model Performance

Two metrics matter most for imbalanced fraud detection: ROC-AUC (overall discrimination) and PR-AUC (precision-recall trade-off at minority class — stricter and more informative at 3.5% fraud rate).

[Figure: ROC curves and Precision-Recall curves by model (PR is the stricter metric at a 3.5% fraud rate); metric cards for LightGBM AUC, GNN AUC, Ensemble AUC, and Ensemble PR-AUC]
Ensemble Threshold Metrics

Threshold optimised by F1 score on the validation set, not the default 0.5; the default operating point is wrong for imbalanced data.
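Threshold selection reduces to a sweep over candidate cutoffs; a NumPy sketch on synthetic scores:

```python
import numpy as np

def best_f1_threshold(y: np.ndarray, p: np.ndarray):
    """Sweep candidate thresholds, return the (threshold, F1) pair
    that maximises F1 on the given labels and scores."""
    best_t, best_f1 = 0.5, 0.0
    for t in np.linspace(0.01, 0.99, 99):
        pred = p >= t
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        fn = np.sum(~pred & (y == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

rng = np.random.default_rng(1)
y = (rng.random(5000) < 0.035).astype(int)
p = np.clip(0.3 * y + rng.beta(1, 8, 5000), 0, 1)  # skewed scores
t, f1 = best_f1_threshold(y, p)
# On imbalanced scores like these, the optimum typically falls below 0.5.
```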

SHAP Explainability

SHAP (SHapley Additive exPlanations) reveals why the model predicts fraud for a specific transaction. Critical for regulatory compliance in financial services.

Global Feature Importance (mean |SHAP|)
SHAP Waterfall for a sample fraud transaction (red = increases fraud probability)

Uncovering Organised Fraud Networks

After training the GNN, we extract node embeddings and run Louvain community detection on the card co-occurrence graph. Communities with fraud rates significantly above the 3.5% baseline are flagged as fraud rings.
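Once Louvain assigns each card to a community, the flagging step reduces to a per-community rate comparison. A plain-Python sketch (the 3.5% baseline is from the text; the `min_size` floor is an added assumption to filter noise):

```python
from collections import defaultdict

def flag_fraud_rings(card_community: dict, card_is_fraud: dict,
                     baseline: float = 0.035, min_size: int = 3) -> dict:
    """Return {community_id: fraud_rate} for communities whose fraud
    rate exceeds the global baseline (tiny communities are skipped)."""
    members = defaultdict(list)
    for card, comm in card_community.items():
        members[comm].append(card_is_fraud[card])
    return {
        comm: sum(flags) / len(flags)
        for comm, flags in members.items()
        if len(flags) >= min_size and sum(flags) / len(flags) > baseline
    }

# Toy assignment: community 0 has 2 fraudulent cards out of 3.
card_community = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
card_is_fraud = {1: True, 2: True, 3: False,
                 4: False, 5: False, 6: False}
rings = flag_fraud_rings(card_community, card_is_fraud)
# Community 0 (fraud rate 0.67) is flagged; community 1 (0.0) is not.
```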

Fraud Ring Graph — Card Co-occurrence Communities

Each node = a card. Edges = shared email domain, address, or device type. Node size scales with fraud rate.

Detected Communities
Baseline fraud rate: 3.5%. Communities above this threshold represent statistically elevated risk clusters; the top-ranked ring's fraud rate is a large multiple of the baseline.

Real-Time Fraud Prediction

Submit a transaction to the trained ensemble endpoint. The model returns fraud probability from both LightGBM and GNN components.

Transaction hours 0–5 are historically higher risk.

Limitations & Future Improvements

Concept Drift Detection

Fraud patterns evolve — fraudsters adapt when a technique gets blocked. Adding ADWIN/DDM drift monitoring would flag when model performance degrades on recent transactions, triggering retraining.
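ADWIN/DDM come from streaming libraries; a lighter stand-in that fraud teams commonly monitor is the population stability index (PSI) between training-time and recent score distributions. A NumPy sketch (the 0.2 alert level is a common rule of thumb, not from the text):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two score samples.

    PSI = sum((a_i - e_i) * ln(a_i / e_i)) over shared histogram bins,
    where e_i/a_i are bin proportions. >0.2 is a common drift alert.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(2)
train_scores = rng.beta(2, 8, 10_000)
stable = psi(train_scores, rng.beta(2, 8, 10_000))   # same distribution
drifted = psi(train_scores, rng.beta(4, 4, 10_000))  # shifted scores
assert drifted > stable  # the shifted batch trips the drift signal
```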

Temporal Graph Attention

Current GAT treats the graph as static. A temporal variant would weight recent connections more heavily, capturing that two cards sharing a device yesterday is more suspicious than sharing one last year.

Richer Graph Structure

Currently using 4 entity types (card, address, email, device). Adding merchant IDs, IP addresses, and transaction amounts as edge weights would significantly enrich the graph signal for ring detection.