PyTorch Geometric · LightGBM · Graph Attention Networks · IEEE-CIS Dataset · SHAP

Graph Neural Network
Credit Card Fraud Detection

Detecting fraud rings by modeling 590K transactions as a heterogeneous graph.
GNN + LightGBM stacking ensemble with Louvain community detection.

[Headline stats: Ensemble AUC · PR-AUC · 590K transactions · 393 features · fraud rings]

Why Credit Card Fraud Is a Hard ML Problem

⚖️
Severe Class Imbalance

Only 3.5% of transactions are fraudulent. A naive classifier predicting "legit" for everything achieves 96.5% accuracy — making standard accuracy a useless metric. We need PR-AUC and F1 at an optimised threshold.
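A quick numeric check of that claim, using synthetic labels at the dataset's 3.5% fraud rate:

```python
# Synthetic labels at the dataset's 3.5% fraud rate (1 = fraud).
labels = [1] * 35 + [0] * 965

# A naive "model" that predicts legit for everything.
preds = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(f"accuracy = {accuracy:.3f}")   # 0.965 despite catching zero fraud

# Recall on the fraud class collapses to zero.
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
recall = tp / sum(labels)
print(f"fraud recall = {recall:.3f}")  # 0.000
```

This is why the metrics below centre on PR-AUC and F1 rather than raw accuracy.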

🕸️
Fraud Rings Are Relational

Fraudsters reuse devices, email domains, and billing addresses across multiple stolen cards. Traditional tabular models treat each transaction independently — they miss the network structure that exposes organised rings.

⏱️
Temporal Data Leakage

Random train/test splits cause data leakage — future transaction patterns inform past predictions. Real fraud detection must use strict temporal splits: train on early transactions, test on later ones.

Real-world impact: Card fraud cost the global economy over $32 billion in 2023. Payment networks like Visa, Mastercard, and Amex score every transaction in <100ms — the model must be fast and explainable for compliance.

IEEE-CIS Fraud Detection Dataset

Real-world transaction data provided by Vesta Corporation for the IEEE Computational Intelligence Society (CIS) Fraud Detection competition. One of the most feature-rich public fraud datasets available.

590,540 Total transactions
3.5% Fraud rate
393 Transaction features
41 Identity features
Feature Groups
V1–V339 — Vesta engineered features (anonymised risk signals)
card1–card6 — Card type, bank, category (graph linkage nodes)
addr1, addr2 — Billing/shipping address codes (graph linkage)
P_emaildomain, R_emaildomain — Purchaser/recipient email (graph linkage)
DeviceType, DeviceInfo, id_* — Device fingerprint (identity features)
TransactionAmt, ProductCD, dist* — Transaction metadata
Train / Val / Test Split (Temporal — No Leakage)

Sorted by TransactionDT before splitting. Earlier transactions train the model; later ones test it — mirroring real deployment where the model never sees the future.
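A minimal pandas sketch of this split on a synthetic frame (the 70/15/15 fractions follow the engineering-decisions section; column names match the dataset):

```python
import pandas as pd

# Synthetic stand-in for the real table: TransactionDT is a time offset,
# so sorting by it gives strict temporal order.
df = pd.DataFrame({
    "TransactionDT": range(100_000, 100_000 + 1000),
    "isFraud": [0] * 965 + [1] * 35,
}).sample(frac=1.0, random_state=0)  # shuffle to mimic unsorted input

df = df.sort_values("TransactionDT").reset_index(drop=True)

n = len(df)
train = df.iloc[: round(0.70 * n)]
val = df.iloc[round(0.70 * n): round(0.85 * n)]
test = df.iloc[round(0.85 * n):]

# Every training transaction precedes every validation/test transaction.
assert train["TransactionDT"].max() < val["TransactionDT"].min()
assert val["TransactionDT"].max() < test["TransactionDT"].min()
```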

Why Graph Neural Networks for Fraud?

Fraudsters don't act alone — they reuse infrastructure. Two stolen cards sharing the same device fingerprint or email domain is a signal invisible to per-transaction models but obvious in a graph.

01
Heterogeneous Transaction Graph

We build a graph where transaction nodes connect to entity nodes (cards, addresses, email domains, devices) via typed edges. Two cards sharing a device become graph neighbors — the GNN propagates fraud signals across this structure.

Transaction → Card | Address | Email | Device
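One way to materialise the typed transaction-to-entity edges is a sketch like this (pandas; the `entity_edges` helper is hypothetical, the column names come from the dataset):

```python
import pandas as pd

def entity_edges(df: pd.DataFrame, col: str) -> list:
    """Connect each transaction row to the entity value it references.

    Returns (transaction_index, entity_id) pairs for one entity column,
    e.g. card1 or P_emaildomain; missing values produce no edge.
    """
    sub = df[col].dropna()
    return [(idx, f"{col}:{val}") for idx, val in sub.items()]

df = pd.DataFrame({
    "card1": [1111, 1111, 2222],
    "P_emaildomain": ["a.com", None, "a.com"],
})

edges = entity_edges(df, "card1") + entity_edges(df, "P_emaildomain")
# Transactions 0 and 1 share card1:1111, and 0 and 2 share the email
# domain: they become two-hop neighbours through the shared entity node,
# which is exactly the structure the GNN propagates fraud signal across.
```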
02
Graph Attention Network (GAT)

GAT learns attention weights per neighbor — not all connections are equally suspicious. A card sharing an email with 2 other cards is more suspicious than sharing with 2,000. Attention captures this asymmetry automatically.

3-layer HeteroConv · 4 attention heads · 128 hidden dims · Focal loss (α=0.965, γ=2.0)
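The focal loss quoted above can be written out directly. This is a NumPy sketch of the standard binary focal loss (Lin et al.) with the α and γ values from the spec:

```python
import numpy as np

def focal_loss(p: np.ndarray, y: np.ndarray,
               alpha: float = 0.965, gamma: float = 2.0) -> float:
    """Binary focal loss: down-weights easy examples so the gradient
    concentrates on hard, misclassified fraud cases."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pos = -alpha * y * (1 - p) ** gamma * np.log(p)
    neg = -(1 - alpha) * (1 - y) * p ** gamma * np.log(1 - p)
    return float(np.mean(pos + neg))

y = np.array([1.0, 0.0])
easy = focal_loss(np.array([0.95, 0.05]), y)  # confident, correct
hard = focal_loss(np.array([0.10, 0.90]), y)  # confident, wrong
assert hard > easy  # the (1-p)^gamma factor keeps hard cases dominant
```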
03
LightGBM Tabular Baseline

GBDT excels at tabular data with mixed feature types. We use it both as a strong standalone model and as a stacking input to the ensemble: it captures feature-level patterns in the 428-feature engineered set that the GNN may miss.

640 trees · num_leaves=63 · is_unbalance=True · Early stopping on val AUC
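The configuration above corresponds to a parameter dict like this (a sketch: only the parameters named in the text are grounded, the rest are assumptions marked as such):

```python
# LightGBM parameters from the spec above; objective, metric, and
# learning_rate are assumptions, not stated in the text.
lgb_params = {
    "objective": "binary",
    "metric": "auc",           # early stopping monitors validation AUC
    "num_leaves": 63,          # from the spec above
    "is_unbalance": True,      # reweights the 3.5% fraud class
    "learning_rate": 0.05,     # assumption
}
# Hedged usage sketch:
# model = lightgbm.train(lgb_params, train_set, num_boost_round=640,
#                        valid_sets=[val_set],
#                        callbacks=[lightgbm.early_stopping(50)])
```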
04
Stacking Ensemble

GNN scores (graph structure) + LightGBM scores (feature patterns) → Logistic Regression meta-learner. Each model brings orthogonal signal — the ensemble consistently outperforms either alone on imbalanced fraud data.

Meta-learner trained on val set predictions · Threshold optimised by F1
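The stacking step might look like this scikit-learn sketch, where synthetic base-model scores stand in for the real validation predictions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y_val = (rng.random(n) < 0.035).astype(int)

# Stand-ins for validation-set scores from the two base models: each is
# a noisy view of the label, mimicking partially correlated signal.
gnn_score = np.clip(0.6 * y_val + 0.2 * rng.random(n), 0, 1)
lgb_score = np.clip(0.6 * y_val + 0.2 * rng.random(n), 0, 1)

# Meta-learner: logistic regression over the two base-model scores,
# fit on validation predictions only.
X_val = np.column_stack([gnn_score, lgb_score])
meta = LogisticRegression().fit(X_val, y_val)

# At inference, the two scores for a new transaction are stacked the
# same way and the meta-learner emits the final fraud probability.
p = meta.predict_proba(X_val)[:, 1]
```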

Key Engineering Decisions

Imbalance Handling
Fraud nodes oversampled 27× in NeighborLoader seed indices. Focal Loss with α=0.965 (tuned to 27:1 class ratio) focuses gradient on hard-to-classify fraud cases. LightGBM uses is_unbalance=True.
No Data Leakage
Temporal split by TransactionDT: 70% train → 15% val → 15% test, strictly ordered. Card-level aggregations computed on train set only and applied to val/test — no future information bleeds backward.
Graph Sparsity Control
Entity groups with >20 cards (e.g. "gmail.com" shared by thousands) are skipped — they produce O(n²) edges and are not indicative of fraud rings. Small tight groups are the signal.
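A plain-Python sketch of that cap (the 20-card threshold is the one stated above; the `ring_edges` helper itself is hypothetical):

```python
from itertools import combinations

def ring_edges(groups: dict, max_size: int = 20):
    """Yield card-card co-occurrence edges, skipping huge entity groups.

    groups maps an entity value (e.g. an email domain) to the cards that
    used it. Groups above max_size (like "gmail.com") would create
    O(n^2) edges and carry no ring signal, so they are dropped entirely.
    """
    for entity, cards in groups.items():
        if len(cards) > max_size:
            continue  # popular entity: not evidence of a ring
        yield from combinations(sorted(set(cards)), 2)

groups = {
    "shady-domain.example": [101, 102, 103],   # small, tight group
    "gmail.com": list(range(5000)),            # huge group, skipped
}
edges = list(ring_edges(groups))
# Only the 3 pairwise edges from the small group survive.
```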

Model Performance

Two metrics matter most for imbalanced fraud detection: ROC-AUC (overall discrimination) and PR-AUC (precision-recall trade-off at minority class — stricter and more informative at 3.5% fraud rate).

[Figure: ROC curves and Precision-Recall curves by model (PR is the stricter metric at a 3.5% fraud rate); metric cards for LightGBM AUC, GNN AUC, Ensemble AUC, and Ensemble PR-AUC]
Ensemble Threshold Metrics

Threshold optimised by F1 score on the validation set, not the default 0.5; the default operating point is wrong for imbalanced data.
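Threshold selection reduces to a sweep over candidate cutoffs; a NumPy sketch on synthetic scores:

```python
import numpy as np

def best_f1_threshold(y: np.ndarray, p: np.ndarray):
    """Sweep candidate thresholds, return the (threshold, F1) pair
    that maximises F1 on the given labels and scores."""
    best_t, best_f1 = 0.5, 0.0
    for t in np.linspace(0.01, 0.99, 99):
        pred = p >= t
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        fn = np.sum(~pred & (y == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

rng = np.random.default_rng(1)
y = (rng.random(5000) < 0.035).astype(int)
p = np.clip(0.3 * y + rng.beta(1, 8, 5000), 0, 1)  # skewed scores
t, f1 = best_f1_threshold(y, p)
# On imbalanced scores like these, the optimum typically falls below 0.5.
```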

SHAP Explainability

SHAP (SHapley Additive exPlanations) reveals why the model predicts fraud for a specific transaction. Critical for regulatory compliance in financial services.

Global Feature Importance (mean |SHAP|)
SHAP Waterfall for a sample fraud transaction (red = increases fraud probability)

Uncovering Organised Fraud Networks

After training the GNN, we extract node embeddings and run Louvain community detection on the card co-occurrence graph. Communities with fraud rates significantly above the 3.5% baseline are flagged as fraud rings.
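Once Louvain assigns each card to a community, the flagging step reduces to a per-community rate comparison. A plain-Python sketch (the 3.5% baseline is from the text; the `min_size` floor is an added assumption to filter noise):

```python
from collections import defaultdict

def flag_fraud_rings(card_community: dict, card_is_fraud: dict,
                     baseline: float = 0.035, min_size: int = 3) -> dict:
    """Return {community_id: fraud_rate} for communities whose fraud
    rate exceeds the global baseline (tiny communities are skipped)."""
    members = defaultdict(list)
    for card, comm in card_community.items():
        members[comm].append(card_is_fraud[card])
    return {
        comm: sum(flags) / len(flags)
        for comm, flags in members.items()
        if len(flags) >= min_size and sum(flags) / len(flags) > baseline
    }

# Toy assignment: community 0 has 2 fraudulent cards out of 3.
card_community = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
card_is_fraud = {1: True, 2: True, 3: False,
                 4: False, 5: False, 6: False}
rings = flag_fraud_rings(card_community, card_is_fraud)
# Community 0 (fraud rate 0.67) is flagged; community 1 (0.0) is not.
```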

Fraud Ring Graph — Card Co-occurrence Communities

Each node = a card. Edges = shared email domain, address, or device type. Node size scales with fraud rate.

Detected Communities
Baseline fraud rate: 3.5%. Communities above this threshold represent statistically elevated risk clusters; the top-ranked ring's fraud rate is a large multiple of the baseline.

Real-Time Fraud Prediction

Submit a transaction to the trained ensemble endpoint. The model returns fraud probability from both LightGBM and GNN components.

Transaction hours 0–5 are historically higher risk.

Limitations & Future Improvements

Concept Drift Detection

Fraud patterns evolve — fraudsters adapt when a technique gets blocked. Adding ADWIN/DDM drift monitoring would flag when model performance degrades on recent transactions, triggering retraining.
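ADWIN/DDM come from streaming libraries; a lighter stand-in that fraud teams commonly monitor is the population stability index (PSI) between training-time and recent score distributions. A NumPy sketch (the 0.2 alert level is a common rule of thumb, not from the text):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two score samples.

    PSI = sum((a_i - e_i) * ln(a_i / e_i)) over shared histogram bins,
    where e_i/a_i are bin proportions. >0.2 is a common drift alert.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(2)
train_scores = rng.beta(2, 8, 10_000)
stable = psi(train_scores, rng.beta(2, 8, 10_000))   # same distribution
drifted = psi(train_scores, rng.beta(4, 4, 10_000))  # shifted scores
assert drifted > stable  # the shifted batch trips the drift signal
```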

Temporal Graph Attention

Current GAT treats the graph as static. A temporal variant would weight recent connections more heavily, capturing that two cards sharing a device yesterday is more suspicious than sharing one last year.

Richer Graph Structure

Currently using 4 entity types (card, address, email, device). Adding merchant IDs, IP addresses, and transaction amounts as edge weights would significantly enrich the graph signal for ring detection.