01 Exploratory data analysis: distributional differences between fraud and legitimate transactions in PCA-feature space, class imbalance visualisation
02 Preprocessing: StandardScaler on Amount and Time, 80/20 stratified train-test split to preserve fraud class ratio
03 SMOTE oversampling applied to training set only, preventing data leakage into the test fold
04 Three-model benchmark: Logistic Regression (baseline), Random Forest, XGBoost under identical conditions
05 XGBoost selected for production; feature importance reveals V1-V4 and Amount as strongest fraud predictors
06 PR-AUC used as the primary evaluation metric because ROC-AUC is misleadingly high for severely imbalanced datasets
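
The metric choice in the last step can be illustrated with a small, standard-library-only sketch. It is not the project's evaluation code: the synthetic scores, the 0.2% fraud rate, and the helper names `roc_auc`, `average_precision` and `simulate` are all assumptions made for illustration. Average precision stands in for PR-AUC here, and the simple formula used assumes no tied scores among the ranked transactions.

```python
import random


def roc_auc(y_true, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based rank shared by a tied group
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Mean of the precision measured at each true positive, ranked by score.

    Assumes at least one positive label and no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / true_positives


def simulate(n_legit=24_950, n_fraud=50, seed=0):
    """Hypothetical classifier scores: fraud scores sit two std devs above legit ones."""
    rng = random.Random(seed)
    y_true = [0] * n_legit + [1] * n_fraud
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_legit)]
    scores += [rng.gauss(2.0, 1.0) for _ in range(n_fraud)]
    return y_true, scores


if __name__ == "__main__":
    y_true, scores = simulate()
    print(f"ROC-AUC:           {roc_auc(y_true, scores):.3f}")
    print(f"Average precision: {average_precision(y_true, scores):.3f}")
    print(f"Fraud base rate:   {sum(y_true) / len(y_true):.4f}")
```

In this setup the same set of scores should produce a high ROC-AUC and a much lower average precision. ROC-AUC only asks whether a fraud case outranks a legitimate one, whereas precision also counts the many legitimate transactions that land near the top of the ranking simply because there are so many of them.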