Towards improving real-time credit card fraud detection using supervised machine learning models on big data

Pitsane, M.Y.; Greeff, J.J.; Mogale, T.H.; Janse van Rensburg, J.T.

2023

Master of Science in Computer Science, North-West University, Vaal Triangle Campus

The primary objective of this study is to improve existing supervised machine learning models so that they effectively detect credit card fraud on multiple datasets in real time. Credit card fraud is a serious crime and a common type of identity theft, and it causes economic and financial losses for both financial institutions and consumers. The majority of businesses have moved part of their services online, relying on e-commerce data and communication infrastructure to offer their consumers improved efficiency and accessibility. This shift has made credit cards a popular payment method for both online and regular purchases, and the increased use of credit cards has resulted in an increase in credit card fraud. Fraudsters create credit cards that look so similar to legitimate cards that credit card fraud investigators find it difficult to tell the difference. The problem is exacerbated by fraudsters' constantly improving tactics and modus operandi, which keep them ahead of cyber-crime prevention and detection systems for credit card fraud. Measures have been put in place to address credit card fraud, but the increase in financial losses indicates a continuing need for improved detection that works effectively in real time. Machine learning models can help alleviate credit card fraud by detecting it in real time, before it takes place. The problem that arises with machine learning models is poor accuracy when the data objects in the dataset have high dimensionality. The study therefore aims to improve existing supervised machine learning models to detect credit card fraud on multiple big data datasets in real time. Design science research (DSR) is the methodology followed to structure the research study and the artefact design.
Six supervised machine learning models, namely K-Nearest Neighbor, Decision Tree, AdaBoost, Gaussian Naïve Bayes, Logistic Regression and Support Vector Machine, are improved and compared in detecting credit card fraud. Three dimension reduction techniques, namely Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE) and Truncated Singular Value Decomposition (TSVD), were applied to these models to improve their performance. The experimental findings revealed that the Logistic Regression model achieved the highest accuracy, 94.05%, among the models on the dataset that was used. Poor accuracy in machine learning models results from datasets whose data objects have high dimensionality and many outliers. Applying the dimension reduction techniques to the six supervised machine learning models significantly improved the models' accuracy compared with the results obtained before the techniques were applied. This shows that dimension reduction is a valuable tool for improving the accuracy of machine learning models for credit card fraud detection. A comparison of the six supervised machine learning models with state-of-the-art credit card fraud detection approaches showed that the obtained results were competitive.
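
The comparison described in the abstract, in which a classifier is trained with and without a dimension reduction step and the accuracies are compared, can be sketched as follows. This is a minimal illustration and not the thesis artefact: it assumes scikit-learn is available, uses a synthetic imbalanced dataset in place of the credit card data, covers only two of the six models, and the function name `compare_reducers`, the number of components and all other parameters are hypothetical choices.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_reducers(n_components=10, seed=0):
    """Return test accuracy for every (reducer, model) pair."""
    # Synthetic, imbalanced stand-in for a fraud dataset
    # (an assumption for illustration, not the data used in the thesis).
    X, y = make_classification(
        n_samples=2000, n_features=40, n_informative=8,
        weights=[0.95, 0.05], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # "none" is the baseline without dimension reduction.
    reducers = {
        "none": None,
        "pca": PCA(n_components=n_components, random_state=seed),
        "tsvd": TruncatedSVD(n_components=n_components, random_state=seed),
    }
    models = {
        "logreg": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }

    scores = {}
    for r_name, reducer in reducers.items():
        for m_name, model in models.items():
            # Scale, optionally reduce, then classify; clone() gives each
            # pipeline its own unfitted copy of the estimators.
            steps = [StandardScaler()]
            if reducer is not None:
                steps.append(clone(reducer))
            steps.append(clone(model))
            pipe = make_pipeline(*steps)
            pipe.fit(X_train, y_train)
            scores[(r_name, m_name)] = pipe.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for (r_name, m_name), acc in sorted(compare_reducers().items()):
        print(f"{r_name:>5} + {m_name:<6} accuracy = {acc:.4f}")
```

t-SNE is left out of this sketch because scikit-learn's `TSNE` only offers `fit_transform` and cannot project unseen test rows, so it does not fit into a train/test pipeline in the same way as PCA and TSVD. The scores printed here come from synthetic data and say nothing about the accuracies reported in the thesis.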

Masters
