Abstract
One consequence of improper loan application validation is loan default, in which the debtor is unable to meet the financial obligations he or she has incurred. To avoid such outcomes, a bank needs to analyze large volumes of data before making a decision. Machine learning is therefore a promising way to provide accurate and timely decisions when predicting loan defaulters. The aim of this paper is to minimize credit risk by predicting loan default from different loan factors (features). First, the collected data are cleaned using several preprocessing techniques. Next, the most influential features are identified from the correlations between features. Once preprocessing is complete, four machine learning algorithms are trained and tested: K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF). The novelty of this research lies in analyzing the behavior of these four algorithms under different resampling techniques in order to find the most accurate prediction model. The experimental results showed that Logistic Regression combined with over/under sampling outperformed the other three algorithms in terms of accuracy, precision, recall, F1 score, and area under the curve (AUC). These findings demonstrate the ability of Logistic Regression with resampling to predict loan default from the historical data provided.
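The abstract does not state which resampling scheme or implementation was used. As a minimal illustration only, assuming plain random over/under sampling (a swapped-in technique, not necessarily the authors' method), the Python sketch below balances the classes, fits a logistic regression by batch gradient descent, and computes the accuracy, precision, recall, F1, and AUC metrics named in the abstract. The function names (`hybrid_resample`, `train_logistic_regression`, `predict_proba`, `classification_metrics`, `roc_auc`) and their default parameters are hypothetical, and only the standard library is used.

```python
import math
import random

# Hypothetical sketch: random over/under sampling + logistic regression.
# Not the paper's implementation; names and defaults are illustrative only.


def hybrid_resample(X, y, target_per_class, seed=0):
    """Give every class exactly target_per_class rows.

    Classes above the target are randomly undersampled (no replacement);
    classes below it keep all rows and gain random duplicates (oversampling).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target_per_class:
            picked = rng.sample(rows, target_per_class)
        else:
            picked = rows + rng.choices(rows, k=target_per_class - len(rows))
        X_out.extend(picked)
        y_out.extend([label] * target_per_class)
    return X_out, y_out


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    # Estimated probability of default (class 1) for each row.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 (default) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In practice the resampling would be applied to the training split only, so that metrics computed on the held-out test split reflect the original, imbalanced class distribution. Choosing `target_per_class` between the minority and majority class sizes is what makes this a combined scheme: the majority class shrinks while the minority class grows. Only Logistic Regression is sketched here; KNN, DT, and RF would slot into the same resample, fit, and evaluate loop.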
Keywords
Loan Approval, Machine Learning Algorithm, Logistic Regression, Loan Defaulter, Prediction Model, Random Forest.