Optimizing Credit Risk Prediction in the Financial Sector Using Boosting Algorithms: A Comparative Study with Financial Datasets

Authors

  • Renzo Orlando Villanueva Mora, Universidad de Lima
  • Edwin Jonathan Escobedo Cardenas, Universidad de Lima

DOI:

https://doi.org/10.13053/cys-29-2-5173

Keywords:

XGBoost, LightGBM, Boosted Random Forest, Boosting Algorithms, Credit Risk, Credit Score, Financial Sector

Abstract

Credit risk is a significant concern for financial institutions. Despite advances in predictive models, there is still room for improvement in accurately assessing credit risk. This study develops a methodological process to predict credit risk in the financial sector using boosting-based algorithms: XGBoost, LightGBM, and Boosted Random Forest. We found that the UCI Machine Learning Repository contains easily accessible datasets with appropriate variable distributions. These datasets offer the potential to improve on previously reported results across different metrics, such as the F-score and the area under the curve (AUC). The datasets used are Statlog German Credit Data, Statlog Australian Credit Approval, Bank Marketing, Credit Approval, and South German Credit Data. The approach involves feature engineering, exploratory data analysis, and hyperparameter tuning. Furthermore, we propose a new strategy that adds a column derived from an unsupervised algorithm such as K-means. Our results indicate that XGBoost outperforms LightGBM and Boosted Random Forest in different scenarios. Finally, these boosting-based models outperform the Boosted Decision Trees and Factorization Machine models from previous studies. These findings are important for financial institutions seeking an effective methodology to improve credit risk prediction.
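
The strategy of adding a cluster-derived column can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn rather than taken from the UCI datasets), scikit-learn's GradientBoostingClassifier stands in for XGBoost and LightGBM, and the helper name add_cluster_column, the number of clusters, and all other parameter values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def add_cluster_column(X_train, X_test, n_clusters=4, seed=0):
    """Append a K-means cluster label as one extra feature column.

    Hypothetical helper: the scaler and the clustering are fitted on the
    training split only, so no information leaks from the test split.
    """
    scaler = StandardScaler().fit(X_train)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    train_labels = km.fit_predict(scaler.transform(X_train))
    test_labels = km.predict(scaler.transform(X_test))
    return (np.column_stack([X_train, train_labels]),
            np.column_stack([X_test, test_labels]))


# Synthetic stand-in for a credit dataset (assumption, not UCI data).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

X_train_aug, X_test_aug = add_cluster_column(X_train, X_test)

# Any boosting classifier could be used here; this one is a stand-in.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train_aug, y_train)

# Evaluate with the two metrics named in the abstract.
proba = model.predict_proba(X_test_aug)[:, 1]
auc = roc_auc_score(y_test, proba)
f1 = f1_score(y_test, model.predict(X_test_aug))
print(f"AUC={auc:.3f}  F1={f1:.3f}")
```

Treating the cluster label as a numeric column is the simplest option; one-hot encoding it is a reasonable alternative, since the label values carry no order.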

Author Biography

Renzo Orlando Villanueva Mora, Universidad de Lima

Lima, Peru

Published

2025-06-18

Section

Articles