Explaining Factors of Student Attrition at Higher Education

Iara Alcauter, Lourdes Martinez-Villaseñor, Hiram Ponce


Student attrition in higher education is an active research area aimed at preventing dropout and designing effective retention strategies. The problem is especially pressing in Science, Technology, Engineering, and Mathematics (STEM) disciplines. This work evaluates four data mining methods widely used to predict student attrition: Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Networks (ANN). The study uses a dataset of engineering students from a prominent Mexican university, applying Recursive Feature Elimination (RFE) for variable selection and the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. Random Forest is the best-performing predictive model, with an accuracy of 98%. The results also show that RFE is effective at identifying influential variables. To provide detailed insights and decision support, the study applies Local Interpretable Model-Agnostic Explanations (LIME) to explain the factors with the greatest impact. The analysis contributes to strategies for improving student retention in STEM disciplines.
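
The class-imbalance step named in the abstract can be illustrated with a minimal, standard-library-only sketch of SMOTE-style oversampling: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its k nearest minority neighbours. This is a simplified illustration, not the implementation used in the study. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions introduced here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours.

    Simplified sketch: the base sample is drawn at random for each new point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data (not from the paper): two numeric features per student.
dropouts = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
retained_count = 10  # assumed size of the majority class

# Oversample the minority class until both classes have the same size.
new_points = smote(dropouts, n_new=retained_count - len(dropouts), k=3)
balanced_dropouts = dropouts + new_points
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data contains no synthetic samples.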


Keywords: student attrition, machine learning, XAI, explainable artificial intelligence, higher education
