COVID-19 Mortality Risk Prediction Model Using Machine Learning

Alba Maribel Sánchez-Gálvez, Sully Sánchez-Gálvez, Ricardo Álvarez-González, Frida Rojas-Alarcon


The COVID-19 outbreak began in Wuhan, China, in December 2019 and spread rapidly worldwide. On March 11, 2020, the World Health Organization (WHO) formally declared the COVID-19 outbreak a global pandemic [1]. This highly contagious disease, which has also started to spread among young people, requires policies that prevent hospitals from collapsing under shortages of beds, mechanical ventilators, and intensive care units. Using the records from March 2020 to May 31, 2021 (before the arrival of vaccines in Mexico) published on the website of the General Directorate of Epidemiology of the Ministry of Health of the Government of the Mexican Republic, which cover more than seven million people associated with COVID-19, we build a model with twelve characteristics that predicts the risk of death associated with COVID-19 by applying supervised learning algorithms such as Linear Regression, Naive Bayes, Decision Trees, and Random Forests. The machine learning algorithms achieve a performance of 87%. The model was subsequently tested on 402,116 COVID-19-associated patients from June, achieving an accuracy of 91% and surpassing the model proposed in [3]. After data cleaning, we observed that the twelve variables are not correlated, unlike the result reported by the authors of [4], where no data cleaning was carried out. This analysis can aid in designing strategies and policies to combat the spread of the disease and prevent mortality, and can help hospitals prioritize the care of higher-risk patients.
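The train-then-retest workflow described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only one of the named algorithms (Naive Bayes, in its Bernoulli form), the data are synthetic, and the twelve binary features, the toy risk rule, and all function names (`make_synthetic_patients`, `fit_bernoulli_nb`, `predict`, `accuracy`) are assumptions made for illustration. The accuracy it prints says nothing about the figures reported in the abstract.

```python
import math
import random

N_FEATURES = 12  # twelve characteristics, as in the abstract; the features here are synthetic


def make_synthetic_patients(n, rng):
    """Generate hypothetical binary risk factors and a death label (not real data)."""
    rows, labels = [], []
    for _ in range(n):
        x = [1 if rng.random() < 0.3 else 0 for _ in range(N_FEATURES)]
        # Assumed toy rule: risk of death grows with the number of risk factors present.
        p_death = min(0.95, 0.05 + 0.15 * sum(x))
        rows.append(x)
        labels.append(1 if rng.random() < p_death else 0)
    return rows, labels


def fit_bernoulli_nb(rows, labels):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        members = [x for x, y in zip(rows, labels) if y == c]
        prior = (len(members) + 1) / (len(rows) + 2)
        probs = [
            (sum(x[j] for x in members) + 1) / (len(members) + 2)
            for j in range(N_FEATURES)
        ]
        model[c] = (prior, probs)
    return model


def predict(model, x):
    """Return the class (0 = survived, 1 = died) with the larger log posterior."""
    scores = {}
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, pj in zip(x, probs):
            score += math.log(pj if xj else 1.0 - pj)
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(model, rows, labels):
    """Fraction of patients whose predicted label matches the true label."""
    hits = sum(predict(model, x) == y for x, y in zip(rows, labels))
    return hits / len(labels)


rng = random.Random(0)
train_x, train_y = make_synthetic_patients(5000, rng)
# A second, unseen sample stands in for the later month used to retest the model.
test_x, test_y = make_synthetic_patients(1000, rng)
nb_model = fit_bernoulli_nb(train_x, train_y)
test_accuracy = accuracy(nb_model, test_x, test_y)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Evaluating on a sample that played no part in fitting mirrors the abstract's second test on later patients. With real records, each algorithm would be fitted on the same training split and compared on the same held-out split.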



Keywords: machine learning algorithms, ablation study, COVID-19
