Abstract:
Heart disease is one of the leading causes of death globally. Although the causes of heart disease vary from nation to nation, the risk factors are practically the same. Heart disease refers to any condition that affects the cardiovascular system. It manifests in a variety of ways, each of which affects the heart and blood vessels differently. Predicting cardiovascular disease at an early stage can help high-risk individuals adopt lifestyle changes and, as a result, prevent complications. The goal of this study is to identify the most important risk factors for heart disease and to detect the likelihood of heart disease in advance. The data for this study were gathered from an ongoing cardiovascular study of the inhabitants of Framingham, Massachusetts. The prediction model is developed to determine whether a patient has a 10-year risk of developing coronary heart disease (CHD). The dataset has about 4,000 records with 15 parameters. Initially, the data were fed into supervised machine learning approaches: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN), Naïve Bayes (NB), Linear Discriminant Analysis (LDA), Logistic Regression (LR), and k-nearest neighbors (k-NN). In addition, bagging and boosting techniques such as Random Forest (RF), CatBoost, LightGBM, and Extreme Gradient Boosting (XGBoost) were incorporated. Finally, an ensemble model was built by combining the best-performing algorithms, namely CatBoost, Random Forest, and Logistic Regression, to predict the risk of heart disease. The final ensemble model achieved an accuracy of 86.20%.
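
The abstract does not state how the three selected models are combined into the final ensemble. One common option is soft voting, in which each model's predicted probability of 10-year CHD is averaged per patient. The following is a minimal sketch under that assumption, not the authors' implementation; the function names `soft_vote`, `predict_labels`, and `accuracy`, the equal default weights, and the 0.5 decision threshold are all illustrative choices that do not come from the study.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-model CHD probabilities by (weighted) averaging.

    model_probs[m][i] is model m's predicted probability for patient i.
    Assumption: soft voting; the study may have combined models differently.
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("at least one model is required")
    n_patients = len(model_probs[0])
    if any(len(p) != n_patients for p in model_probs):
        raise ValueError("all models must score the same patients")
    if weights is None:
        weights = [1.0] * n_models  # equal weights: an illustrative default
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_patients)
    ]


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a patient 1 (at risk) when the ensemble probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In use, the three inner lists passed to `soft_vote` would hold the probabilities produced by the fitted CatBoost, Random Forest, and Logistic Regression models on the same held-out patients, and `accuracy` would then be computed against the recorded 10-year CHD outcomes.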