Vol 3, Issue 2, July-December 2023 | Pages: 133-152 | Research Paper
Published Online: February 12, 2024
Emerging technologies such as machine learning and data analytics offer promising solutions to challenges in healthcare, biomedical research, and patient care. Early detection of disease symptoms improves disease management, symptom control, and the efficiency of therapy. In this study, we present a complete preprocessing and prediction pipeline for coronary heart disease (CHD). The pipeline handles null values, standardizes, categorizes, and normalizes the features, resamples the data, and finally predicts the outcome. We predict CHD using Random Forest, k-nearest neighbors (KNN), decision trees, logistic regression, and gradient boosting, and we use K-fold cross-validation so that performance is evaluated across the whole dataset rather than on a single split. We test these algorithms on 4240 records from the “Framingham Heart Study” dataset. We also apply a feature selection algorithm to reduce dimensionality and computational cost while maintaining acceptable accuracy. To assess whether a person is at risk of CHD, we test a new ensemble method that combines gradient boosting, random forest, and k-nearest neighbors by majority vote; it achieves 96.16% accuracy and a ROC-AUC score of 0.96. The experiments show that advances in machine learning, combined with predictive analytics, offer a promising basis for intelligent solutions in cardiovascular disease prediction and beyond.
Keywords
Coronary Heart Disease; Gradient Boosting; Random Forest; KNN; Early detection; Machine Learning
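The abstract names two techniques that a short sketch may help make concrete: K-fold cross-validation and a majority-vote ensemble of three classifiers. The following is a minimal standard-library illustration under stated assumptions, not the authors' implementation. The function names (`k_fold_indices`, `majority_vote`, `accuracy`) are hypothetical, the folds are assumed to be contiguous and unshuffled, and the prediction vectors are made-up toy values rather than results from the Framingham data.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    Assumption: no shuffling or stratification is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard voting: for each sample, keep the label most models predicted.

    With three models and binary labels, as here, ties cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels and per-model predictions (hypothetical, for illustration only).
y_true = [0, 1, 1, 0, 1]
gb_pred = [0, 1, 0, 0, 1]   # stands in for gradient boosting
rf_pred = [0, 1, 1, 1, 1]   # stands in for random forest
knn_pred = [1, 1, 1, 0, 0]  # stands in for k-nearest neighbors

ensemble_pred = majority_vote([gb_pred, rf_pred, knn_pred])
print(ensemble_pred)                    # [0, 1, 1, 0, 1]
print(accuracy(y_true, ensemble_pred))  # 1.0
```

In this toy case each model makes at least one mistake, but no two models err on the same sample, so the vote recovers every label. In practice, each fold returned by `k_fold_indices` would be used to fit the three models on the training indices and to score the voted predictions on the held-out indices, with the per-fold scores then averaged.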