Coronary artery disease (CAD) is a disease caused by the narrowing of the arteries that carry oxygenated blood to the heart, which can result in serious cardiovascular complications, including angina and heart attack. CAD is one of the deadliest diseases in the world today, especially in developing countries. It has been estimated that almost one third of middle-aged women and one half of middle-aged men in the United States are affected by the disease. The disease has become the number one killer in developed countries, with over 7.4 million deaths in 2012.
In Nigeria, CAD is not a common disease, yet it has claimed many lives, largely because of a lack of awareness of the disease among the general public. In May 2014, deaths caused by CAD in Nigeria reached 53,836, or 2.82% of total deaths. CAD is now recorded among the deadliest diseases in Nigeria. Predicting the disease, followed by adequate medical attention and treatment, can therefore save many lives.
Data mining is used to uncover hidden patterns and novel, useful knowledge in datasets, and it has been used for the diagnosis and prediction of many diseases. As with other diseases, data mining algorithms have been applied to the diagnosis of CAD, and in recent years many researchers have harnessed data mining techniques and algorithms for this purpose. This study therefore evaluated the performance of various data mining algorithms to identify the best algorithm for the CAD dataset, so that it can be used to develop a CAD prediction system.
The diagnostic dataset of CAD patients was collected from Murtala Muhammad and Abdullahi Wase General Hospitals in Kano State, Nigeria. The data collection was approved by the Ministry of Health, Kano, Nigeria, in a letter dated 27 October 2017. A total of 506 coronary artery disease diagnosis cases recorded between 2003 and 2017 in the two hospitals were captured.
The CAD risk factors considered comprise one demographic attribute, age, and the following clinical attributes: high-density lipoprotein (HDL), cholesterol, chest pain, glucose, blood pressure, triglyceride, creatinine, low-density lipoprotein (LDL), body mass index (BMI), and heart rate, together with the doctor's diagnostic result.
The CAD dataset was cleaned, preprocessed, and put into the required form. The collected data were converted to the Attribute-Relation File Format (ARFF), and the data type of each attribute was determined. Table 1 shows the data type of each attribute, and Figures 1 to 6 show visualizations of the attributes of the CAD dataset.
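The conversion to ARFF can be illustrated with a minimal Python sketch. It is not the study's actual conversion script: the attribute names, the choice of numeric versus nominal types, and the nominal values (`typical`/`atypical`/`none` for chest pain, `positive`/`negative` for the diagnosis) are hypothetical placeholders, since the real data types are the ones listed in Table 1. The sample row is made up and is not taken from the dataset.

```python
from typing import Sequence

# Hypothetical attribute schema. The types below are assumptions for
# illustration only; the study's actual types are given in its Table 1.
ATTRIBUTES = [
    ("age", "NUMERIC"),
    ("hdl", "NUMERIC"),
    ("cholesterol", "NUMERIC"),
    ("chest_pain", "{typical,atypical,none}"),  # assumed nominal values
    ("glucose", "NUMERIC"),
    ("blood_pressure", "NUMERIC"),
    ("triglyceride", "NUMERIC"),
    ("creatinine", "NUMERIC"),
    ("ldl", "NUMERIC"),
    ("bmi", "NUMERIC"),
    ("heart_rate", "NUMERIC"),
    ("diagnosis", "{positive,negative}"),  # class attribute (doctor's result)
]


def to_arff(relation: str, rows: Sequence[Sequence[object]]) -> str:
    """Render rows as an ARFF document; missing values (None) become '?'."""
    lines = [f"@RELATION {relation}", ""]
    for name, arff_type in ATTRIBUTES:
        lines.append(f"@ATTRIBUTE {name} {arff_type}")
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(ATTRIBUTES):
            raise ValueError("row length does not match the attribute list")
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example row, not a record from the CAD dataset.
    sample = [[55, 40, 210, "typical", 110, 140, 150, 1.1, 130, 27.5, 80, "positive"]]
    print(to_arff("cad", sample), end="")
```

Writing missing values as `?` follows the ARFF convention, so incomplete hospital records can be kept rather than dropped before loading the file into Weka.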
The Weka machine learning software was used for the performance evaluation of the algorithms. However, because the improved C4.5 algorithm is not available in Weka, it had to be implemented in Weka. All the algorithms were then applied directly to the CAD training dataset. The performance evaluation results are shown in Table 1 and Figure 1.
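The modifications in the improved C4.5 algorithm are not described here, so the following sketch shows only the split criterion of the standard C4.5 algorithm, the gain ratio, for a nominal attribute. It is a simplified illustration (no threshold search for continuous attributes, no pruning, no missing-value handling), and the function names are hypothetical rather than Weka's.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain ratio of splitting `labels` on the nominal attribute `values`."""
    n = len(labels)
    # Group the class labels by attribute value (one branch per value).
    groups: dict = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: parent entropy minus weighted entropy of the branches.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes that split into many small branches.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

For example, an attribute that perfectly separates two equally sized classes, `gain_ratio(["a", "a", "b", "b"], [1, 1, 0, 0])`, scores 1.0, while an attribute unrelated to the class, `gain_ratio(["a", "b", "a", "b"], [1, 1, 0, 0])`, scores 0.0. In Quinlan's C4.5, the split is chosen by gain ratio among the attributes whose information gain is at least average.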
The improved C4.5 algorithm proved to be the best algorithm on the CAD dataset, with an accuracy of 97.23%, a specificity of 97.03%, and a sensitivity of 96.39%. The results also show that the standard C4.5 algorithm is the second-best algorithm for the CAD dataset, but only with regard to accuracy and sensitivity; for specificity, BayesNet is the second-best algorithm on the dataset, at 95.02%. The performance evaluation therefore indicates that the improved C4.5 algorithm can predict both healthy and unhealthy patients with respect to CAD, achieving more than 96% for both sensitivity and specificity, with an overall accuracy of 97.23%.
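The three measures reported above can be computed from a binary confusion matrix. This sketch assumes that a CAD diagnosis is the positive class; the class name and the counts in the example are hypothetical and are not the study's results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # CAD patients correctly predicted as having CAD
    fn: int  # CAD patients wrongly predicted as healthy
    tn: int  # healthy people correctly predicted as healthy
    fp: int  # healthy people wrongly predicted as having CAD

    def sensitivity(self) -> float:
        """True positive rate: share of CAD patients that are detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """True negative rate: share of healthy people correctly cleared."""
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        """Share of all cases that are classified correctly."""
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total
```

With made-up counts, `ConfusionMatrix(tp=90, fn=10, tn=80, fp=20)` gives a sensitivity of 0.9, a specificity of 0.8, and an accuracy of 0.85. Reporting sensitivity and specificity alongside accuracy shows how well a classifier handles each class separately, which accuracy alone can hide.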
L. Adel, N. A. Raja, Z. Roziati, B. Awang, “Design of a Fuzzy-based Decision Support System for Coronary Heart Disease Diagnosis”, J Med Syst, 2012.
L. J. Muhammad, E. J. Garba, N. D. Oye, G. M. Wajiga, “On the Problems of Knowledge Acquisition and Representation of Expert System for Diagnosis of Coronary Artery Disease (CAD)”, International Journal of u- and e- Service, Science and Technology, 2018, vol. 11:30, pp. 49-58.
World Health Organization (WHO), “Cardiovascular diseases, Fact sheet No. 317”, available at http://www.who.int/mediacentre/factsheets/fs317/en/, last accessed 16 January 2018.
L. Verma, S. Srivastava, “A Data Mining Model for Coronary Artery Disease Detection using Noninvasive Clinical Parameters”, Indian Journal of Science and Technology, 2016, vol. 9, no. 48, pp. 1-6.
Dr. Ahmed Abba Haruna is a Senior Lecturer in Information Technology at Skyline University Nigeria. He holds a PhD in Information Technology from Universiti Teknologi PETRONAS (UTP), Malaysia.