Feature Optimization for Classification of Medical Records with Evolutionary Algorithm
DOI:
https://doi.org/10.38035/dhps.v2i1.636
Keywords:
Electronic Medical Record, Genetic Algorithm, Feature Optimization, Support Vector Machine
Abstract
Feature selection is necessary for medical records because the data usually contain many irrelevant features and noise. Electronic Health Records (EHRs) make it possible to analyze large amounts of medical data. Genetic Algorithms (GAs) are widely used for feature selection because of their potential for global optimization over candidate feature subsets. A GA runs for many iterations (generations) in which crossover and mutation produce new individuals, and a fitness value represents how "good" each resulting individual is. However, simple GAs are not well suited to high-dimensional data, and their search can become trapped in local optima, so the random search may fail to reach the optimal solution. These limitations make it necessary to develop and improve GAs for feature selection on clinical data. First, the features are ranked according to feature evaluation criteria, and irrelevant features are excluded through a fitness evaluation based on the classification accuracy of a Support Vector Machine (SVM). This step reduces the number of features and yields an optimized feature subset. Then, to obtain the optimal solution, the selected feature subset is optimized further by using a GA to determine the best parameters of the machine learning algorithm.
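The generational loop described in the abstract (fitness evaluation, selection, crossover, mutation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `run_ga` and `toy_fitness`, the parameter values, the tournament selection, the elitism step, and the set of "informative" features are all assumptions made for illustration. In the setting of the abstract, the fitness function would instead return the accuracy of an SVM trained on the selected features; a toy stand-in is used here so the example runs with the standard library only.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # one gene per feature: 1 = keep the feature, 0 = drop it


def run_ga(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Search for a feature mask that maximizes `fitness` (hypothetical helper)."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    best_score = fitness(best)

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        def tournament() -> Mask:
            # Pick two distinct individuals and return the fitter one.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_pop = [best[:]]  # elitism: carry the best mask found so far forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

        gen_best = max(population, key=fitness)
        gen_score = fitness(gen_best)
        if gen_score > best_score:
            best, best_score = gen_best[:], gen_score

    return best, best_score


# Assumption: features 0, 3 and 5 are the informative ones in this toy problem.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    """Stand-in for SVM accuracy: reward informative features, penalize extras."""
    selected = {i for i, gene in enumerate(mask) if gene}
    hits = len(selected & INFORMATIVE)
    extras = len(selected - INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.05 * extras


best_mask, best_score = run_ga(n_features=10, fitness=toy_fitness)
print("selected features:", [i for i, gene in enumerate(best_mask) if gene])
print("fitness:", best_score)
```

To move closer to the approach described above, `toy_fitness` could be replaced by a function that trains an SVM on the columns selected by the mask and returns its validation accuracy. The chromosome could also be extended with genes that encode the classifier's parameters, so that a second GA stage tunes them.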
Copyright (c) 2024 Yuda Syahidin, Ade Irma Suryani, Ika Rahman
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright:
Authors who publish their manuscripts in this journal agree to the following conditions:
- Copyright in each article belongs to the author.
- The author acknowledges that DHPS has the right of first publication under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
- Authors may separately arrange for the non-exclusive distribution of the version of the manuscript published in this journal (for example, depositing it in the author's institutional repository or publishing it in a book), with an acknowledgment that the manuscript was first published in DHPS.