Feature Optimization for Classification of Medical Records with Evolutionary Algorithm

Authors

  • Yuda Syahidin, Health Information Management, Piksi Ganesha Polytechnic, Bandung, Indonesia
  • Ade Irma Suryani, Medical Records and Health Information, Piksi Ganesha Polytechnic, Bandung, Indonesia
  • Ika Rahman, Physiotherapy, Ganesha Polytechnic, Bandung, Indonesia

DOI:

https://doi.org/10.38035/dhps.v2i1.636

Keywords:

Electronic Medical Record, Genetic Algorithm, Feature Optimization, Support Vector Machine

Abstract

Feature selection in medical records is necessary because the data usually contain many irrelevant features and much noise. Electronic Health Records (EHRs) make it possible to analyze large amounts of medical data. Genetic Algorithms are widely used for feature selection because of their potential for global optimization over candidate feature subsets. A Genetic Algorithm runs for many iterations (generations) in which crossover and mutation produce new individuals, and each individual is assigned a fitness value that represents how "good" it is. However, simple genetic algorithms, like other heuristic algorithms, are not well suited to high-dimensional data, and their random search can converge to a local optimum instead of the optimal solution. These limitations motivate developing and improving genetic algorithms for feature selection on clinical data. In the proposed approach, features are first ranked by a feature evaluation criterion, and irrelevant features are excluded through a fitness evaluation based on the classification accuracy of a Support Vector Machine (SVM). This step reduces the number of features and yields an optimal feature subset. Then, to obtain the optimal solution, the selected feature subset is optimized with a machine learning classifier whose best parameters are determined by a genetic algorithm.
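The first stage described in the abstract is a wrapper search: a genetic algorithm evolves candidate feature subsets and scores each one by SVM classification accuracy. The sketch below is a minimal illustration of that loop, not the authors' implementation. It assumes binary masks (1 = keep a feature), tournament selection, one-point crossover, bit-flip mutation and elitism; these operator choices, the default parameter values, and the names `ga_feature_selection` and `make_svm_fitness` are hypothetical. The SVM fitness assumes scikit-learn and uses 5-fold cross-validated accuracy with a default `SVC`.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    elite: int = 2,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Evolve binary feature masks (1 = keep feature) and return the best one."""
    rng = random.Random(seed)
    # Random initial population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    cache: Dict[Tuple[int, ...], float] = {}

    def score(mask: List[int]) -> float:
        # Memoize so identical masks are only evaluated (e.g. SVM-trained) once.
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask)
        return cache[key]

    def tournament(k: int = 3) -> List[int]:
        # Tournament selection: fittest of k randomly drawn individuals.
        return max(rng.sample(pop, min(k, len(pop))), key=score)

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [m[:] for m in ranked[:elite]]  # elitism: best survive unchanged
        while len(next_pop) < pop_size:
            a, b = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=score)
    return best, score(best)


def make_svm_fitness(X, y, cv: int = 5) -> Callable[[Sequence[int]], float]:
    """Fitness = mean k-fold accuracy of an SVM trained on the masked columns.

    Assumes scikit-learn and NumPy are installed; they are imported lazily
    so the GA above runs without them.
    """
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X = np.asarray(X)

    def fitness(mask: Sequence[int]) -> float:
        cols = [i for i, bit in enumerate(mask) if bit]
        if not cols:  # an empty subset cannot train a classifier
            return 0.0
        return float(cross_val_score(SVC(), X[:, cols], y, cv=cv).mean())

    return fitness
```

With a feature matrix `X` and labels `y`, a call such as `ga_feature_selection(X.shape[1], make_svm_fitness(X, y))` would return the best mask and its cross-validated accuracy. Fitness values are cached per mask because training the SVM is usually the expensive step, and elitism guarantees that the best score never decreases from one generation to the next.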
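For the second stage, in which a genetic algorithm determines the classifier's parameters for the selected subset, one possible sketch is a real-coded GA over the RBF-SVM parameters C and gamma, searched on a log10 scale. The abstract does not say which parameters are tuned or how they are encoded, so the bounds, arithmetic crossover, Gaussian mutation and the names `ga_tune` and `make_svm_objective` are assumptions for illustration; the objective again assumes scikit-learn's `SVC` and `cross_val_score`.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def ga_tune(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 16,
    generations: int = 25,
    mutation_sigma: float = 0.3,
    elite: int = 2,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize `objective` inside the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cache: Dict[Tuple[float, ...], float] = {}

    def score(x: List[float]) -> float:
        key = tuple(x)
        if key not in cache:  # avoid re-training for repeated parameter sets
            cache[key] = objective(list(key))
        return cache[key]

    def clip(x: List[float]) -> List[float]:
        # Keep every gene inside its (lo, hi) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def tournament(k: int = 3) -> List[float]:
        return max(rng.sample(pop, min(k, len(pop))), key=score)

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [x[:] for x in ranked[:elite]]  # elitism
        while len(next_pop) < pop_size:
            a, b = tournament(), tournament()
            w = rng.random()
            # Arithmetic (blend) crossover followed by Gaussian mutation.
            child = [w * ai + (1.0 - w) * bi for ai, bi in zip(a, b)]
            child = [v + rng.gauss(0.0, mutation_sigma) for v in child]
            next_pop.append(clip(child))
        pop = next_pop

    best = max(pop, key=score)
    return best, score(best)


def make_svm_objective(X, y, cv: int = 5) -> Callable[[List[float]], float]:
    """Objective = mean k-fold accuracy of an RBF SVM with C = 10**p[0], gamma = 10**p[1].

    Assumes scikit-learn is installed (imported lazily).
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def objective(params: List[float]) -> float:
        log_c, log_gamma = params
        model = SVC(kernel="rbf", C=10.0 ** log_c, gamma=10.0 ** log_gamma)
        return float(cross_val_score(model, X, y, cv=cv).mean())

    return objective
```

For example, `ga_tune(make_svm_objective(X_sel, y), bounds=[(-2, 3), (-4, 1)])` would search C in [0.01, 1000] and gamma in [0.0001, 10], where `X_sel` (hypothetical) holds only the columns kept by the feature-selection stage. Searching in log space is a common choice because both parameters matter over several orders of magnitude.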

Published

2024-08-02

How to Cite

Syahidin, Y., Irma Suryani, A., & Rahman, I. (2024). Feature Optimization for Classification of Medical Records with Evolutionary Algorithm. Dinasti Health and Pharmacy Science, 2(1), 1–9. https://doi.org/10.38035/dhps.v2i1.636