Overcoming missing values using imputation methods in the classification of tuberculosis

Eka Mala Sari Rochman, Miswanto -, Herry Suprajitno

Abstract


Indonesia is one of the countries with the highest population density in the world and has a very high number of tuberculosis (TB) cases. TB is a serious disease because it is easily transmitted through the air, via droplets released when a TB patient coughs or sneezes. Data used in disease diagnosis often contain missing values, for example because of errors made during data collection, so this study proposes the mean imputation method to handle missing data. TB data from Bangkalan Regency, Indonesia, consisting of 886 records, are classified with Naive Bayes, which is compared against logistic regression. Training and testing data are split using k-fold cross-validation with k = 10. In the experiments, mean imputation fills in the missing data better than the one imputation method for this case: with Naive Bayes, mean imputation yields an average accuracy of 97.36% and an F1 score of 95.01%, compared with an average accuracy of 97.35% and an F1 score of 94.35% for one imputation. For TB classification, Naive Bayes (average accuracy 97.36%, F1 score 95.01%) outperforms logistic regression, which reaches an accuracy of 97.36% but an F1 score of only 89.58%.
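The evaluation pipeline described in the abstract (mean imputation, 10-fold cross-validation, accuracy and F1 score) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: features are assumed numeric with missing values stored as `None`, labels are assumed binary with `1` as the positive (TB) class, folds are contiguous and unshuffled, and the function names `column_means`, `mean_impute`, `k_fold_indices` and `accuracy_f1` are hypothetical. The abstract does not state whether the means were computed on the full dataset or on each training fold; computing them per training fold, as suggested below, avoids leaking test-set information.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # assumption: None marks a missing value


def column_means(rows: Sequence[Row]) -> List[float]:
    """Mean of each column, computed over its observed (non-None) values only."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def mean_impute(rows: Sequence[Row], means: Sequence[float]) -> List[List[float]]:
    """Mean imputation: replace each missing value with its column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = [idx for idx in range(n) if idx < start or idx >= start + size]
        folds.append((train, test))
        start += size
    return folds


def accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int],
                positive: int = 1) -> Tuple[float, float]:
    """Accuracy and F1 score (2TP / (2TP + FP + FN)) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    return accuracy, (2 * tp / denom if denom else 0.0)


# Usage: column 0 has mean 2.0, column 1 has mean 6.0 over observed values.
rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(rows, column_means(rows)))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

In a full run over the 886 records, one would loop over `k_fold_indices(886, 10)`, compute `column_means` on the training rows of each fold, impute both training and test rows with those means, fit a classifier (Naive Bayes or logistic regression, not shown here), and average the per-fold `accuracy_f1` results.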


Published: 2022-07-25

How to Cite this Article:

Eka Mala Sari Rochman, Miswanto -, Herry Suprajitno, Overcoming missing values using imputation methods in the classification of tuberculosis, Commun. Math. Biol. Neurosci., 2022 (2022), Article ID 66

Copyright © 2022 Eka Mala Sari Rochman, Miswanto -, Herry Suprajitno. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Commun. Math. Biol. Neurosci.

ISSN 2052-2541
