A study of machine learning algorithms to measure the feature importance in class-imbalance data of food insecurity cases in Indonesia

doi:10.28919/cmbn/7636

H. Dharmawan, B. Sartono, A. Kurnia, A. F. Hadi, E. Ramadhani

Abstract

The growing number of supervised machine learning algorithms makes selecting a suitable one an issue in its own right. Because these models behave as black boxes, an interpretation technique such as SHAP is needed to measure feature importance and identify the important predictors. Class imbalance in real-world data is a further challenge for improving predictions of the minority class. This study uses a food insecurity dataset; food insecurity is an important indicator for the SDG of zero hunger. The machine learning algorithms studied were Random Forest, XGBoost, SVM, and NN. The effect of class imbalance was studied under three treatments: no handling, SMOTE-N, and ADASYN-N. Twelve models were built from the combinations of the four algorithms and three treatments, and their performance and feature importance were compared. SMOTE-N and ADASYN-N increased sensitivity by up to 0.48 units compared with no handling. The agreement level on the data without handling was relatively low (ICC of 0.736), while it was higher with SMOTE-N and ADASYN-N (ICC of 0.925 and 0.919, respectively). SMOTE-N is the more suitable treatment for this dataset, based on its higher ICC and superior AUC. The relatively high ICC indicates that the choice of machine learning algorithm does not influence the agreement level on the feature importance scores, so the algorithm can be chosen on the basis of its performance. Random Forest produced the best performance (AUC and sensitivity), which makes Random Forest with SMOTE-N the best model in this study. It characterizes food-insecure households as having poor water, a small house size, low household head education, few/no savers, and cement or tile flooring.
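The agreement measure mentioned in the abstract can be illustrated with a short sketch. The abstract does not state which ICC form was used, so the code below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), with features as the rated targets and algorithms as the raters. The function name `icc_2_1` and the importance scores are hypothetical and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores.

    scores: list of rows, one row per feature (target),
            one column per algorithm (rater).
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(scores)        # number of features
    k = len(scores[0])     # number of algorithms
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares for features, algorithms, and residual.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical normalized importance scores:
# 5 features (rows) scored by 4 algorithms (columns).
importance = [
    [0.30, 0.28, 0.33, 0.29],
    [0.25, 0.27, 0.22, 0.26],
    [0.20, 0.18, 0.21, 0.19],
    [0.15, 0.17, 0.14, 0.16],
    [0.10, 0.10, 0.10, 0.10],
]
print(round(icc_2_1(importance), 3))
```

A value near 1 means the algorithms rank and score the features almost identically, while a lower value means the importance scores depend more on which algorithm produced them.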

Published: 2022-10-10

How to Cite this Article:

H. Dharmawan, B. Sartono, A. Kurnia, A. F. Hadi, E. Ramadhani, A study of machine learning algorithms to measure the feature importance in class-imbalance data of food insecurity cases in Indonesia, Commun. Math. Biol. Neurosci., 2022 (2022), Article ID 101

Copyright © 2022 H. Dharmawan, B. Sartono, A. Kurnia, A. F. Hadi, E. Ramadhani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Commun. Math. Biol. Neurosci.

ISSN 2052-2541

Communications in Mathematical Biology and Neuroscience
