Proposed two variable selection methods for big data: simulation and application to air quality data in Italy

Ahmed A. El-Sheikh, Mohamed R. Abonazel, Mohamed C. Ali

Abstract


In the era of big data, the rapid development of technology produces large amounts of data every day. In fields such as engineering, computer science, and finance, statistical and machine learning methods are used to uncover useful information and patterns in these enormous datasets. Neural networks (NN) and random forests (RF) are common machine learning methods for model selection (variable selection); the least absolute shrinkage and selection operator (LASSO) and principal component analysis are common statistical methods. In this study, we propose two methods: a combination of NN and LASSO, and a combination of NN and RF. Using Monte Carlo simulation and a real-data application (air quality data in Italy), we compare the performance of the classical methods (ordinary least squares and feed-forward NN) with that of the two proposed methods according to goodness-of-fit criteria. The results show that the proposed methods perform better than the classical methods.


Published: 2022-02-16

How to Cite this Article:

Ahmed A. El-Sheikh, Mohamed R. Abonazel, Mohamed C. Ali, Proposed two variable selection methods for big data: simulation and application to air quality data in Italy, Commun. Math. Biol. Neurosci., 2022 (2022), Article ID 16

Copyright © 2022 Ahmed A. El-Sheikh, Mohamed R. Abonazel, Mohamed C. Ali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Commun. Math. Biol. Neurosci.

ISSN 2052-2541


 
