Implementing machine learning algorithms on spark

Shweta Mittal, Om Prakash Sangwan

Abstract


Massive amounts of data are generated from numerous sources every day. Spark is a popular, freely available open-source platform for storing and processing large datasets. To train machines to learn hidden patterns and information from these huge raw datasets, machine learning algorithms need to be implemented. ML and MLlib are the two machine learning libraries for implementing such algorithms in Spark. In this paper, Decision Trees, Random Forests, and Gradient Boosted Trees are implemented on the Cardiac and Telecom datasets, both on a local PC and on Google Colab. Gradient Boosted Trees achieved higher accuracy than Decision Trees and Random Forests but took longer to execute. The algorithms also ran faster on the Colab GPU than on the local PC.
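The comparison described in the abstract can be illustrated with a minimal PySpark sketch. This is not the authors' code: the function names `timed` and `compare_tree_models`, the column names, and the hyperparameters (`numTrees=20`, `maxIter=20`) are assumptions chosen for illustration, and the input DataFrames are assumed to already hold an assembled feature vector column and a binary label column.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_tree_models(train_df, test_df, label_col="label", features_col="features"):
    """Fit three spark.ml tree classifiers; report test accuracy and fit time.

    Hypothetical helper: train_df and test_df are assumed to be Spark
    DataFrames with a vector column `features_col` and a binary `label_col`
    (GBTClassifier in spark.ml supports binary labels only).
    """
    # Imported lazily so the timing helper above works without a Spark install.
    from pyspark.ml.classification import (
        DecisionTreeClassifier,
        GBTClassifier,
        RandomForestClassifier,
    )
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="accuracy"
    )

    # Hyperparameters are illustrative assumptions, not values from the paper.
    estimators = {
        "decision_tree": DecisionTreeClassifier(
            labelCol=label_col, featuresCol=features_col
        ),
        "random_forest": RandomForestClassifier(
            labelCol=label_col, featuresCol=features_col, numTrees=20
        ),
        "gradient_boosted_trees": GBTClassifier(
            labelCol=label_col, featuresCol=features_col, maxIter=20
        ),
    }

    results = {}
    for name, estimator in estimators.items():
        # Time only the training step, then score on the held-out data.
        model, fit_seconds = timed(estimator.fit, train_df)
        accuracy = evaluator.evaluate(model.transform(test_df))
        results[name] = {"accuracy": accuracy, "fit_seconds": fit_seconds}
    return results
```

Calling `compare_tree_models(train_df, test_df)` returns a dictionary keyed by model name, so accuracy and training time can be compared side by side across environments such as a local machine and a Colab runtime.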


Published: 2021-06-29

How to Cite this Article:

Shweta Mittal, Om Prakash Sangwan, Implementing machine learning algorithms on spark, J. Math. Comput. Sci., 11 (2021), 5267-5277

Copyright © 2021 Shweta Mittal, Om Prakash Sangwan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

