Distributed Machine Learning Algorithms on Large Datasets


Poster

Machine learning algorithms are widely used in engineering and business. Their scale and speed vary across applications. Most of them must process vast amounts of data within limited time and space. To handle such data, we need distributed versions of the algorithms we already have. This work focused on the parallelization of machine learning algorithms, and the specific problem we parallelized is matrix factorization. We used stochastic gradient descent (SGD) to minimize a loss function such as the root-mean-square error (RMSE). We also needed to stratify the input matrix so that different processes never work on blocks whose projections onto the row and column factors overlap, which means that concurrent updates never touch the same factor entries.
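
The stratification idea can be illustrated with a small sketch. It is not the poster's implementation: it assumes the input matrix is cut into a d x d grid of blocks and that a stratum consists of d blocks that share no row block and no column block. The function names (strata, block_of, sgd_block, factorize, rmse) and all hyperparameters are hypothetical, and the workers are simulated sequentially rather than run as separate processes.

```python
import random


def strata(d):
    """Yield d strata; each holds d blocks that share no row or column block."""
    for shift in range(d):
        yield [(b, (b + shift) % d) for b in range(d)]


def block_of(index, size, d):
    """Map a row or column index to one of d contiguous blocks."""
    return index * d // size


def sgd_block(entries, W, H, lr, reg):
    """Run plain SGD over the observed entries of one block, in place."""
    for i, j, value in entries:
        wi, hj = W[i], H[j]
        err = value - sum(a * b for a, b in zip(wi, hj))
        for k in range(len(wi)):
            wk, hk = wi[k], hj[k]
            wi[k] += lr * (err * hk - reg * wk)
            hj[k] += lr * (err * wk - reg * hk)


def rmse(entries, W, H):
    """Root-mean-square error of W * H^T on the observed entries."""
    total = sum(
        (value - sum(a * b for a, b in zip(W[i], H[j]))) ** 2
        for i, j, value in entries
    )
    return (total / len(entries)) ** 0.5


def factorize(entries, n_rows, n_cols, rank=2, d=2,
              epochs=200, lr=0.02, reg=0.0, seed=0):
    """Approximate the matrix given as (row, col, value) triples by W * H^T."""
    rng = random.Random(seed)
    W = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]

    # Group the observed entries by the block they fall into.
    blocks = {}
    for i, j, value in entries:
        key = (block_of(i, n_rows, d), block_of(j, n_cols, d))
        blocks.setdefault(key, []).append((i, j, value))

    for _ in range(epochs):
        for stratum in strata(d):
            # Blocks in one stratum touch disjoint rows of W and H, so each
            # could go to its own worker; here they run one after another.
            for key in stratum:
                block_entries = blocks.get(key, [])
                rng.shuffle(block_entries)
                sgd_block(block_entries, W, H, lr, reg)
    return W, H
```

Calling factorize on a list of (row, col, value) triples returns the two factor matrices as nested lists, and rmse reports how well they reproduce the observed entries. Because the blocks of a stratum are independent, the inner loop over a stratum is the part a distributed version would hand out to separate processes, with a synchronization point between strata.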