Distributed Machine Learning Algorithms on Large Datasets
Machine learning algorithms are widely used in engineering and business.

Machine learning algorithms are widely used in engineering and business.These algorithms scale and speed varies over different applications. Most of them relies on vast amount of data in limited amount of time and space. To handle these vast data we need distributed versions of the algorithms we have. In this work parallelization of machine learning algorithms were main problem focused on. The problem tried to parallelize is matrix factorization. We used stochastic gradient descent approach to minimize a linear loss function such as RMSE. We also needed to stratify the input matrix to not process with different processes overlapping regions in the projections of the factors in the matrix we want to approximate.