Parallelizing Linear Regression With MapReduce

Kevin Feasel

2018-06-25

R

Arthur Charpentier shows us the math behind using MapReduce to parallelize a linear regression:

Sometimes, with big data, matrices are too big to handle, and it is possible to use tricks to numerically still do the map. Map-Reduce is one of those. With several cores, it is possible to split the problem, to map on each machine, and then to aggregate it back at the end.

Arthur gives us an interesting example in R to boot.

Related Posts

Using ggpairs To Find Correlations Between Variables In R

Akshay Mahale shows how to use the ggpairs function in R to see the correlation between different pairs of variables: From the above matrix for iris we can deduce the following insights: Correlation between Sepal.Length and Petal.Length is strong and dense. Sepal.Length and Sepal.Width seems to show very little correlation as datapoints are spreaded through out the plot area. Petal.Length and Petal.Width also shows strong correlation. Note: The […]

Read More

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930