Diamond: Solving Generalized Linear Models Using Python

Tim Sweester and Aaron Bradley announce Diamond, a Python library which solves certain kinds of generalized linear models.  In a two-part series, they explain more.  Part 1 covers the mathematical principles behind it:

Many computational problems in data science and statistics can be cast as convex problems. There are many advantages to doing so:

  • Convex problems have a unique global solution, i.e. there is one best answer
  • There are well-known, efficient, and reliable algorithms for finding it

One ubiquitous example of a convex problem in data science is finding the coefficients of an L2L2-regularized logistic regression model using maximum likelihood. In this post, we’ll talk about some basic algorithms for convex optimization, and discuss our attempts to make them scale up to the size of our models. Unlike many applications, the “scale” challenge we faced was not the number of observations, but the number of features in our datasets. First, let’s review the model we want to fit.

Part 2 looks at one interesting use case:

In this example, GLMMs allow you to pool information across different brands, while still learning individual effects for each brand. It breaks the problem into sets of fixed and random effects. The fixed effects are similar to what you would find in a traditional logistic regression model, while the random effects allow the regression relationship to vary for each brand. One of the advantages of GLMMs is that they learn how different brands are from each other. Brands that are very similar to the overall average will have small random effect estimates. Because of the regularization of these models, brands with few observations will also have small random effect estimates, and be treated more like the overall average. In contrast, for brands that are very different from the average, with lots of data to support that, GLMMs will learn large random effect estimates.

Check it out.  Part 2 also contains a link to the GitHub repo if you want to try it on your own.

Related Posts

Python versus R (Again)

Alex Woodie looks at whether Python is dominating R in the data science space: There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged […]

Read More

Z-Tests vs T-Tests

Stephanie Glen has a picture which explains the difference between a Z-test and a T-test: The following picture shows the differences between the Z Test and T Test. Not sure which one to use? Find out more here: T-Score vs. Z-Score. Click through for the picture.

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031