Building A Model: Lumping And Splitting

Anna Schneider and Alex Smolyanskaya explain some of the tradeoffs between lumping groups together and splitting them out when it comes to algorithm selection:

At Stitch Fix, individual personalization is in our DNA: every client is unique and every piece of clothing we send is chosen to be just right. When we buy merchandise, we could choose to lump clients together; algorithms trained on lumped data would steer us toward that little black dress or those comfy leggings that delight a core, modal group of clients. Yet when we split clients into narrower segments and focus on the tails of the distribution, the algorithms have the chance to also tease out that sleek pinstripe blazer or that pair of distressed teal jeans that aren’t right for everyone, but just right for someone. As long as we don’t split our clients so finely that we’re in danger of overfitting, and as long as humans can still understand the algorithm’s recommendations, splitting is the way to go.

In other cases lumping can provide action-oriented clarity for human decision-makers. For example, we might lump clients into larger groups when reporting on business growth for a crisp understanding of holistic business health, even if our models forecast that growth at the level of finer client splits.

Read on and check out their useful chart for figuring out whether lumping or splitting is the better idea for you.

Related Posts

Data Science And Data Engineering In HDP 3.0

Saumitra Buragohain, et al, show off some of the things added to the Hortonworks Data Platform for data scientists and data engineers: We leverage the power of HDP 3.0 from efficient storage (erasure coding), GPU pooling to containerized TensorFlow and Zeppelin to enable this use case. We will the save the details for a different […]

Read More

Multi-Threaded R With Microsoft R Client

David Parr shows us how to get started with Microsoft R Client and performs some quick benchmarking: This message will pop up, and it’s worth noting as it’s got some information in it that you might need to think about: It’s worth noting that right now Microsoft r Client is lagging behind the current R version, and […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30