The concept of “wide data” is relative. In some domains 100 columns is considered “wide”, while in others that’s perfectly normal and you’d need to have thousands (or tens of thousands!) of columns for it to be considered even remotely “wide”. The data that we work with at Fathom Data generally lies in the first domain, but from time to time we do work on data that is considerably wider.
This post touches on a couple of approaches for dealing with that sort of data. We’ll be using some HCRIS (Healthcare Cost Report Information System) data, which are available for download here. Specifically, we’ll be working with an extract from the
hcris2552_10_2017.csvfile, which contains “select variables in flat shape”.
Click through for one example which has 1700 columns. H/T R-Bloggers.