In the last post, we focused on high-level aggregates to gain a basic understanding of our data. We saw some suspicious results but couldn’t say much more than “This looks weird” due to our level of aggregation. In this post, I want to dig into data at a lower level of detail. My working conception is the cohort, a broad-based comparison of data sliced by some business-relevant or analysis-relevant component.
Those familiar with Kimball-style data warehousing already understand where I’m going with this. In the basic analysis, we essentially look at fact data with a little bit of disaggregation, such as looking at data by year. In this analysis, we introduce dimensions (sort of) and slice our data by dimensions.
Click through for some fraud-finding fun.