Data Cleansing: Hockey Edition

Stacia Varga has a post covering some of the yeoman’s work of data cleansing:

For now, Power BI continues to my tool of choice for my project. My goals for today’s post are two-fold: 1) finish my work to address missing venues in the games table and 2) to investigate the remaining anomalies in the games and scores tables as I noted in my last post.

To recap, I noted the following data values that warranted further investigation :

  • Total Goals minimum of 0 seems odd – because hockey games do not end in ties. I would expect a minimum of 0 so I need to determine why this number is appearing.

  • Total Goals maximum of 29 seems high – it implies that either one team really smoked the opposing team or that both teams scored highly. I’d like to see what those games look like and validate the accuracy.

  • Record Losses minimum of 0 seems odd also – that means at least one team has never had a losing season?

  • Similarly, Record Wins minimum of 0 means one team has never won?

  • Record OT minimum of 0 – I’m not sure how to interpret. I need to look.

  • Score minimum of 0 seems to imply the same thing as Total Goals minimum of 0, which I have already noted seems odd.

This is the kind of stuff that we talk about as taking 80-95% of a data science team’s time.  It’s all about finding “weird” looking values, investigating those values, and determining whether the input data really was correct or if there was an issue.

Related Posts

Substitution Variables In Power Query

Doug Burke gives us an example of here substitution variables make our lives easier in Power Query: A substitution variable substitutes a variable (a changing value) to get a different result     a + b = c (where ‘a’ and ‘b’ are substitution variables that define value ‘c’)         If a = 5 and b = 2 then c […]

Read More

Migrating Excel Power Pivot Models To SSAS

Imke Feldmann has a walkthrough to show how to migrate a Power Pivot model in Excel into SQL Server Analysis Services by way of Power BI: In Visual Studio there is a wizard to migrate an Excel Power Pivot model to a SSAS model. But this will not bring over the M-queries unfortunately. But there […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

April 2018
MTWTFSS
« Mar  
 1
2345678
9101112131415
16171819202122
23242526272829
30