Building Data Dictionaries

Caitlin Hudon shares some great advice on building data dictionaries:

The best defense I’ve found against relying on an oral history is creating a written one.

Enter the data dictionary. A data dictionary is a “centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format”, and provides us with a framework to store and share all of the institutional knowledge we have about our data.

As part of my role as a lead data scientist as a start-up, building a data dictionary was one of the first tasks I took on (started during my first week on the job). Learning about our data is a crucial part of onboarding for data-focused roles, and documenting that journey in the form of a data dictionary provides a useful data asset for the company (which helps to preserve institutional knowledge) and simultaneously provides a good resource for analyzing the data.

Read the whole thing.

Related Posts

Data Layout in R with cdata

John Mount takes us through a few sample problems and how to reshape data with cdata: This may seem like a lot of steps, but it is only because we are taking the problems very slowly. The important point is that we want to minimize additional problem solving when applying the cdata methodology. Usually when you need to […]

Read More

Data Modeling: Not Everything Is VARCHAR(8000)

Bert Wagner gives us several reasons why VARCHAR(8000) can be a bad fit for data types: When first learning SQL Server, I had the brilliant idea of defining every column on my table as VARCHAR(8000). After all, the VARCHAR datatype only consumes as much storage space as the string itself (plus two bytes), so even if […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031