Aaron Bertrand has a two-parter. Part one sets up the problem:
We often build logging or other insert-only tables where we store large strings like URLs, host names, or error messages. It’s usually not until the table has terabytes of data that we realize there might have been a better way. If we are logging traffic or exceptions for our own application, it’s likely that we record the same URL, host name, or error message on millions of rows. What if we only had to write that URL or host name or message text once, the first time we saw it? In this tip, I want to share one idea for abstracting away recurring values, reducing storage, and making search queries faster (especially those with wildcards) without requiring immediate changes in the application layer.
Part two maximizes the savings:
In my previous tip, I showed how we can make a growing logging table leaner by moving large, repeating strings to their own dimension tables. The solution there involved an AFTER INSERT trigger and assumed that we could change the applications to recognize the new table structure in relatively short order.
Check out both posts for more details. If you’re confused about my calling this 6NF and Aaron mentioning dimension tables, the answer is that he’s talking about the end result and I’m describing the process.
Comments closed