Joining On NULL

Erik Darling has opened a can of worms here:

WITH ALL THE TROUBLE NULLS CAUSE…

You’d think people would be more inclined to avoid them. Slap a NOT NULL constraint and a default value on your column and call it a day. I’m perfectly fine with bizarro world canary values. If it’s an integer column, some really high (low?) negative number. If it’s date-based, why not have it be the lowest value your choice accomodates?

Check out the comments, definitely.  I don’t think it’s as clear-cut as Erik argues; the idea of NULL has been and will remain controversial because it’s a useful concept but one which requires explicit consideration.

Related Posts

Thinking About The Data Lake

James Serra explains at a high level what the data lake metaphor is and how it works: The data lake introduces a new data analysis paradigm shift: OLD WAY: Structure -> Ingest -> Analyze NEW WAY: Ingest -> Analyze -> Structure This allows you to avoid a lot of up-front work before you are able […]

Read More

Re-Shaping Data Flows

Maneesh Varshney explains some methods to trim the fat out of analytical data flows: Big data comes in a variety of shapes. The Extract-Transform-Load (ETL) workflows are more or less stripe-shaped (left panel in the figure above) and produce an output of a similar size to the input. Reporting workflows are funnel-shaped (middle panel in […]

Read More

Categories

March 2016
MTWTFSS
« Feb Apr »
 123456
78910111213
14151617181920
21222324252627
28293031