Prioritizing Rows In A Union

Daniel Hutmacher shows several ways to combine data from multiple sources, prioritized by source:

For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the jist:

  • #Employees has about 33 000 rows.
  • #Customers has about 44 000 rows.
  • #Passengers has about 500 000 rows.

The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.

Example #492,806 that T-SQL is not a true 4th Generation Language, that how you write the query can greatly matter for performance.

Related Posts

Faster Scalar Functions In SQL Server 2019

Brent Ozar looks at improvements the SQL Server team has made to scalar functions in 2019: My database has to be in 2019 compat mode to enable Froid, the function-inlining magic. Run the same query again, and the metrics are wildly different: Runtime: 4 seconds CPU time: 4 seconds Logical reads: 3,247,991 (which still sounds bad, […]

Read More

Uncovering Complexity In SQL Objects

Michael J. Swart helps us uncover hidden complexity in database objects: The other day, Erin Stellato asked a question on twitter about the value of nested SPs. Here’s how I weighed in: I’m not a fan of nested anything. Too much hidden complexity. Code reusability leads to queries that are jack of all trades, master […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31