For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the jist:
- #Employees has about 33 000 rows.
- #Customers has about 44 000 rows.
- #Passengers has about 500 000 rows.
The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.
Example #492,806 that T-SQL is not a true 4th Generation Language, that how you write the query can greatly matter for performance.