Ben Weissman has a two-part series on loading a set of tables based on foreign key constraints. Part 1 is linear loads:
All our previous posts were running data loads in parallel, ignoring potential foreign key constraints. But in real life scenarios, your datawarehouse may actually have tables refering to each other using such, meaning that it is crucial to create and populate them in the right order.
In this blog post, we’ll actually solve 2 issues at once: We’ll provide a list of tables, will then identify any tables that our listed tables rely on (recursively) and will then create and load them in the right order.
In this sample, we’ll use AdventureWorksDW2014 as our source and transfer the FactInternetSales-table as well as all tables it is connected to through foreign key constraints. Eventually, we will create all these tables including the constraints in a new database AdventureWorksDW2014_SalesOnly (sorting them so we get no foreign key violations) and eventually populate them with data.
After the first excitment about how easy it actually was to take care of that topology, you might ask yourself: Why does it have to run linear? That takes way too long. And you’re right – and it doesn’t have to.
All we need to do is:
– Create a list of all the tables that we’ve already loaded (which will be empty at that point)
– Identify all tables that do not reference any other tables
– Load these tables, each followed by all tables that only reference this single table – recursively and add them to list of loaded tables
– Once that is done, load all tables that are referencing multiple tables where all required tables have been loaded before – and again, add them to the list
– Repeat this until no table is left to load (or for a maximum of 10 times in this example)
– If, for whichever reason, any tables are left, load them sequentially using the TopoSort function:
This is a very interesting way of using Biml to traverse the foreign key tree. I’ve normally used recursive CTEs in T-SQL to do the same, but I’ll have to play around with this method.
There are two Windows Base images on the Docker Hub – microsoft/nanoserver andmicrosoft/windowsservercore. We’ll be using an IIS image shortly, but you should start with Nano Server just to make sure all is well – it’s a 250MB download, compared to 4GB for Server Core.
docker pull microsoft/nanoserver
Check the output and if all is well, you can run an interactive container, firing up PowerShell in a Nano Server container:
Docker will also run on Windows 10 Pro, Enterprise, or Education editions. That’s sad news for people who upgraded for free to Home Edition.
At this point, you might be wondering what combination of tiny memory grant and enormous data size could possibly result in a level 15,000 sort spill. Trying to sort the entire Internet in 1MB of memory? Possibly, but that is way too hard to demo. To be honest, I have no idea if such a genuinely high spill level is even possible in SQL Server. The goal here (a cheat, for sure) is to get SQL Server to report a level 15,000 spill.
The key ingredient is partitioning. Since SQL Server 2012, we have been allowed a (convenient) maximum of 15,000 partitions per object (support for 15,000 partitions is also available on 2008 SP2 and 2008 R2 SP1, but you have to enable it manually per database, and be aware of all the caveats).
Read the whole thing.
You start with a blank sheet,
three nuts and a bolt,
a strong sense of fairness,
a large can of Jolt.
And you try to imagine,
as best as you’re able
a rulerless kingdom
that won’t grow unstable.
That’s what happens when you dig into internals for too long.
Following on from my last post Changing connection colours in SSMS I thought I’d write another quick about this cool but also often unused feature in SSMS.
These shortcuts allow you to run pre-determined queries by assigning a hot key within SSMS. To do this in SSMS go to Tools > Options > Environment > Keyboard
I love query shortcuts. I have three dedicated to different sp_whoisactive commands (one to get everything going on; one to get everything related to my username; and one to get everything going on plus query plans, which I don’t always use because of the additional overhead).
I typically look at the ‘CPU time’ metric when tuning instead of ‘elapsed time’ (duration). This can work well for tuning because you’re measuring how much more efficient you made the query in terms of CPU cycles.
But ‘CPU time’ isn’t perfect, and it can get a little weird for reporting results to users, because:
If the query uses parallelism, CPU time can be higher than the duration — which may make the query seem “slower” than it actually is to anyone reading a report
‘elapsed time’ includes all the time that it takes to display the results in Management Studio, which is probably a different duration than it would take to return the results to an application server. If you’re just returning a few rows, this may be negligible– but once it gets into the thousands of rows, it can be very noticeable.
Moral of the story: also use SQL Sentry Plan Explorer…
Works and looks similar to a Stacked Area Chart but with a wiggle feature that gives it a more fluid look and feel
Great for displaying data that changes over time
At first, I read this as “Steam Graph,” which made it sound like a steampunk visualization with unnecessary pipes and mechanical accouterments, but alas, it was not meant to be. I do like the stream graph visual, though.
There is also considerable overhead in terms of memory usage. Every non-empty off-row value adds 50+ bytes of the overhead regardless of its size. Those 50+ bytes consist of three artificial ID values (in-row, off-row in data row and leaf-level of the range index) and off-row data row structure. It is even larger in case of LOB columns where data is stored in LOB Page Allocator.
One of the key points to remember that decision which columns go off-row is made based on the table schema. This is very different from on-disk tables, where such decision is made on per-row basis and depends on the data row size. With on-disk tables, data is stored in row when it fits on the data page.
In-Memory OLTP works in the different way. (Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes largest variable-length column(s) off-row. Again, it does not depend on amount of the data you store there.
This is a great article getting into the internals of how memory-optimized tables work in SQL Server 2016, as well as a solid reason to avoid LOB types and and very large VARCHAR values on memory-optimized tables if you can. Absolutely worth a read.
SSDT helps us to refactor code by automating the actions of:
- Expanding wildcards
- Fully qualifying object names
- Moving objects to a different schema
- Renaming objects
Aside from this list SSDT also, of course, helps us to refactor code manually with its general editing facilities.
If you aren’t familiar with what SSDT can do, check out this article.