Press "Enter" to skip to content

Curated SQL Posts

Drawing SSIS Packages

Bartosz Ratajczyk continues a quest to draw SSIS packages as SVGs:

To get the Value and Expression properties I need to find the precedence constraint in the .dtsx file during the XSL transformations. It requires three changes in the package2svg.xsl:

– I have to pass the name of the .dtsx file
– I have to read the XML from the .dtsx file
– I have to use the DTS namespace because it’s the namespace of the .dtsx file

Read on for more. Bartosz to this point has covered the control flow.

Comments closed

Tips for Reading Execution Plans

Bert Wagner gives us some tips for reading execution plans in SQL Server:

Execution plans show the steps SQL Server takes to execute your query. Each icon in the graphical execution plan is known as an operator, and the most common way to read a plan is by starting with the top right most operator and following the arrows to the left.

When you reach a join or concatenation operator where multiple branches merge into one operator, you can proceed to the right-most operator of one of the lower branches and start the process of reading right to left again. In general, this can be summed up as reading a plan right to left, top to bottom.

Right-to-left, top-to-bottom gives you the flow of information and that’s quite important. To understand how the engine works, though, you also need to read left-to-right, as Brad Schulz’s outstanding one-act play demonstrates.

Comments closed

Another Look at Cosmos DB Indexing

Hasan Savran revises some indexing recommendations based on changes to Cosmos DB:

Lazy indexing used to be an option. It’s not in any CosmosDB documentation anymore. By using Lazy indexing, you could save 20 to 30 percent for Request Units. Just like anything else in life, you get what you pay for when it comes to Lazy indexing. By selecting Lazy indexing, you are saying that eventually Indexes will be updated. If Indexes are not updated, that means your queries might not return all the data since all data might not be indexed yet. Lazy indexing is still an option, nobody talks about it for a good reason. In my opinion, it should be listed as obsolete feature or it should have a better documentation about how it works or why it might not be a good option for your solutions.

     If you use Lazy Indexing to reduce Request Units in your solution, change it to consistent now unless you have a really good reason!

Read on for more advice in this vein.

Comments closed

SQL Server and String Concatenation

Slava Murygin walks us through the rules around string concatenation with the + operator in SQL Server:

Pretty recently hit very interesting problem of how SQL Server concatenates strings.

The very same SELECT clause in some cases produced VARCHAR(MAX), but in same cases it cut the result to VARCHAR(8000) ( or NVARCHAR(4000) )

After a research I’ve discovered following:

Read on to see what Slava has discovered.

Comments closed

Version Control and Power BI Desktop

Gilbert Quevauvilliers takes us through version control with PBIX files:

In the second part of my blog post I am going to detail how to use the version control with Power BI Desktop files.

This will include adding files, checking files in and out, viewing previous versions and reverting to previous versions.

If this is the first time you are reading this blog post, I would highly suggest reading Setting up Version Control for my Power BI Desktop Files (PBIX) with no additional Cost * | Part 1

In short, Gilbert treats PBIX file as any other data file. These can get kind of beefy, though, so I’ve also saved them as templates—that way, you get the structure without pulling in all of the data.

Comments closed

Dimensional Load with Databricks

Leo Furlong shows how we can load an Azure SQL Data Warehouse dimension with Databricks:

Ingesting data into the Data Lake occurs in steps 1 and 2 in our architecture.  Azure Data Factory (ADF) provides an excellent mechanism for loading data from source applications into a Data Lake stored in Azure Data Lake Store Gen2.  In fact, Microsoft offers a template in the ADF Template gallery which provides a metadata driven approach for doing so.  The template comes with a control table example in a SQL Server Database, a data source dataset and a data destination dataset.  More on this template can be found here in the official documentation.

I appreciate that this is a full walkthrough of the process, not just one step.

Comments closed

Stream Processing with Kafka

Satish Sharma wraps up a series on Kafka and Kafka Streams:

Consider a hypothetical fleet management company that needs a dashboard to get the insight of its day to day activities related to vehicles. Each vehicle in this fleet management company is fitted with a GPS based geolocation emitter, which emits location data containing the following information

1. Vehicle Id: A unique id is given to each vehicle on registration with the company.
2. Latitude and Longitude: geolocation information of vehicle.
3. Availability: The value of this field signifies whether the vehicle is available to take a booking or not. Current Status (Online/Offline) denotes whether the vehicle is on duty or not.

Read through the article and then check out Satish’s GitHub repo for more.

Comments closed

The Cost of Abstraction

Erik Darling explains that abstraction can be the cause of performance woes in SQL Server:

There’s no “caching” of steps in a query. If you nest a view however-many-levels-deep, each step isn’t magically materialized.

Same goes for CTEs. If you string a bunch together and reference them multiple times, you’ll start to see some very repetitive branches in your query plans.

Now, there are tricks you can play to get what happens inside of one of these steps “fenced off”, but not to get the result set fully materialized.

It’s a logical separation, not a physical one.

In addition, as your query gets more and more complex, the optimizer eventually gives up and gives you what will likely be an ugly version of its implementation because there are too many potential solutions.

Comments closed

Using Show-Command with dbatools

Garry Bargsley shows how you can “build” a GUI for any Powershell cmdlet, using one from dbatools as an example:

The Show-Command cmdlet will display in a graphical interface all the parameter options related to a function, cmdlet or workflow.

I always recommend the book “Learn Windows PowerShell in a Month of Lunches” for people that ask me how to get started with PowerShell.  It is a very thorough book and I believe helps build the foundation you need to user PowerShell to the fullest.  Earlier this month I spent time going through the book for maybe the third time.  I must have blown right by the Show-Command cmdlet which shows up in Chapter 4, as I do not ever remembering seeing it before.  Then a couple of weeks ago Kenneth Fisher ( b | t ) post a blog about this command.  I like how Kenneth related the Show-Command in PowerShell to generating scripts with SSMS.  This is a great comparison for the DBA to understand.  After reading his blog post I thought, wow, this might be a handy tool to combine with my favorite module out there dbatools.

Click through for a demonstration.

Comments closed

Testing Memory-Optimized tempdb

Erin Stellato wants to see how big a benefit using In-Memory OLTP for tempdb metadata objects really is:

I have SQL Server 2019 CTP 3.2 installed on my virtual machine, which has 8GB of memory (max server memory set to 6 GB) and 4 vCPUs. I created four (4) tempdb data files, each sized to 1GB.

I restored a copy of WideWorldImporters and then created three stored procedures (definitions below). Each stored procedure accepts a date input, and pushes all rows from Sales.Order and Sales.OrderLines for that date into the temporary object. In Sales.usp_OrderInfoTV the object is a table variable, in Sales.usp_OrderInfoTT the object is a temporary table defined via SELECT … INTO with a nonclustered added afterwards, and in Sales.usp_OrderInfoTTALT the object is a pre-defined temporary table which is then altered to have an additional column. After the data is added to the temporary object, there is a SELECT statement against the object that joins to the Sales.Customers table.

This isn’t a be-all, end-all performance test, but it does give us an idea of what can work better with memory-optimized tempdb metadata.

Comments closed