Press "Enter" to skip to content

Month: January 2021

Parquet 1.x Footer Content

Dmitry Tolpeko shows us what the footer of a Parquet 1.x file looks like:

Every Parquet file has the footer that contains metadata information: schema, row groups and column statistics. The footer is located at the end of the file.

A parquet file content starts and ends with 4-byte PAR1 “magic” string. Right before the ending PAR1 there is 4-byte footer length size (little-endian encoding):

Click through for more details, as well as one downside to Parquet 1.x.

Comments closed

Automating Data Comparison using Biml

Ben Weissman gives us an example of running data comparison using Biml:

When you build an ETL Solution, at some point you will most likely feel the need to compare the data between your source and your staging (or datawarehouse) database. There may be various reasons for them to be out of sync, such as delta loads, aggregations or added business logic, but one day your phone will ring and whoever is on the other end will tell you that the numbers are wrong. While this doesn’t necessarily have to be an issue within your ETL process, it might just as well be the report itself. In many cases, this is a good starting point to look at.

This article (formally published on PASS Community Blog) focusses on SSIS as your orchestrator, but the same principles could, obviously, also be applied to Azure Data Factory for example. Also, as we want to solve this task as lightweight as possible, we will, demonstrably, use Biml to implement it!

Click through to see how.

Comments closed

One Chart at a Time

Meagan Longoria has contributed to a series:

Jon Schwabish over at PolicyViz has created great initiative called the One Chart at a Time Video Series. It’s an effort to expand readers’ graphic literacy through short videos explaining how to read and use different charts. Each video is from a different person in the data visualization industry. Participants include people I admire such as Andy KirkBen Jones, and Cole Nussbaumer Knaflic. Jon releases a new video each weekday. The initiative started January 11 and will continue through mid-March.

Click through for Meagan’s contribution on waterfall charts.

Comments closed

Using Hints Instead of Trace Flags

Taiob Ali has a set of hints to use instead of setting trace flags:

Introduced in SQL Server 2016 (13.x) SP1 and Azure SQL Database is the USE HINT option. At the time of writing this article, there are 23 of those. You can see an up to date list by running Dynamic Management View sys.dm_exec_valid_use_hints.

With the introduction of this feature, we can replace some of the trace flags with hints. I personally like hints over trace flags (if I have to choose between two devils). That way someone reading the code have some idea what the hint is about instead of remembering the trace flag numbers.

I agree with Taiob on that sentiment.

Comments closed

The Performance of sys.partitions

Aaron Bertrand tries to disentangle a query:

This question was posted to #sqlhelp by Jake Manske, and it was brought to my attention by Erik Darling.

I don’t recall ever having a performance issue with sys.partitions. My initial thought (echoed by Joey D’Antoni) was that a filter on the data_compression column should avoid the redundant scan, and reduce query runtime by about half. However, this predicate doesn’t get pushed down, and the reason why takes a bit of unpacking.

Click through for the results of Aaron’s research.

Comments closed

Threadpool Waits

Chad Callihan recommends not messing with worker threads:

Our story begins on a test server hosting a couple hundred databases. Over time, the server kept getting slower and slower when trying to navigate in SSMS. Query windows would take too much time to load, logging in would lag a bit, etc. When investigating with sp_Blitz, I found that the server had been experiencing Threadpool waits.

There are some cases where increasing thread count is important, especially when you’re working with database mirroring or availability groups. I worked with a customer with thousands of mirrored databases per server. None of the databases were particularly large or heavily-used, so it was on properly-sized hardware. As a result, to prevent the server from falling over due to threadpool waits, we had to scale thread counts to scary-high levels.

Comments closed

SSIS Framework Manager Community Edition

Andy Leonard has a new product announcement:

I’m excited to announce SSIS Framework Manager CE (Community Edition) is available for download at DILM Suite! SSIS Framework Manager CE is designed to support SSIS Framework Community Edition, providing a GUI to facilitate SSIS Framework Application creation, configuration, and management.

Three views are supported in this initial edition: Catalog, Application, and Package. The Catalog view incorporates the same Catalog treeview used in SSIS Catalog Browser (also free) and SSIS Catalog Compare (not free):

Click through to see what’s included.

Comments closed

Debugging DAX via Tooltips

Matt Allington has a workaround for us:

DAX is a tricky language; on the surface it is deceptively simple, but under the hood it can quickly become complex and it can take many years to master. If you have ever typed a formula and crossed your fingers when you press Enter, then you know what I mean. If you are reading this article, you are no doubt already on your own DAX learning journey.

As you become more competent at DAX, you will start to write more complex formulas that behave differently depending on the filters in your visuals.  One such example is the P&L report that I shared in 2020.  This report contains quite a tricky formula. It was built slowly and methodically, one step at a time, until it was working as desired.  This is the best way to write DAX.  If you want to see how I do it, then go back to the article linked above and watch the video.  While I recommend this as the best approach to writing formulas, the reality is that sometimes you will need to go back to a formula you wrote previously and debug that formula.  Maybe it is not working correctly, or maybe you need to enhance it for some reason.  Whatever the reason, today I am sharing with you my technique to debug complex formulas using tool tips.

Read the whole thing.

Comments closed

Faster String Concatenation with SQL Server

Steve Stedman has some tips for people who need to combine strings:

What the previous show as that the longer the string gets the slower the concatenation is. So instead we declare a second VARCHAR(MAX) variable called @stringBuilder, and each time through the loop we concatenate to that, then every thousandth time through the loop we take the @stringBuilder variable and concatenate it on to the @bigString, then clear out the @stringBuilder variable. This keeps the @stringBuilder variable relatively short, and reduces the number of concatenations to the @bigString to roughly 1/1000th the original.

Click through for a demo of the process. I don’t think I’ve been in too many situations where string concatenation was a performance killer in SQL Server, but it’s good to know.

Comments closed