Press "Enter" to skip to content

Author: Kevin Feasel

Actual I/O Statistics in Execution Plans

Hugo Kornelis talks about a fairly recent property in execution plans:

There are two operators that read from the SalesOrderDetail table (or from indexes on that table). The top left operator is an Index Seek on one of the nonclustered indexes on SalesOrderDetail, and on the bottom input of the Nested Loops operator is a Clustered Index Scan that scans the clustered index on the same table.

So, now what? Which of the two is in this case the problem? Is each doing exactly 625 logical reads? Is one doing 50 and the other 1200? For the longest time, there was no way to find out. Sometimes you could make an educated guess by looking at the rest of the execution plan. Sometimes you could get an idea by running other queries with similar plans and checking their logical reads (like in this case, you could run the subquery by itself and that would work). But none of these methods are really satisfactory.

Read on to see how the SQL Server team has addressed this.
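
If you want to see the property in action, here is a rough sketch (mine, not from Hugo's post): request the actual execution plan and look at the per-operator runtime counters. The AdventureWorks table below is an assumption for illustration.

-- Capture the actual plan; recent builds of SQL Server record
-- Actual I/O Statistics per operator in the runtime counters.
SET STATISTICS XML ON;

SELECT sod.ProductID, COUNT(*) AS OrderLines
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.ProductID;

SET STATISTICS XML OFF;

-- In the returned showplan XML (or an operator's Properties pane in SSMS),
-- look under RunTimeCountersPerThread for ActualLogicalReads and
-- ActualPhysicalReads to attribute reads to each individual operator.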


Organizing and Optimizing Power BI Dataflows

Marc Lelijveld wants your Power BI dataflows in tip-top shape:

In this blog I describe a few of the challenges you might face when you have a lot of dataflows. I will also share a few tips and tricks I apply to sanitize your dataflow approach, organize your dataflows, and make them easy to browse through.

Dataflows are increasingly used as a shared resource or staging layer inside the Power BI platform. With dataflows, you can push logic down and reuse it across different datasets. This lowers the impact on the source by extracting the data only once from the source into Power BI, helps centralize logic, gives you one version of the truth, and has lots of other advantages.

Read on for some tips and practices.


Evolutionary Algorithms for Color Palette Discovery

Daniel Oehm combines two interests:

Colour theory is pretty complex stuff so choosing a good palette isn’t easy, let alone evolving one. So, you’re going to have some hits and some misses. This is definitely more for fun seeing what you discover rather than finding the perfect palette. Having said that you could discover some gold!

There are best practices when choosing a palette for data visualisation depending on the context and what is to be shown. For example, people tend to respond to certain colours representing high / low, hot / cold, or good / bad, and there are also colourblindness considerations. evoPalette won’t necessarily adhere to these ideals.

I’d like to see a genetic algorithms approach, though you’d have to define some sort of function to score each outcome, so I can see how that’d be tricky. H/T R-Bloggers


Understanding Monads in Scala

Anna Wykes continues a series on Scala for data engineers:

This is the second of my blogs in the Scala Parlour Series, in which we explore Scala, and why it is great for Data Engineering. If you haven’t already, please check out the first in the series here, in which you can read all about the core concepts of Scala, including who uses it and why.

In this article we will explore monads within the Functional Programming (FP) paradigm, and how they can be used in Scala to aid Data Engineering.  

Anna explains monads quite well here. This is a topic which is notorious for its perceived difficulty, but conceptually it’s not as hard as people make it out to be…if you understand a few concepts coming in.


E-mailing Query Results via Logic Apps + Azure Data Factory

Rayis Imayev has to send an e-mail:

It is a very simple concept: you have a database query result, either direct database table output or a stored procedure’s returned result, that needs to be sent as an email notification. This email could serve as a status report for your daily data processing job or an alert notification with some metrics that require immediate attention, and you want to be as user-friendly as possible with this message, i.e. an HTML format for your dataset is required.

I built this solution approach over a year ago and wanted to document and share it after recently using it in one of my projects. I agree, there are other more or less effective ways to achieve the very same goal; this blog post is just a recollection of my efforts to send HTML-formatted SQL query results using Azure Data Factory and a Logic App.

It’s a bit more effort than sp_send_dbmail, though also considerably more flexible.
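
For comparison, the sp_send_dbmail route looks roughly like this; the Database Mail profile, recipient, and source table below are made up for the sake of the sketch.

-- Build an HTML table from a query result and send it via Database Mail.
-- dbo.DailyJobLog, the profile name, and the recipient are hypothetical.
DECLARE @html NVARCHAR(MAX) =
    N'<table border="1"><tr><th>Job</th><th>Status</th><th>Rows</th></tr>'
    + CAST((
        SELECT td = j.JobName, '',
               td = j.JobStatus, '',
               td = j.RowsProcessed, ''
        FROM dbo.DailyJobLog AS j
        WHERE j.RunDate = CAST(GETDATE() AS date)
        FOR XML PATH('tr'), TYPE
      ) AS NVARCHAR(MAX))
    + N'</table>';

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'DefaultMailProfile',
    @recipients   = N'team@example.com',
    @subject      = N'Daily data processing status',
    @body         = @html,
    @body_format  = 'HTML';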


Building a Docker Container of a SQL Server Database

I have a post showing how to turn a database in SQL Server into a Docker container:

Today, we’re going to go through the process of turning a database you’ve built into a Docker container. Before we get started, here are the expectations:

1. I want a fully running copy of SQL Server with whatever database I’m using, as well as key components installed.
2. I want this not to be on a persistent volume. In other words, when I destroy the container and create a new one from my image, I want to reset back to the original state. I’m using this for technical demos, where I want to be at the same starting point each time.
3. I want this to be as easy as possible for users of my container. I consider the use of a container here as not particularly noteworthy in and of itself, so the more time I make people spend trying to set up my demo environment, the more likely it is that they will simply give up.

With that preamble aside, let’s get to work!
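
One plausible ingredient of that process, in T-SQL terms, is attaching database files that were copied into the image, so that every new container starts from the same state. The paths and database name below are illustrative rather than the exact script from the post.

-- Attach pre-built database files that ship inside the container image.
-- Paths follow the default SQL Server on Linux data directory.
CREATE DATABASE [DemoDB]
    ON (FILENAME = N'/var/opt/mssql/data/DemoDB.mdf'),
       (FILENAME = N'/var/opt/mssql/data/DemoDB_log.ldf')
    FOR ATTACH;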

As a bonus, you can finally learn my real thoughts on medieval France. Fun story around that: a much longer time ago than I’m willing to admit, I played a Hundred Years War scenario in Civilization 2, and the one thing I remember from that scenario is killing the Dauphin. After that, the script spawned a new claimant to the throne, who immediately attacked my troops and died. And then the script spawned yet another new claimant, who met the same fate within a couple turns. And then a third. If I remember correctly, I ran France out of claimants to the throne by the end of it.


Understanding Conditions Under Which Query Store Captures Queries

Milos Radivojevic dives into testing Query Store:

Starting with SQL Server 2019, you can use the fourth option, CUSTOM, which allows you to define what “relevant query” means. For that purpose, you can use three parameters. A query will be captured by Query Store if it is executed at least EXECUTION_COUNT times, or if its compilation used at least TOTAL_COMPILE_CPU_TIME_MS of CPU, or if the entire query execution used at least TOTAL_EXECUTION_CPU_TIME_MS of CPU. All three parameters are evaluated within a period defined by the fourth parameter, STALE_CAPTURE_POLICY_THRESHOLD, with a default value of 1 hour.

You cannot define exceptions, i.e. queries that you do not want to see captured by Query Store, such as system queries or queries related to Query Store itself; you can only filter them out when you write queries against Query Store’s catalog views.

As mentioned earlier, in this post I will go into detail on the AUTO option. This is the default option in SQL Server 2019, and the decision to change the default from ALL to AUTO was a good one. However, the lack of documentation for this option can lead to mistrust and misunderstandings. Defaults are important for a feature’s newcomers; they will use the defaults until they learn and test enough to choose better options for their workload.

There’s a lot of depth in this post, and Milos wraps up with a few rules of thumb.
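
For reference, the CUSTOM capture policy Milos describes is configured along these lines on SQL Server 2019 and later; the threshold values here are purely illustrative.

-- Enable Query Store with a CUSTOM capture policy.  A query is captured once
-- any one of the three thresholds is crossed within the evaluation window.
ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON
(
    QUERY_CAPTURE_MODE = CUSTOM,
    QUERY_CAPTURE_POLICY =
    (
        STALE_CAPTURE_POLICY_THRESHOLD = 1 HOURS, -- evaluation window
        EXECUTION_COUNT = 30,                     -- executed at least 30 times
        TOTAL_COMPILE_CPU_TIME_MS = 1000,         -- or 1 second of compile CPU
        TOTAL_EXECUTION_CPU_TIME_MS = 100         -- or 100 ms of execution CPU
    )
);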


An Overview of the T-SQL Script DOM

Dan Guzman provides a public service:

Scripts are parsed by invoking the Parse method of the T-SQL script DOM library’s TSqlParser class. The parser understands the complex T-SQL abstract syntax tree and splits T-SQL source into atomic TSqlParserTokens of TSqlTokenTypes that represent keywords, identifiers, punctuation, literals, whitespace, etc. These low-level tokens are grouped into more meaningful TSqlFragment objects that represent language elements of the script DOM, such as batches, statements, clauses, etc. Fragments, rather than the low-level parser tokens, are most often used in practice, although the underlying tokens are available for specialized requirements.

The Parse method returns a TSqlFragment object of type TSqlScript containing all fragments within the script. This top-level fragment of the DOM hierarchy provides programmatic access to all language element fragments in the script. Nearly 1,000 different fragment types exist today due to the many granular T-SQL language elements.

Dan provides several examples of how to use the script DOM, making this a must-read if you’re interested in writing code around SQL Server.


Optimizing Power BI Merge Performance with Table.Join

Chris Webb shows us another way to optimize Power BI merge performance:

The SortMerge algorithm, last in the list above, is the focus of this blog post. I mentioned in my earlier posts that the reason that merge operations on non-foldable data sources are often slow is that both of the tables used in the merge need to be held in memory. There is an exception though: if you know that the data in the columns used to join the two tables is sorted in ascending order, you can use the Table.Join function and the SortMerge algorithm and the data from both sources can be streamed rather than held in memory, which in turn results in the merge being much faster.

That’s the same in the relational world: merge joins are the fastest, assuming that your data is pre-sorted in the proper manner.
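
Here is a quick sketch of that relational-world case, with hypothetical tables whose clustered keys both lead on the join column, so the optimizer can merge-join without an intermediate sort.

-- Both inputs arrive pre-sorted on OrderID via their clustered indexes,
-- so the merge join needs no Sort operator.  The hint only makes the
-- intended plan shape explicit for the demo.
SELECT o.OrderID, o.CustomerID, d.ProductID, d.Quantity
FROM dbo.Orders AS o
    INNER JOIN dbo.OrderDetails AS d
        ON d.OrderID = o.OrderID
OPTION (MERGE JOIN);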
