Curated SQL – Page 1001 – A Fine Slice Of SQL Server

Rounding to Intervals in Various Languages

Published 2019-08-19 by Kevin Feasel

Dave Mason doesn’t like rounding to intervals very much in T-SQL:

If I had to choose any of these options for production, I’d probably go with Query 1, just because I think it would make the most sense to any other developer that might encounter it (including future me, who probably wouldn’t remember writing the code). But I’m not really thrilled with any of the options. The one thing T-SQL has going for it, though, is the relative ease of truncating the TIME off of a DATETIME by casting to DATE and back to DATETIME. I couldn’t find anything like this in DAX.

Dave also shows how to do this in DAX and PowerShell.
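For comparison, here is a minimal Python sketch of the same rounding. It is not one of Dave's queries: it assumes naive datetime values and interval widths that divide evenly into a day, and the function name floor_to_interval is hypothetical. Truncating to midnight plays the role of the T-SQL cast to DATE and back.

```python
from datetime import datetime, timedelta


def floor_to_interval(ts: datetime, minutes: int) -> datetime:
    """Round ts down to the start of its `minutes`-wide bucket (e.g. 15 -> :00, :15, :30, :45).

    Assumes a naive datetime and a width that divides evenly into 24 hours.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Truncate to midnight, similar to casting to DATE and back to DATETIME in T-SQL.
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Whole minutes elapsed since midnight (timedelta // timedelta yields an int).
    elapsed = (ts - midnight) // timedelta(minutes=1)
    # Drop the remainder to land on the start of the bucket.
    return midnight + timedelta(minutes=elapsed - elapsed % minutes)
```

For example, 10:37:42 floors to 10:30 with a 15-minute interval, to 10:00 with 60, and to 08:00 with a 4-hour (240-minute) interval.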

Another alternative that Dave doesn’t mention is to invert the problem: if you have a fixed set of intervals you care about (e.g., 15-minute, 30-minute, hour, 4-hour, etc.), you can create a time table. This is like a date table, but it contains the times of the day with the intervals precalculated. Then you join to the time table and have your results right there. If you do go this route, I’d try to keep the grain of the time table as coarse as possible, maybe using DATETIME2(0) (whole seconds) instead of DATETIME2(7) (100-nanosecond precision), so that the table stays small.
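A minimal Python sketch of the time-table idea, with an in-memory dict standing in for the SQL table. Assumptions: a one-minute grain, the four example intervals above, and the hypothetical names INTERVALS, build_time_table, and lookup. Each row precalculates the start of every interval, so the "join" becomes a lookup on minutes since midnight.

```python
from datetime import datetime, time
from typing import Dict

# Hypothetical interval widths, in minutes, to precalculate (15-minute, 30-minute, hour, 4-hour).
INTERVALS = {"min15": 15, "min30": 30, "hour": 60, "hour4": 240}


def build_time_table() -> Dict[int, Dict[str, time]]:
    """One row per minute of the day (1,440 rows), keyed by minutes since midnight."""
    table: Dict[int, Dict[str, time]] = {}
    for minute_of_day in range(24 * 60):
        row: Dict[str, time] = {}
        for name, width in INTERVALS.items():
            start = minute_of_day - minute_of_day % width  # start of this minute's bucket
            row[name] = time(start // 60, start % 60)
        table[minute_of_day] = row
    return table


TIME_TABLE = build_time_table()


def lookup(ts: datetime) -> Dict[str, time]:
    """The 'join': map a timestamp to its precalculated interval starts (seconds are ignored)."""
    return TIME_TABLE[ts.hour * 60 + ts.minute]
```

At a one-minute grain the table holds 1,440 rows; at a one-second grain it would hold 86,400, which is why a coarse grain keeps it small.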

The Folly of Aggregation, Wait Stats Edition

Published 2019-08-19 by Kevin Feasel

Erik Darling explains why simply looking at wait stats isn’t enough:

I’m going to be honest with you, dear readers. I’ve fallen out of like with wait stats.
Unless there’s a massive resource or blocking problem, they’re a bit bland.
Tuning a single query, I don’t look at them at all.
Sure, they can be a good detail when you’re painting an overall picture of a server, but they’re misleading a lot.

Erik makes a sound point. It’s usually a sound point when it reminds me of the Hayek quotation, “Mr. Keynes’s aggregates conceal the most fundamental mechanisms of change.”

Power Query and the Benefits of Immutability

Published 2019-08-19 by Kevin Feasel

Chris Webb explains why a value stored in a variable and reused several times can be much faster than repeating the same expression, thanks to immutability:

Instead of taking the value #"Sorted Rows"[Column2]{0} and storing it in the variable Column2, then adding Column2 four times, I’m adding the expression #"Sorted Rows"[Column2]{0} together four times. The query returns the same number as the previous query. However, this query takes 20 seconds to run! Why?

Read on for the explanation.
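Chris's post is about Power Query M, but the effect can be reproduced as a rough Python analogy (not M, and not a description of the M engine's internals): binding an expensive, immutable result to a name evaluates it once, while repeating the expression evaluates it every time. The function first_value_of_sorted_column and the calls counter are hypothetical stand-ins for the #"Sorted Rows"[Column2]{0} step.

```python
import time

calls = 0  # how many times the expensive step has been evaluated


def first_value_of_sorted_column() -> int:
    """Hypothetical stand-in for an expensive step like #"Sorted Rows"[Column2]{0}."""
    global calls
    calls += 1
    time.sleep(0.01)  # simulate slow work
    return 42


# Variable version: evaluate once, then reuse the immutable result four times.
calls = 0
column2 = first_value_of_sorted_column()
total_variable = column2 + column2 + column2 + column2
variable_calls = calls

# Repeated-expression version: the expensive step runs once per occurrence.
calls = 0
total_repeated = (
    first_value_of_sorted_column()
    + first_value_of_sorted_column()
    + first_value_of_sorted_column()
    + first_value_of_sorted_column()
)
repeated_calls = calls
```

Both versions produce the same total, but the second pays for the expensive step four times instead of once.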

Optimize For Sequential Key

Published 2019-08-19 by Kevin Feasel

Pam Lahoud explains the context behind a new option you can add to indexes in SQL Server 2019 CTP 3.1 and later:

With last page insert contention, as the number of insert threads increases, the queue for the page latch increases which in turn increases latency. Throughput will also decrease, but if something slows down one of the threads that is holding the latch, this can trigger a convoy and throughput suddenly falls off a cliff. This typically happens when a page fills up and a new page must be added to the index (also known as a page split). The insert that triggers the new page will naturally have to hold the latch for longer than normal while the new page operation completes. This causes the queue to build up behind the latch. Adding a new page also requires an exclusive latch on the parent page, which can cause latch requests to queue at that level as well. At this point, throughput falls off a cliff.
OPTIMIZE_FOR_SEQUENTIAL_KEY aims to do two things – control the rate at which new threads are allowed to request the latch, and favor threads that are likely to keep the throughput high. These techniques will not prevent the contention or reduce latency, but they will help keep throughput consistent as concurrency increases.

Read on to learn more about the process and by implication some places where this won’t work well at all for you.
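The first of those two ideas, controlling the rate at which new threads may request the latch, can be illustrated generically in Python as admission control with a bounded semaphore. This is a toy analogy under stated assumptions (the class name, max_waiters, and the list standing in for the last page are all hypothetical); it is not SQL Server's algorithm, and it does not model the second idea of favoring threads that are likely to keep throughput high.

```python
import threading
from typing import List, Tuple


class AdmissionControlledLatch:
    """Toy analogy: cap how many threads may queue on a contended latch at once.

    NOT SQL Server's implementation; it only illustrates admission control.
    """

    def __init__(self, max_waiters: int) -> None:
        self._gate = threading.BoundedSemaphore(max_waiters)  # admission control
        self._latch = threading.Lock()  # the contended "last page" latch
        self._stats_lock = threading.Lock()
        self.waiting = 0  # admitted threads that have not yet acquired the latch
        self.max_waiting_seen = 0

    def insert(self, page: List[int], value: int) -> None:
        with self._gate:  # threads beyond max_waiters block here, not on the latch
            with self._stats_lock:
                self.waiting += 1
                self.max_waiting_seen = max(self.max_waiting_seen, self.waiting)
            with self._latch:
                with self._stats_lock:
                    self.waiting -= 1
                page.append(value)  # the "insert" onto the last page


def run_inserts(n_threads: int, max_waiters: int) -> Tuple[List[int], AdmissionControlledLatch]:
    """Start n_threads concurrent inserts and wait for all of them to finish."""
    page: List[int] = []
    latch = AdmissionControlledLatch(max_waiters)
    threads = [threading.Thread(target=latch.insert, args=(page, i)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return page, latch
```

With run_inserts(50, 4), every insert still completes, but no more than four threads are ever queued for the latch at the same time.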

Merge Replication on Linux

Published 2019-08-19 by Kevin Feasel

Jignesh Raiyani shows how you can set up merge replication on SQL Server on Linux:

In this article, we will discuss how to deploy SQL Server Merge Replication in a Linux environment. Many SQL Server features are not available on Linux, but Microsoft does support Replication there. Before starting anything, let’s address this question: what is Merge Replication?
Merge Replication is a data synchronization process from one database (the Publisher) to other databases (the Subscribers) and vice versa. The data synchronization audit is controlled by the Distributor (the distribution database), which manages the data synchronization between the Publisher and Subscribers.

Merge replication is also an unending nightmare of pain, but you do what you gotta do.

Kafka 2.3 and Kafka Connect Improvements

Published 2019-08-16 by Kevin Feasel

Robin Moffatt goes over improvements in Kafka Connect with the release of Apache Kafka 2.3:

A Kafka Connect cluster is made up of one or more worker processes, and the cluster distributes the work of connectors as tasks. When a connector or worker is added or removed, Kafka Connect will attempt to rebalance these tasks. Before version 2.3 of Kafka, the cluster stopped all tasks, recomputed where to run all tasks, and then started everything again. Each rebalance halted all ingest and egress work for usually short periods of time, but also sometimes for a not insignificant duration of time.
Now with KIP-415, Apache Kafka 2.3 instead uses incremental cooperative rebalancing, which rebalances only those tasks that need to be started, stopped, or moved. For more details, there are available resources that you can read, listen, and watch, or you can hear the lead engineer on the work, Konstantine Karantasis, talk about it in person at the upcoming Kafka Summit.

Looks like some nice improvements here.
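To make the difference concrete, here is a toy Python model of incremental rebalancing: tasks on surviving workers stay put, and only orphaned or excess tasks move. It is a simplification under stated assumptions, not Kafka Connect's KIP-415 protocol (which also uses scheduled delays and a revoke-then-assign handshake across rebalance rounds); incremental_rebalance and moved_tasks are hypothetical names.

```python
from typing import Dict, List


def incremental_rebalance(assignment: Dict[str, str], workers: List[str]) -> Dict[str, str]:
    """Return a new task -> worker map that moves as few tasks as possible.

    Toy model only. Assumes `workers` is non-empty and has no duplicates.
    """
    live = set(workers)
    load: Dict[str, List[str]] = {w: [] for w in workers}
    orphans: List[str] = []
    for task, worker in sorted(assignment.items()):
        if worker in live:
            load[worker].append(task)  # task keeps running where it is
        else:
            orphans.append(task)  # its worker left, so it must be reassigned
    target = -(-len(assignment) // len(workers))  # ceiling division: max tasks per worker
    for worker in workers:
        while len(load[worker]) > target:
            orphans.append(load[worker].pop())  # revoke only the excess
    for task in orphans:
        least_loaded = min(workers, key=lambda w: len(load[w]))
        load[least_loaded].append(task)
    return {task: worker for worker, tasks in load.items() for task in tasks}


def moved_tasks(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Tasks whose worker changed. Only these restart incrementally; pre-2.3 stopped every task."""
    return sorted(t for t in old if old[t] != new.get(t))
```

Adding a third worker to two workers running three tasks each moves only two tasks in this model, whereas the pre-2.3 approach stopped and restarted all six.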

Python versus R (Again)

Published 2019-08-16 by Kevin Feasel

Alex Woodie looks at whether Python is dominating R in the data science space:

There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged by more than 3% to achieve a 10% rating (TIOBE’s proprietary metric that primarily measures search activity), easily the biggest gain among the 20 most popular languages.
R, by contrast, has not fared well lately on the TIOBE Index, where it dropped from 8th place in January 2018 to become the 20th most popular language today, behind Perl, Swift, and Go. At its peak in January 2018, R had a popularity rating of about 2.6%. But today it’s down to 0.8%, according to the TIOBE index.

I’ll say that rumors of R’s demise are premature.

Installing Microsoft Master Data Services

Published 2019-08-16 by Kevin Feasel

Garry Bargsley shows how you can install Master Data Services on SQL Server:

MDS installation prerequisites:
The first step is to add the IIS feature to the server where MDS is going to be installed.
Follow these steps for more information:
https://docs.microsoft.com/en-us/sql/master-data-services/master-data-services-installation-and-configuration?view=sql-server-2017#InstallIIS

Read on for full instructions.

The Importance of Interaction in Power BI

Published 2019-08-16 by Kevin Feasel

Marc Lelijveld continues a series on storytelling with Power BI:

Many times, I see reports with loads of visuals on the pages. This results in both really poor performance and an end user who has no clue what the key message of the report is. You can always ask yourself: is this visual necessary to show on this page? What does it add to this page? Is this really needed? If not, remove it! If the visual does add some value, is it needed on this page? Maybe it is only distracting the user from what the report is about.
A good approach can be to put certain visuals on a different page or hide them by default until the user interacts with the report. Within the interaction, you will have multiple options in Power BI to interact with your user.

There’s a lot more to it, so read on.

Automate VM Shutdown

Published 2019-08-16 by Kevin Feasel

Meagan Longoria has a script to shut off an Azure VM when a SQL Agent job finishes:

The runbook sets the Azure context to the appropriate subscription (especially important when you are a guest user in someone else’s tenant). Then it checks if the VM is started. If it is, it goes into a do-while loop. This task isn’t super time sensitive (it’s just to save money when the VM isn’t in use), so it’s waiting 60 seconds and then calling the child runbook to find out if my SQL Agent job is running. This makes sure that the child runbook is called at least once. If the result is that the job is not running, it stops the VM. If the job is running, the loop starts over, waiting 60 seconds before checking again. This loop is essentially polling the job status until it sees that the job is completed.

Click through for the script.
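The polling loop described above can be sketched in Python as follows. This is not Meagan's runbook: vm_is_running, job_is_running, and stop_vm are hypothetical callables standing in for the VM status check, the child runbook that reports the SQL Agent job status, and the VM shutdown. Python has no do-while, so a while True loop whose body always runs at least once plays that role.

```python
import time
from typing import Callable


def stop_vm_when_job_done(
    vm_is_running: Callable[[], bool],
    job_is_running: Callable[[], bool],
    stop_vm: Callable[[], None],
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the job is no longer running, then stop the VM.

    Returns the number of job-status checks made (0 if the VM was already stopped).
    All three callables are hypothetical stand-ins for the runbook's steps.
    """
    if not vm_is_running():
        return 0
    checks = 0
    while True:  # body runs at least once, like a do-while
        sleep(poll_seconds)  # wait first, then check the job status
        checks += 1
        if not job_is_running():  # job finished: shut the VM down
            stop_vm()
            return checks
```

Injecting sleep keeps the loop testable without waiting 60 seconds per check.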
