Curated SQL – Page 502 – A Fine Slice Of SQL Server

Calculating Weekdays with M

Published 2022-06-06 by Kevin Feasel

Kristyna Hughes calculates weekdays on the fly:

Knowing the days between events is a fairly common reporting request because a lot of reporting is created to track SLA’s (service level agreement) and other KPI’s (key performance indicators). While getting the days between two dates is fairly easy to achieve, they tend to follow up and ask how many week days there are between two timed events. For example, one company may have a SLA to ship an order within three week days of the order being placed or else a discount is applied to the order. In this case, I would highly recommend that the company have software that calculates these days in the background and stores the actual week days between order date and ship date in a database. Unfortunately, many companies create policies like this without considering future reporting needs and these values have to be calculated on the backend.

Click through for the code but be sure to read the note that this is all weekdays, including holidays.

Comments closed

Power Pages: Websites with Power BI Embedded Reports

Published 2022-06-06 by Kevin Feasel

Chris Webb gives us a primer on Power Pages:

So what? I’m not a web designer and I’m pretty sure most of you aren’t either, so why blog about it here? Most data and BI people don’t need to build web sites… do they?
Well I was playing around with it and noticed one important detail:

Read on to see what that one important detail is and how you can use it.

Comments closed

Creating Reproducible Examples with CI

Published 2022-06-03 by Kevin Feasel

Colin Gillespie and Jack Walton tackle a common training problem:

As the number of courses we offer increased, so did the maintenance burden of our associated training materials (lecture notes, slides, exercises, and more). To ease this burden, and to assist in ensuring that our training materials build consistently, we developed an R package called {jrNotes2}. Amongst other things, this package ensures that all courses:
– have identical “template files”: .gitlab-ci.yml, .gitignore, Makefiles, index.Rmd, …;
– have the same directory structure, and
– pass a set of quality-assurance checks.

This is smart but read on to see why it’s still a challenge. This is especially true in the R and Python worlds, where breaking changes seem to be so common.

Comments closed

A Transaction Log in Apache Flink

Published 2022-06-03 by Kevin Feasel

Roman Khachatryan and Yuan Mei deal with transaction log issues:

State backends don’t start any snapshotting work until the task receives at least one checkpoint barrier, increasing the effective checkpoint duration. This is suboptimal if the upload time is comparable to the checkpoint interval; instead, a snapshot could be uploaded continuously throughout the interval.
This work discusses the mechanism introduced in Flink 1.15 to address the above cases by continuously persisting state changes on non-volatile storage while performing materialization in the background. The basic idea is described in the following section, and then important implementation details are highlighted. Subsequent sections discuss benchmarking results, limitations, and future work.

Read on to see what they did.

Comments closed

Alternatives to the Dead Letter Queue in Apache Kafka

Published 2022-06-03 by Kevin Feasel

Kai Waehner can’t return to sender:

This article focuses on the data streaming platform Apache Kafka. The main reason for putting a message into a DLQ in Kafka is usually a bad message format or invalid/missing message content. For instance, an application error occurs if a value is expected to be an Integer, but the producer sends a String. In more dynamic environments, a “Topic does not exist” exception might be another error why the message cannot be delivered.
Therefore, as so often, don’t use the knowledge from your existing middleware experience. Message Queue middleware, such as JMS-compliant IBM MQ, TIBCO EMS, or RabbitMQ, works differently than a distributed commit log like Kafka. A DLQ in a message queue is used in message queuing systems for many other reasons that do not map one-to-one to Kafka. For instance, the message in an MQ system expires because of per-message TTL (time to live).
Hence, the main reason for putting messages into a DLQ in Kafka is a bad message format or invalid/missing message content.

Read on to learn the Kafka-based approach to dealing with bad messages rather than using a Dead Letter Queue.

Comments closed

Understanding Missing Index Impact

Published 2022-06-03 by Kevin Feasel

Erik Darling delves into the depths of missing indexes:

Breaking each of those down, the only one that has a concrete meaning is Uses, but that of course doesn’t mean that a query took a long time or is even terribly inefficient.
That leaves us with Average Query Cost, which is the sum of each operator’s estimated cost in the query plan, and Impact.
But where does Impact come from?

Read on to learn where, as well as why you shouldn’t blindly trust that number.

Comments closed

T-SQL Language Enhancements in SQL Server 2022

Published 2022-06-03 by Kevin Feasel

Chad Baldwin checks out what’s new:

I’ve been exicted to play around with some of the new features and language enhancements that are available in SQL Server 2022 so I’ve been keeping an eye on the Microsoft Docker repository for a new 2022 image. Well, they finally added it to Docker Hub! I immediately pulled the image and started playing with it.
I want to focus on the language enhancements as those are the easiest to demonstrate, and I feel that’s what you’ll be able to take advantage of the quickest after upgrading.

Read on for a dozen or so language enhancements. This isn’t as big a change as what 2012 brought but there is a lot of useful stuff in here, as well as more that has been publicly announced like APPROX_PERCENTILE_CONT() (and _DISC(), yeah, but bah humbug).

Comments closed

SQL Server 2022 Public Preview on Linux

Published 2022-06-03 by Kevin Feasel

Amit Khandelwal has notes on SQL Server 2022 on Linux:

In continuation of last week’s announcement of SQL Server 2022 public preview, we are pleased to announce availability of SQL Server 2022 on Linux/Containers for public preview. Here are the details for getting started with the SQL Server 2022 public preview packages on Linux/Containers.

As usual, the officially supported distributions are Red Hat Enterprise Linux and Ubuntu.

Comments closed

ML Algorithms a Poor Fit for Predictive Caches

Published 2022-06-02 by Kevin Feasel

Pete Warden describes an interesting phenomenon:

I’ve been working on a new research paper, and a friend gave me the feedback that he was confused by the statement “memory accesses can be accurately predicted at the compilation stage” for machine learning workloads, and that this made them a poor fit for conventional processor architectures with predictive caches. I realized that this was received wisdom among the ML engineers I know, but I wasn’t aware of any papers that discuss this point. I put out a request for help on Twitter, but while there were a lot of interesting resources in the answers, I still couldn’t find any papers that focused on what feels like an important property for machine learning systems. With that in mind, I wanted to at least describe the issue as best as I can in this blog post, so there’s a trail of breadcrumbs for anyone else interested in how system designs might need to change to accommodate ML.

Read on for the explanation. My reading here is that this is a downside to having general-purpose compute: you run the risk of sub-optimal performance in certain circumstances, like training models using certain types of ML algorithms.

Comments closed

Data Modeling with Spark–Breaking Data into Multiple Tables

Published 2022-06-02 by Kevin Feasel

Landon Robinson tokenizes data:

The result of joining the 2 DataFrames – pets and colorsdisplays the nickname, color and age of the pets. We went from a normalized dataset where common & recurring values weresubstituted for numeric representation s— to a slightly more denormalized dataset. Let’s keep going!

This is an interesting example of a useful technique but I strongly disagree with Landon about whether this is normalization. Translating a natural key to a surrogate key is not normalizing the data and translating a surrogate key to a natural key (which is what the example does) is not denormalizing the data. A really simplified explanation of the process is that normalization is ensuring that like things are grouped together, not that we build key-value lookup tables for everything. That’s why Landon’s “denormalized” example is just as normalized as the original: each of those attributes describes a unique thing about the pet identified by its (unique) nickname. This would be different if we included things like owner’s name (which could still be on that table), owner’s age, owner’s height, a list of visits to the vet for each pet, when the veterinarians received their licenses, etc.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Curated SQL Posts