2019-12-11 – Curated SQL

NFL Kicker Quality

Published 2019-12-11 by Kevin Feasel

Jacob Long has an outstanding pair of posts on evaluating kickers in the NFL. FIrst up is the analysis itself:

Justin Tucker is so great that, quite frankly, it doesn’t matter which metric you use. PAA, FG% – eFG%, or just plain old FG%, he’s unlike anyone else in the past 10 years. Given the well-documented trend of increasing kicker accuracy in the NFL, I think Tucker has a solid claim on being the greatest kicker of all time.
Even with fewer seasons than many of his competitors, his PAA are double all the others who kicked in the past 10 years. He had a slightly more difficult than average set of attempts but made a higher percentage of his attempts than anyone who has had more than 22 tries. Good luck trying to find any defect in Tucker’s record.

Jacob then covers the method in detail:

Pasteur and Cunningham-Rhoads — I’ll refer to them as PC-R for short — gathered more data than most predecessors, particularly in terms of auxiliary environmental info. They have wind, temperature, and presence/absence of precipitation. They show fairly convincingly that while modeling kick distance is the most important thing, these other factors are important as well. PC-R also find the cardinal direction of every NFL stadium (i.e., does it run north-south, east-west, etc.) and use this information along with wind direction data to assess the presence of cross-winds, which are perhaps the trickiest for kickers to deal with. They can’t know about headwinds/tailwinds because as far as they (and I) can tell, nobody bothers to record which end zone teams defend at the game’s coin toss, so we don’t know without looking at video which direction the kick is going. They ultimately combine the total wind and the cross wind, suggesting they have some meaningful measurement error that makes them not accurately capture all the cross-winds. Using their logistic regressions that factor for these several factors, they calculate an eFG% and use it and its derivatives to rank the kickers.

Those wind factors make certain stadiums like New Era Field (where Buffalo plays) tricky: it’s fun to see two flags right next to each other pointing in opposite directions, or the flags on the field goal posts pointing hard right, then switching to hard left, then switching back to hard right over the course of a field goal try. H/T R-Bloggers

Comments closed

Azure Databricks and Delta Lake

Published 2019-12-11 by Kevin Feasel

Brad Llewellyn starts a new series on Delta Lake in Azure Databricks:

Saving the data in Delta format is as simple as replacing the .format(“parquet”) function with .format(“delta”). However, we see a major difference when we look at the table creation. When creating a table using Delta, we don’t have to specify the schema, because the schema is already strongly defined when we save the data. We also see that Delta tables can be easily queried using the same SQL we’re used to. Next, let’s compare what the raw files look like by examining the blob storage container that we are storing them in.

There are some good demos in this post and it promises to be a nice series.

Comments closed

Using SQL Server as a REST API Back-End

Published 2019-12-11 by Kevin Feasel

Davide Mauri shows how you can use SQL Server to power an API, using Flask as an example:

I mentioned in my previous article that having native JSON support in Azure SQL it’s a game changer as it profoundly change the way a developer can interact with a relational database, bringing the simplicity and the flexibility needed in today’s Modern Applications.
As Python is becoming immensely popular, one of the most common tasks for a developer is to create REST API using Python. Thanks to JSON support, using Azure SQL as a backend database to support your API is as easy as writing to a text file, with the difference that behind the scenes you have all the peace of mind that your data will be safely stored and made available on request, at scale, with also the option to push as much compute to data as you want, so that you can leverage the powerful query and processing engine while keeping your code simple, elegant and agile, with a clear separation of concerns. All these things will help you immensely once you’ll start to evolve your project to keep it updated with today’s demanding and ever-changing world.

Those who remember the days of ASMX web services in SQL Server (thankfully removed after 2005) might cringe, but I’ve actually done something like this for a company, where all of the data lived in SQL Server and the transformation logic was pretty simple. If you have to monkey with the JSON afterward in your middle tier, then just bring back a data set, but in a scenario like Davide shows, moving the JSON creation to Python wouldn’t really gain you anything.

Comments closed

Orchestrating ADF Pipelines

Published 2019-12-11 by Kevin Feasel

Cathrine Wilhelmsen continues a series on Azure Data Factory:

The other way to build this solution is by creating an orchestration pipeline with two execute pipeline activities. This gives us a little more flexibility than having a single pipeline, because we can execute each pipeline separately if we want to.
Let’s start by creating a new pipeline and adding two execute pipeline activities to it. In the activity settings, select the pipelines to execute, and check wait on completion:

Read on for the demonstration.

Comments closed

TDE Encryption Scan Internals

Published 2019-12-11 by Kevin Feasel

On the Microsoft Tech Community blog, goramesh shares with us how the initial encryption process works for Transparent Data Encryption:

Now, once encryption is turned ON for a database, all the existing user data on the data files should be encrypted. To do this, SQL Server starts something called a TDE Encryption Scan. It is basically a scanner, which goes through each page of each data file to ensure its encrypted. When the scanner completes its scan across all the files, that’s when we say that the database is ‘encrypted’. How the TDE Encryption scan works is crucial because of the effects it can have on the user workload. Let me explain.

Read on for the explanation.

Comments closed

Conditional Formatting Line and Area Charts with Power BI

Published 2019-12-11 by Kevin Feasel

Soheil Bakkshi shows how we can conditionally format line and area charts with Power BI:

One of my customers asked me to show time series in line charts and area charts. But she want’s it to be conditionally formatted based on the average value over time. Let’s keep it simple, she wants to show “Sales by Year Month” in line chart, but, highlight the data points that are below “Average Sales per Year Month”. As you may know, we currently do not have the luxury of formatting line charts and area charts. But wait, this post is all about that. Let’s dig into it.
From the above scenario, you perhaps already guessed that we need to create a measure which defines the colour based on “Average Sales per Year Month” to be able to format the chart conditionally. If any data point is below the “Average Sales per Year Month” then we highlight it in Orange, if it is above the “Average Sales per Year Month” then we stick to the default colour.
Let’s do it.

This is definitely not straightforward, but once you see the process, it’s pretty neat.

Comments closed

Fun with the TOP Operator

Published 2019-12-11 by Kevin Feasel

Jared Poche takes a look at the TOP operator and learns a bit along the way:

Sort is a blocking operator. Don’t feel bad if you haven’t heard of the term; I’ve been working with SQL Server for 15 years, and I’m sure I never heard the term until the incomparable Grant Fritchley mentioned it while he was lecturing at my place of employment.
So sorts and several other types of operators (eager spools, remote query\scan\etc, hash match joins, and more) will block the normal flow and gather all their results before passing any rows on. The hash match join only blocks while building its hash table from the first input, before probing the second.

Read the whole thing. Jared is just getting started with blogging, too, so go pay his blog a visit.

Comments closed

Finding Missing SQL Server MSIs

Published 2019-12-11 by Kevin Feasel

Annette Allen had a service pack installation go south due to missing MSIs:

I was recently doing a service pack, I’d run it on the entire test estate and half of the Production estate, I’d used Pinal Dave’s really useful AG check list and been really overcautious, I’d finished integrity checks on all databases, I’d backed everything up and even had a snapshot of the server completed.
When I clicked on the service pack I got the error “missing MSI” sorry I don’t have the screen dump or the full error message because at the time of trying to fix it I did’t think to take a copy

Read on to see what the root cause was and how Annette was able to fix this error.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Day: December 11, 2019