Press "Enter" to skip to content

Curated SQL Posts

R 3.6.3 Now Available

David Smith takes a look at R 3.6.3:

On February 29, R 3.6.3 was released and is now available for Windows, Linux and Mac systems. This update, codenamed “Holding the Windsock”, fixes a few minor bugs, and as a minor update maintains compatibility with scripts and packages written for prior versions of R 3.6.

February 29 is an auspicious date, because that was the day that R 1.0.0 was released to the world: February 29, 2000. In the video below from the CelebRation2020 conference marking the 20th anniversary of R, core member Peter Dalgaard reflects on the origins of R, and releases R 3.6.3 live on stage (at the 33-minute mark).

I’m holding out for R 4, though then I’ll have to wait to see when SQL Server will officially support it.


Adoption Patterns with Query Store

Erin Stellato has some thoughts on Query Store adoption:

Last fall we had a former customer reach out for help after they had to wait 45 minutes for a database to come online after a server reboot. The database queries were blocked by QDS_LOADDB waits. There were three things in play here – the first was that they had CAPTURE_MODE set to ALL, and it should be AUTO. Second, they didn’t have trace flag 7752 enabled (the behavior of which is now the default in SQL Server 2019). And the third was that their Query Store was 100GB in size. The workload was fairly ad hoc, so these three things together caused the problem initially described. They implemented the TF, made multiple changes to the settings (set CAPTURE_MODE to AUTO, changed MAX_STORAGE_SIZE_MB to 10GB, decreased CLEANUP_POLICY to 3 days), and then Query Store was usable for them.

Read on for more examples.
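For reference, a minimal sketch of the fixes Erin describes might look like the following. The database name is hypothetical, and trace flag 7752 is typically set as a startup parameter (-T7752) so it survives restarts:

```sql
-- Enable asynchronous Query Store load at startup (the default behavior in SQL Server 2019).
DBCC TRACEON (7752, -1);

-- Apply the settings from the scenario above: AUTO capture mode,
-- a 10 GB storage cap, and a three-day cleanup window.
ALTER DATABASE [YourDatabase]
SET QUERY_STORE (
    QUERY_CAPTURE_MODE = AUTO,
    MAX_STORAGE_SIZE_MB = 10240,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 3)
);
```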


Using Query Store to Replay Workloads

John Sterrett shows us an unorthodox use of Query Store:

Today, I wanted to talk about my least favorite part of replaying workloads. It’s having an extended event or server-side trace running during a workload replay only so we can compare the results at the query level when the replay is finished. Now, this might seem like a trivial thing, but when you have workloads over 10k batch requests/sec this can consume terabytes of data quickly. The worst part is waiting to read all the data, then slicing and dicing it for analysis.

Starting with SQL Server 2016 there is a better and faster way to go! You can replace your extended event or server-side trace with Query Store captured data. Today, I will show you how to use the Query Store for the same purpose.

Click through for the solution.
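These aren’t John’s queries (click through for those), but as a rough sketch of the idea, the Query Store catalog views can yield a query-level comparison once the replay finishes:

```sql
-- Aggregate per-query runtime stats captured by Query Store during the replay.
-- Durations are in microseconds; to scope the results to the replay window,
-- filter further against sys.query_store_runtime_stats_interval.
SELECT
    qt.query_sql_text,
    SUM(rs.count_executions)                           AS executions,
    SUM(rs.avg_duration * rs.count_executions)         AS total_duration_us,
    SUM(rs.avg_logical_io_reads * rs.count_executions) AS total_logical_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
GROUP BY qt.query_sql_text
ORDER BY total_duration_us DESC;
```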


Finding Query Store’s Server Impact

Tracy Boggiano shows us how to track the performance impact of Query Store on an environment:

This month’s T-SQL Tuesday blogging party is brought to you, well, by me, and I wanted to talk more about Query Store.  I did write a book on it, but there is still more to know about it that is not in the book.  I am sure the rest of everyone’s posts will prove enlightening and provide valuable content for folks using or looking to implement Query Store.  Someone should have told Grant to hold off a week on his post about DROP / CREATE of procedures and what happens with plan forcing so it could officially be part of the party.

I frequently get asked while presenting about the impact of running Query Store on the instance and one thing that was not in the book was the performance counters that were added to help track just that.

You should probably buy a bunch of copies of Tracy’s book. Just in case.
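If you want a quick look at those counters on your own instance, they surface through a standard DMV; the exact counter names vary by version:

```sql
-- Query Store performance counters (SQL Server 2017+), e.g. CPU usage
-- and logical reads/writes. The LIKE handles named-instance prefixes.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Query Store%';
```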


Query Store and Spinlocks

David Fowler takes a look at an issue with Query Store:

We moseyed on down to the server in question to take a look at it.  One thing stood out immediately: CPU was pegged out at 100%, but SQL itself didn’t actually seem to be doing anything, and transactions per second was on the floor. Unfortunately, this happened a while back and I didn’t think to capture any graphs or metrics at the time, so you’re just going to have to take my word for this.

After looking into a few different things, the mention of spinlock contention came up.  I’ll be honest here: actual spinlock contention is rare, and it’s something that I’ve seen cause an issue only a handful of times, so I don’t generally get to it until I’ve ruled out just about everything else.

David’s scenario was on an older patch of SQL Server, and the issue was fixed in a later build. It’s a good reminder to keep those servers patched.
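If you suspect something similar, a quick check of the spinlock DMV shows whether the Query Store spinlocks are the hot ones; the specific spinlock names vary by version:

```sql
-- High spins and climbing backoffs on these rows suggest Query Store
-- spinlock contention.
SELECT name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name LIKE 'QUERY_STORE%'
ORDER BY spins DESC;
```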


Sqoop Scheduling and Security

Jon Moirsi continues a series on Sqoop:

In previous articles, I’ve walked through using Sqoop to import data to HDFS.  I’ve also detailed how to perform full and incremental imports to Hive external and Hive managed tables.

In this article I’m going to show you how to automate execution of Sqoop jobs via Cron.

However, before we get to scheduling we need to address security.  In prior examples I’ve used -P to prompt the user for login credentials interactively.  With a scheduled job, this isn’t going to work.  Fortunately Sqoop provides us with the “password-alias” arg which allows us to pass in passwords stored in a protected keystore.

That particular keystore tie-in works quite smoothly in my experience.


Monitoring Data Quality on Streaming Data

Abraham Pabbathi and Greg Wood want to check data quality on Spark Streaming data:

While the emergence of streaming in the mainstream is a net positive, there is some baggage that comes along with this architecture. In particular, there has historically been a tradeoff: high-quality data, or high-velocity data? In reality, this is not a valid question; quality must be coupled to velocity for all practical purposes — to achieve high velocity, we need high quality data. After all, low quality at high velocity will require reprocessing, often in batch; low velocity at high quality, on the other hand, fails to meet the needs of many modern problems. As more companies adopt streaming as a lynchpin for their processing architectures, both velocity and quality must improve.

In this blog post, we’ll dive into one data management architecture that can be used to combat corrupt or bad data in streams by proactively monitoring and analyzing data as it arrives without causing bottlenecks.

This was one of the sticking points of the lambda architecture: new data could still be incomplete and possibly wrong, but until it reached the batch layer, you wouldn’t know it.


Two Query Store Stories

Mark Wilkinson gives us two separate takes on Query Store:

When the Query Data Store (QDS) feature was announced for SQL Server 2016, we were excited about the prospect of being able to have deep insight on any query running in our environment. I work for a company that deals heavily in the e-commerce space, and we have a large SQL Server footprint. Our environment is unique in that it is essentially a multi-tenant system, but all the tenants could have wildly different workloads. It’s really the kind of query execution scenario QDS was built for. We had the pleasure of working with the Microsoft SQLCAT team to get 2016 and QDS up and running in our production environment before it was GA.

In this post I’m going to share two stories about our QDS experience (from before and after the feature went GA): one from the perspective of the Database Developer, and one from the Database Administrator. For the most part this is not a technical post full of queries and code samples. It’s just me talking about some things I have experienced using QDS in production.

CentralQDS, by the way, is really cool. Hopefully we’re able to show that to the world someday (and note how I say “we” even though I did absolutely nothing with it except for being in the same company as the people who developed it).


Improving vCenter Performance Metric Logging

David Klee has some recommendations on settings for vCenter performance metric collection:

By default, vCenter starts rolling up performance metric statistics into aggregates after just one hour. Much of the data necessary for troubleshooting performance challenges reported either the same day or the previous day is lost from the vCenter data, forcing the administrator to revert to cumbersome and/or time-consuming tooling, such as vRealize Operations Manager. DBAs might not have access to such tools. Hopefully by now they have read-only access to vCenter!

The vCenter performance statistics collection and rollup settings can be customized to provide a longer window of time for critical metrics to be available to the administrator for management.

Click through for some recommendations of aggregation intervals and collection durations to help with virtual machine troubleshooting.


Loading Data from CSVs with Inconsistent Quoted Identifiers

Dave Mason has some fun with loading data from files:

BCP and OPENROWSET are long-lived SQL Server options for working with data in external files. I’ve blogged about OPENROWSET, including a recent article showing a way to deal with quoted data. One shortcoming I’ve never found a way around is an inconsistent data file, with data fields in some rows enclosed in double quotes but not in others.

Let’s demonstrate with BCP. Below is a sample data file I’ll attempt to load into a SQL Server table. Note the data fields highlighted in yellow, which are enclosed in double quotes and contain the field terminator character (a comma). For reference, the file is also available on GitHub.

I get unduly frustrated with the implementations of various data loaders around SQL Server and how they handle quoted identifiers differently. And don’t get me started on PolyBase.
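One possible out, offered with appropriate hedging: starting with SQL Server 2017, BULK INSERT accepts FORMAT = 'CSV', whose RFC 4180 parser allows a field to be quoted in one row and bare in the next. A sketch, with a hypothetical table and file path:

```sql
-- Assumes SQL Server 2017+; dbo.SampleData and the path are hypothetical.
BULK INSERT dbo.SampleData
FROM 'C:\Data\sample.csv'
WITH (
    FORMAT = 'CSV',        -- RFC 4180 parsing, tolerating mixed quoting
    FIELDQUOTE = '"',
    FIRSTROW = 2,          -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```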
