2023-12-21 – Curated SQL

A Demonstration of Why Not to Z-Standardize Values for Logistic Regression

Published 2023-12-21 by Kevin Feasel

Sebastian Sauer takes us through a demo:

In this post, we’ll investigate the consequence of z-standardizing the predictor variables, and in addition the outcome variable in a simple logistic regression setting.

Do some coefficients change as a result of standardizing the values?

Click through for the example and what z-standardization does to the model.
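
As a rough illustration of the predictor side of that question, here is a minimal sketch in Python rather than the tooling used in the linked post. The simulated data, the hand-rolled Newton fit, and every name in it are assumptions made for illustration. It fits the same one-predictor logistic model on raw and on z-standardized values and compares the coefficients.

```python
import numpy as np


def fit_logistic(x, y, iterations=25):
    """Fit y ~ intercept + slope * x by Newton's method; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        w = p * (1.0 - p)                     # per-row weights for the Hessian
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Simulated data (hypothetical): one predictor on an arbitrary scale.
rng = np.random.default_rng(42)
x = rng.normal(loc=50.0, scale=10.0, size=500)
true_p = 1.0 / (1.0 + np.exp(-(-5.0 + 0.1 * x)))
y = (rng.random(500) < true_p).astype(float)

# z-standardize the predictor only; the outcome stays coded as 0/1.
mean_x, sd_x = x.mean(), x.std(ddof=1)
z = (x - mean_x) / sd_x

intercept_raw, slope_raw = fit_logistic(x, y)
intercept_z, slope_z = fit_logistic(z, y)

print("raw fit:         ", intercept_raw, slope_raw)
print("standardized fit:", intercept_z, slope_z)

# Same model, different units: the slope is rescaled by sd(x) and the
# intercept absorbs the shift by mean(x).
print(np.isclose(slope_z, slope_raw * sd_x))
print(np.isclose(intercept_z, intercept_raw + slope_raw * mean_x))
```

In this sketch the coefficients do change, but only because the units change: the standardized slope is per standard deviation of the predictor rather than per raw unit, and the fitted probabilities are the same either way.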

Power BI, Event Streaming, and Notebooks in Microsoft Fabric

Published 2023-12-21 by Kevin Feasel

Tomaz Kastrun continues a series on Microsoft Fabric. Day 18 has us looking at Power BI:

We have created a Power BI report directly from the datalake and today we will check how to do same with dashboard and paginated reports.

Day 19 covers event streaming:

In Fabric, you can create streaming semantic model and when selecting you will get the usual sources:

Day 20 shows how you can work with notebooks in Microsoft Fabric:

Notebooks have been around for a long time and people, community, and professionals have proven the usability, practicality, versioning and reliability of notebooks. Not to mention the clarity and hygiene. But opinions are also divided.

The purpose of this post today is to check for a couple of functionalities that might not be that straightforward when it comes to notebooks.

Making REST API Calls against Microsoft Fabric

Published 2023-12-21 by Kevin Feasel

Sandeep Pawar digs into the REST API:

Accessing Fabric REST endpoints in Fabric notebooks was already easy but it became easier and straightforward with semantic-link version 0.4.0. You can use the FabricRestClient class from sempy to set up a REST client and call the APIs. Authentication is automatically managed for you.

Click through to see how it works, as well as some warnings or things to keep in mind along the way.
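
For a sense of what that looks like, here is a minimal sketch rather than the code from the linked post. It assumes the client's `get` method returns a requests-style response object and that the `/v1/workspaces` endpoint returns its items under a `value` key with a `displayName` field. The helper name `list_workspace_names` is hypothetical.

```python
def list_workspace_names(client):
    """Return the display names of the workspaces the caller can see.

    `client` is assumed to behave like sempy's FabricRestClient: get() takes
    a path relative to the Fabric API root and returns a requests-style
    response.
    """
    response = client.get("/v1/workspaces")
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    payload = response.json()
    # Assumption: items arrive under "value", each with a "displayName".
    return [item["displayName"] for item in payload.get("value", [])]


# Inside a Fabric notebook (assumes semantic-link 0.4.0 or later):
#
#   import sempy.fabric as fabric
#   client = fabric.FabricRestClient()
#   print(list_workspace_names(client))
```

Passing the client in as a parameter keeps the helper easy to exercise outside of Fabric with a stand-in object. The sketch reads only the first page of results, so a long workspace list may need paging on top of this.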

Validating Numbers in T-SQL

Published 2023-12-21 by Kevin Feasel

Andy Brownsword asks if this thing is a number:

Data validation is key when ingesting from external sources. As we can’t always be certain of data quality we inevitably find bad data which needs to be handled. Here I wanted to look at a couple of options for validating numeric data.

Here’s the scenario – we’ve got data which may have been received via a flat file or passed into our database, and it should be a numeric value. How can we weed out the valid from invalid data?

Read on for the wrong answer (at least, the wrong answer given our expectations as developers or data platform specialists), followed by a good answer.
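
The general shape of the problem, where a loose "looks numeric" check disagrees with "actually converts to the type I want," is not specific to T-SQL. As a hedged illustration in Python, and not the T-SQL approach from the linked post, the sketch below treats a value as valid only if it really parses into a finite decimal. The function name `try_parse_decimal` and the sample rows are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def try_parse_decimal(raw):
    """Return a Decimal if raw is a cleanly numeric string, otherwise None."""
    if not isinstance(raw, str):
        return None
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return None  # not parseable as a number at all
    # Decimal also accepts "NaN" and "Infinity"; reject those as data values.
    return value if value.is_finite() else None


# Hypothetical incoming values, as they might arrive from a flat file.
rows = ["42", "-3.14", "1e3", "$", "1,000", "", "NaN", "twelve"]

valid = [r for r in rows if try_parse_decimal(r) is not None]
invalid = [r for r in rows if try_parse_decimal(r) is None]

print("valid:  ", valid)
print("invalid:", invalid)
```

Whether scientific notation such as `1e3` should count as valid is a policy decision for the ingestion process; this sketch accepts it because the decimal parser does.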

Isolation Levels and Stored Procedures

Published 2023-12-21 by Kevin Feasel

Erik Darling goes into isolation:

I’ve talked about isolation levels a bit lately because I need you all to understand that no isolation level is perfect, and that most everyone is completely wrong about how they really work.

For a very high percentage of workloads, Read Committed Snapshot isolation is the best choice. Why?

Read on for that answer. I think Erik is right that people misunderstand how the different isolation levels work, and also about the root cause: there isn’t a great place to try them out. You can build demos of how different transaction isolation levels behave, but some of the nuanced operations are hard for one person with a couple of new query tabs open to emulate.

Differential Backups of Master

Published 2023-12-21 by Kevin Feasel

Kenneth Fisher abides by Betteridge’s Law of Headlines:

In one of the sessions I attended during the PASS Data Community Summit the speaker asked “If master is in the simple recovery model can I take a differential backup of it?”

Read on for the answer, as well as a demonstration of it. Kenneth also throws in bonus answers for msdb, model, and tempdb.

The Updated Stacked Bar Chart in Power BI

Published 2023-12-21 by Kevin Feasel

Tom Martens reviews an updated visual:

Personally, the stacked bar chart holds a special place in my heart when it comes to data visualization. It’s the tool I find myself using most frequently, which is why I decided to share a template using Deneb that I’ve been utilizing for a considerable amount of time: https://www.minceddata.info/2023/11/12/the-better-rectangular-pie-chart/

With the December 2023 release of Power BI Desktop, I can almost create the Deneb visual, which is fantastic as it eliminates the need for an additional custom visual. It’s important to note that while I’m a huge fan of Deneb, I also serve as the Power BI/Fabric sherpa in a large organization, and for this, I always try to reduce overall system complexity.

Click through for a fairly complex example of the visual.

Metadata-Based Counting and Filtered Indexes

Published 2023-12-21 by Kevin Feasel

Aaron Bertrand counts more efficiently:

That’s great when you want to count the whole table without size-of-entire-table reads. It gets more complicated if you need to retrieve the count of rows that meet – or don’t meet – some criteria. Sometimes an index can help, but not always, depending on how complex the criteria might be.

For me, counting more efficiently typically means I take off my shoes.

One other note: if you just need a guesstimate, or if the cardinality of the column you’re splitting by is fairly low, you could also look at the histogram, especially if there’s a statistic on the column (or columns) you’re interested in. It’s rare that I think to go that way, but it is one of the tools the optimizer itself uses, so it’s fair game.

Day: December 21, 2023