Kevin Feasel – Page 350

A Demonstration of Why Not to Z-Standardize Values for Logistic Regression

Published 2023-12-21 by Kevin Feasel

Sebastian Sauer takes us through a demo:

In this post, we’ll investigate the consequence of z-standardizing the predictor variables, and in addition the outcome variable in a simple logistic regression setting.

Do some coefficients change as a result of standardizing the values?

Click through for the example and what z-standardization does to the model.
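As a rough companion to that question, here is a minimal Python sketch (not the code from the linked post, which I have not reproduced here). It relies on a general property of maximum-likelihood logistic regression with an intercept: z-standardizing a predictor rescales its slope by the predictor's standard deviation and shifts the intercept, but leaves the fitted probabilities unchanged. The toy data, the `fit_logistic` helper, and the variable names are all assumptions made up for illustration.

```python
import math


def fit_logistic(xs, ys, iterations=50):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, xs):
    """Return fitted probabilities for each x."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]


# Made-up toy data (an assumption for illustration): hours studied and pass/fail.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# z-standardize the predictor using the sample standard deviation.
mean = sum(hours) / len(hours)
sd = math.sqrt(sum((x - mean) ** 2 for x in hours) / (len(hours) - 1))
hours_z = [(x - mean) / sd for x in hours]

raw_b0, raw_b1 = fit_logistic(hours, passed)
z_b0, z_b1 = fit_logistic(hours_z, passed)

raw_probs = predict(raw_b0, raw_b1, hours)
z_probs = predict(z_b0, z_b1, hours_z)

print("raw slope:", raw_b1, "standardized slope:", z_b1)
print("raw slope * sd:", raw_b1 * sd)
print("largest difference in fitted probabilities:",
      max(abs(a - b) for a, b in zip(raw_probs, z_probs)))
```

The slope now describes the change in log-odds per standard deviation rather than per raw unit, so the number changes while the model's predictions do not.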

Validating Numbers in T-SQL

Published 2023-12-21 by Kevin Feasel

Andy Brownsword asks if this thing is a number:

Data validation is key when ingesting from external sources. As we can’t always be certain of data quality we inevitably find bad data which needs to be handled. Here I wanted to look at a couple of options for validating numeric data.

Here’s the scenario – we’ve got data which may have been received via a flat file or passed into our database, and it should be a numeric value. How can we weed out the valid from invalid data?

Read on for the wrong answer (at least, the wrong answer given our expectations as developers or data platform specialists), followed by a good answer.
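The linked post works in T-SQL; as a loose analog only, here is a small Python sketch of the same "try to convert, otherwise treat as invalid" idea. The definition of a valid number used here (optional sign, digits, optional decimal part) is my assumption rather than anything from Andy's post, and `parse_number` is a hypothetical helper name.

```python
import re
from decimal import Decimal

# Assumed definition of "numeric": optional sign, digits, optional fractional part.
# Scientific notation, currency symbols, and thousands separators are rejected.
_PLAIN_DECIMAL = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def parse_number(text):
    """Return a Decimal for a plainly formatted number, or None if invalid."""
    if text is None:
        return None
    candidate = text.strip()  # tolerate surrounding whitespace from flat files
    if _PLAIN_DECIMAL.fullmatch(candidate) is None:
        return None
    return Decimal(candidate)


# Hypothetical incoming values, made up for illustration.
incoming = ["123", " -4.50 ", "1e5", "$", "", "12abc"]
valid = [value for value in incoming if parse_number(value) is not None]
invalid = [value for value in incoming if parse_number(value) is None]

print("valid:", valid)
print("invalid:", invalid)
```

Returning `None` instead of raising keeps the good and bad rows easy to separate in one pass, which is the point of the scenario described above.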

Making REST API Calls against Microsoft Fabric

Published 2023-12-21 by Kevin Feasel

Sandeep Pawar digs into the REST API:

Accessing Fabric REST endpoints in Fabric notebooks was already easy but it became easier and straightforward with semantic-link version 0.4.0. You can use the FabricRestClient class from sempy to set up a REST client and call the APIs. Authentication is automatically managed for you.

Click through to see how it works, as well as some warnings or things to keep in mind along the way.

Isolation Levels and Stored Procedures

Published 2023-12-21 by Kevin Feasel

Erik Darling goes into isolation:

I’ve talked about isolation levels a bit lately because I need you all to understand that no isolation level is perfect, and that most everyone is completely wrong about how they really work.

For a very high percentage of workloads, Read Committed Snapshot isolation is the best choice. Why?

Read on for that answer. I think Erik is right about people misunderstanding how the different isolation levels work, as well as about the root cause: there isn’t a great place to try them out. You can build demos of how different transaction isolation levels behave, but some of the more nuanced operations are hard for one person with a couple of query tabs open to emulate.

Differential Backups of Master

Published 2023-12-21 by Kevin Feasel

Kenneth Fisher abides by Betteridge’s Law of Headlines:

In one of the sessions I attended during the PASS Data Community Summit the speaker asked “If master is in the simple recovery model can I take a differential backup of it?”

Read on for the answer, along with a demonstration of it. Kenneth also adds bonus answers regarding msdb, model, and tempdb.

Metadata-Based Counting and Filtered Indexes

Published 2023-12-21 by Kevin Feasel

Aaron Bertrand counts more efficiently:

That’s great when you want to count the whole table without size-of-entire-table reads. It gets more complicated if you need to retrieve the count of rows that meet – or don’t meet – some criteria. Sometimes an index can help, but not always, depending on how complex the criteria might be.

For me, counting more efficiently typically means I take off my shoes.

One other note is, if you just need a guesstimate, or if the cardinality of that column you’re splitting by is fairly low, you could also look at the histogram, especially if there’s a statistic on the column (or columns) you’re interested in. It’s rare that I think to go that way, but it is one of the tools the optimizer itself uses, so it’s fair game.

The Updated Stacked Bar Chart in Power BI

Published 2023-12-21 by Kevin Feasel

Tom Martens reviews an updated visual:

Personally, the stacked bar chart holds a special place in my heart when it comes to data visualization. It’s the tool I find myself using most frequently, which is why I decided to share a template using Deneb that I’ve been utilizing for a considerable amount of time: https://www.minceddata.info/2023/11/12/the-better-rectangular-pie-chart/

With the December 2023 release of Power BI Desktop, I can almost create the Deneb visual, which is fantastic as it eliminates the need for an additional custom visual. It’s important to note that while I’m a huge fan of Deneb, I also serve as the Power BI/Fabric sherpa in a large organization, and for this, I always try to reduce overall system complexity.

Click through for a fairly complex example of the visual.

Batch Scoring in the Azure ML Designer

Published 2023-12-20 by Kevin Feasel

I have a new video:

In this video, I answer a viewer question about how to perform batch processing from the Azure Machine Learning Designer.

This video wraps up my work on Azure ML for now. I’m going to kick off a brand new series starting next week.

An Introduction to Poisson Regression

Published 2023-12-20 by Kevin Feasel

Steven Sanderson talks about a discrete form of regression:

Hey data enthusiasts! Today, we’re diving into the fascinating world of count data and its trusty sidekick, Poisson regression. Buckle up, because we’re about to explore how this statistical powerhouse helps us understand the factors influencing, you guessed it, counts.

Scenario: Imagine you’re an education researcher, eager to understand how a student’s GPA might influence their job offer count after graduation. But hold on, job offers aren’t continuous – they’re discrete, ranging from 0 to a handful. That’s where Poisson regression comes in!

I have an unhealthy love for Poisson techniques, so I highly recommend checking this out.
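To make the scenario concrete, here is a minimal Python sketch of a Poisson regression with a log link, fitted by Newton-Raphson. It is not Steven's code; the GPA and job-offer values are made up for illustration, and `fit_poisson` is a hypothetical helper name.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit E[y] = exp(b0 + b1*x) by Newton-Raphson; return (b0, b1)."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)  # expected count under the current fit
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up toy data (an assumption for illustration): GPA and number of job offers.
gpa = [2.0, 2.4, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0]
offers = [0, 1, 0, 1, 2, 1, 3, 2, 4]

b0, b1 = fit_poisson(gpa, offers)
fitted = [math.exp(b0 + b1 * x) for x in gpa]
rate_ratio = math.exp(b1)  # multiplicative change in expected offers per GPA point

print("intercept:", b0, "slope:", b1)
print("expected offers multiply by", rate_ratio, "per extra GPA point")
```

Because of the log link, the slope is read multiplicatively: `exp(b1)` is the factor by which the expected count changes for a one-unit increase in the predictor.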

Advent of Code Day 5

Published 2023-12-20 by Kevin Feasel

Kevin Wilkie continues the Advent of Code. Part 1 starts with data prep and ends with some of the analysis:

There was a lot going on with this one, so let’s go ahead and get started. First thing we want to do, as always, is to go ahead and load all of our data into SQL Server. With this one, I cheated a little and loaded it into multiple tables to start with – since there are 7 different groups of data that we have to play with.

Could I have loaded it into one table and pulled from there into other tables? Absolutely! Do I really want to? Heck, no!

Part 2 continues cross-referencing over a larger set of data:

This time, the meaning of the seed numbers is slightly changing on us. Instead of it only being 20 seeds we have to cross-reference (at least that’s how many seeds my input lists out), we now have 10 seeds and a rather large range of numbers to work with after each of those 10.

Thankfully, we can still use our Tally table to create yet another table with all of the seeds listed in it – yes, all of them!

Read on for Kevin’s solution.
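Kevin's solution is in T-SQL with a Tally table. Purely to illustrate the change in how the seed numbers are read in Part 2, here is a small Python sketch that treats the list as (start, length) pairs. The sample numbers are made up, not anyone's puzzle input, and `seed_ranges` is a hypothetical helper name.

```python
def seed_ranges(numbers):
    """Interpret a flat list as (start, length) pairs and return range objects."""
    if len(numbers) % 2 != 0:
        raise ValueError("expected an even count of numbers: start/length pairs")
    starts = numbers[0::2]
    lengths = numbers[1::2]
    return [range(start, start + length) for start, length in zip(starts, lengths)]


# Made-up sample (an assumption for illustration): two pairs instead of ten.
seed_line = [10, 3, 50, 2]

ranges = seed_ranges(seed_line)
total_seeds = sum(len(r) for r in ranges)

for r in ranges:
    print("seeds", r.start, "through", r.stop - 1)
print("total seeds to cross-reference:", total_seeds)
```

This sketch keeps the ranges lazy rather than listing every seed, which is a different design choice from materializing all of them into a table.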

Author: Kevin Feasel