Kevin Feasel – Page 304

Validating Numbers in T-SQL

Published 2023-12-21 by Kevin Feasel

Andy Brownsword asks if this thing is a number:

Data validation is key when ingesting from external sources. As we can’t always be certain of data quality we inevitably find bad data which needs to be handled. Here I wanted to look at a couple of options for validating numeric data.

Here’s the scenario – we’ve got data which may have been received via a flat file or passed into our database, and it should be a numeric value. How can we weed out the valid from invalid data?

Read on for the wrong answer (at least, the wrong answer given our expectations as developers or data platform specialists), followed by a good answer.

Comments closed

Differential Backups of Master

Published 2023-12-21 by Kevin Feasel

Kenneth Fisher abides by Betteridge’s Law of Headlines:

In one of the sessions I attended during the Pass Data Community Summit the speaker asked “If master is in the simple recovery model can I take a differential backup of it?”

Read on for the answer to this, as well as a demonstration in fact of said answer. Kenneth also adds in bonus answers for free regarding msdb, model, and tempdb.

Comments closed

Isolation Levels and Stored Procedures

Published 2023-12-21 by Kevin Feasel

Erik Darling goes into isolation:

I’ve talked about isolation levels a bit lately because I need you all to understand that no isolation level is perfect, and that most everyone is completely wrong about how they really work.

For a very high percentage of workloads, Read Committed Snapshot isolation is the best choice. Why?

Read on for that answer. I think Erik is right about people misunderstanding how the different isolation levels work, as well as the root cause of not having a great place to try it out. You can build out demos of how different transaction isolation levels will work but some of the nuanced operations can be hard for one person with a couple new query tabs open to emulate.

Comments closed

The Updated Stacked Bar Chart in Power BI

Published 2023-12-21 by Kevin Feasel

Tom Martens reviews an updated visual:

Personally, the stacked bar chart holds a special place in my heart when it comes to data visualization. It’s the tool I find myself using most frequently, which is why I decided to share a template using Deneb that I’ve been utilizing for a considerable amount of time: https://www.minceddata.info/2023/11/12/the-better-rectangular-pie-chart/

With the December 2023 release of Power BI Desktop, I can almost create the Deneb visual, which is fantastic as it eliminates the need for an additional custom visual. It’s important to note that while I’m a huge fan of Deneb, I also serve as the Power BI/Fabric sherpa in a large organization, and for this, I always try to reduce overall system complexity.

Click through for a fairly complex example of the visual.

Comments closed

Metadata-Based Counting and Filtered Indexes

Published 2023-12-21 by Kevin Feasel

Aaron Bertrand counts more efficiently:

That’s great when you want to count the whole table without size-of-entire-table reads. It gets more complicated if you need to retrieve the count of rows that meet – or don’t meet – some criteria. Sometimes an index can help, but not always, depending on how complex the criteria might be.

For me, counting more efficiently typically means I take off my shoes.

One other note is, if you just need a guesstimate, or if the cardinality of that column you’re splitting by is fairly low, you could also look at the histogram, especially if there’s a statistic on the column (or columns) you’re interested in. It’s rare that I think to go that way, but it is one of the tools the optimizer itself uses, so it’s fair game.

Comments closed

An Introduction to Poisson Regression

Published 2023-12-20 by Kevin Feasel

Steven Sanderson talks about a discrete form of regression:

Hey data enthusiasts! Today, we’re diving into the fascinating world of count data and its trusty sidekick, Poisson regression. Buckle up, because we’re about to explore how this statistical powerhouse helps us understand the factors influencing, you guessed it, counts.

Scenario: Imagine you’re an education researcher, eager to understand how a student’s GPA might influence their job offer count after graduation. But hold on, job offers aren’t continuous – they’re discrete, ranging from 0 to a handful. That’s where Poisson regression comes in!

I have an unhealthy love for Poisson techniques, so I highly recommend checking this out.

Comments closed

Batch Scoring in the Azure ML Designer

Published 2023-12-20 by Kevin Feasel

I have a new video:

In this video, I answer a viewer question about how to perform batch processing from the Azure Machine Learning Designer.

This video wraps up my work on Azure ML for now. I’m going to kick off a brand new series starting next week.

Comments closed

Database Compatibility Levels and SQL Server Versions

Published 2023-12-20 by Kevin Feasel

Kendra Little demands change:

According to Microsoft’s documentation, “Database compatibility level … allow[s] the SQL Server Database Engine to be upgraded while keeping the same functional status for connecting applications by maintaining the same pre-upgrade database compatibility level.”

But these days, the “functional status” of a database at a given compatibility level differs depending on whether you’re using SQL Server, Azure SQL Managed Instance, or Azure SQL Database – and in the hosted versions it may change anytime without notice. Surprise, your database behaves differently now!

The whole concept is breaking down, and this is bad news for users of both managed services and the boxed product.

Read on to learn more about the issue.

Comments closed

Advent of Code Day 5

Published 2023-12-20 by Kevin Feasel

Kevin Wilkie continues the advent of code. Part 1 starts with data prep and ends with some of the analysis:

There was a lot going on with this one, so let’s go ahead and get started. First thing we want to do, as always, is to go ahead and load all of our data into SQL Server. With this one, I cheated a little and loaded it into multiple tables to start with – since there are 7 different groups of data that we have to play with.

Could I have loaded it into one table and pull from there into other tables? Absolutely! Do I really want to? Heck, no!

Part 2 continues cross-referencing over a larger set of data:

This time, the meaning of the seed numbers is slightly changing on us. Instead of it only being 20 seeds we have to cross-reference (at least that’s how many seeds my input lists out), we now have 10 seeds and a rather large range of numbers to work with after each of those 10.

Thankfully, we can still use our Tally table to create yet another table with all of the seeds listed in it – yes, all of them!

Read on for Kevin’s solution.

Comments closed

Tokenizing SQL Queries

Published 2023-12-20 by Kevin Feasel

Phil Factor has a new script:

Imagine this: you have several directories full of SQL script files, and you need to know where a certain table is used. You’d rather like the context too, so you can check the whole SQL Expression and work out why it is running so slowly. Maybe, from that same daunting set of several directories, you need to search for a comment, either end of line, or block comment, perhaps structured. It could be that you just need to execute each query or statement in turn to check performance.

It’s not unusual to want to search just within strings. Although simple searches can be effective, you will at some point need a tool that is able to recognise and return a collection of strings representing the SQL code, divided up into the respective components of the SQL language.

For this article, we’ll use a PowerShell cmdlet, called Tokenize-SQLString, which is in my GitHub repository.

Read on for examples of how it works and the types of things you can do with it.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Author: Kevin Feasel