Press "Enter" to skip to content

Curated SQL Posts

An Overview of Classification Algorithms

Matthew Mayo explains several algorithms:

Classification algorithms are at the heart of data science, helping us categorize and organize data into pre-defined classes. These algorithms are used in a wide array of applications, from spam detection and medical diagnosis to image recognition and customer profiling. It is for this reason that those new to data science must know about and understand these algorithms: they lay foundations for more advanced techniques and provide insight into how those data-driven decisions are made.

Let’s take a look at 5 essential classification algorithms, explained intuitively. We will include resources for each to learn more if interested.

Click through for five algorithms and a couple of paragraphs describing how the algorithm works. For a little bit of self-promotion on my end, I have a series on YouTube running right now on the topic of classification where I cover a variety of algorithms. As a spoiler, 4 of the 5 on Matthew’s list will have their own videos, and there are several other algorithms to boot.

Leave a Comment

Time Travel in the Microsoft Fabric Warehouse

Reza Rad hops in the Delorean:

Data changes throughout time, especially in the world of BI and data warehousing systems; the data gets updated through ETL processes frequently. This means that the data you see in the warehouse today might differ from yesterday and the day before, and so on. Some parts of this data can be retrieved on a timely basis. You can, for example, query the sales amount from the sales table where the date has been the 2nd of April. That would give you the sales amount for the 2nd of April, even if you are querying it on the 23rd of May.

However, what if some of the sales transactions on the 2nd of April got updated? The sales amount you see would likely be the updated amount, but not the original amount. It is sometimes useful to be able to see what was that original amount, or in other words, travel in time and see what that value was.

Click through for a combination video and article. The syntax isn’t quite the same as with temporal tables in SQL Server, though it’s close enough to follow along if that’s your relevant experience.

Leave a Comment

Contained Availability Groups in SQL Server 2022

Rich Benner talks about a new feature:

SQL Server 2022 introduced of a number of new features, and one of the more interesting of these is Contained Availability Groups. These are very close to normal Availability Groups, but they have some very important differences. In this post, I’ll compare the new features included and some of the pitfalls that we’ve come across with their implementation.

Read on to learn more about the functionality and several harsh limitations.

Leave a Comment

Building an ERD for Existing Databases

Josephine Bush creates a diagram:

There are several tools out there to make your life easier by creating an ERD for your existing db. Everything works pretty well when you have a small number of tables with FKs mapped, but when the number gets bigger, the diagram naturally gets a lot messier. Here are some of the ones I tried.

Click through to see the full list. I haven’t found any that are particularly good at the job, especially not in the free or relatively inexpensive tiers. My problem is that the tools tend to get goofy when you update an existing model based on database changes: all that time you spent reorganizing entity locations so you don’t have a spaghetti mess of lines criss-crossing all of your entities gets wasted the next time you perform an update, because the tools tend to shuffle things around once again.

Leave a Comment

Finding Free Space in SQL Server Database Files

Greg Robidoux checks for free space:

One of the functions of a DBA is to keep an eye on free space within the database and database files. The auto grow feature is great as a last resort, but proactively managing the database files is a better approach. Also, this information can be used to shrink data files as needed if there is a lot of free space in the files.

In this tip we take a look at a few ways to find the current free space within a database, so you can better manage your database files.

Click through for several techniques.

Leave a Comment

The Right Tool for the Job

Bob Pusateri talks tools:

I was in a meeting the other day where someone stated that “writing SQL isn’t real engineering.” I was taken aback by this, and initially didn’t even know how to respond other than to say that sometimes SQL is the best tool to solve a problem, and other times it isn’t. The blanket statement that one programming language is somehow better or more “real engineering” than another seemed incredibly narrow-minded. I’ve done some reflecting on this since, and wanted to share.

Everybody knows that Real Programmers Use FORTRAN.

This is an amazingly obtuse take—not Bob’s response, mind you, but the original statement that Bob is responding to. “Using Miter saws isn’t real carpentry” is the first response I had to it. Or maybe “Using culverts isn’t real civil engineering.” SQL is a domain-specific language for working with data, and it is the best domain-specific language for working with data. That’s why Feasel’s Law is a thing, after all.

Bob’s response is more measured and thoughtful than what I’ve put in here, so check that out.

Leave a Comment

The Joy of the Common Table Expression

Kevin Wilkie talks common table expressions:

Most of our coding these days has Common Table Expressions, also known as CTEs. If not, you’re either working on an older version of SQL Server or you haven’t been introduced to this piece of goodness.

CTEs can make reading SQL queries a lot easier if the logic is convoluted. For example, let’s use the following in a CTE.

I’ll admit that I probably over-use common table expressions, but I like them more than sub-queries—I find them easier to read, and if they’re going to perform as well (or poorly) as sub-queries, I’d might as well use the form that makes more intuitive sense to me.

Leave a Comment

Tracking Transaction Log Space in SQL Server

Greg Robidoux keeps track of available space:

Keeping track of your transaction log usage is key to understanding how your database transaction logs are being used as well as keeping track of how much space the transaction logs are using. Depending on the recovery model that is set within your database and also the backup jobs that you run, the size of your transaction log can vary quite a bit. So what is the best way to keep track of the transaction log usage?

Read on to see how. This is almost assuredly a part of any monitoring system but if you’re working to roll your own or find yourself in an environment without such a system, it’s a good thing to track.

Leave a Comment

Orchestration Controllers in Azure Data Factory

Martin Schoombee gets to the top of the pyramid:

Controllers are pipelines that initiate the execution of a single process or task in a specific order and with constraints. Whereas everything else in this framework is pretty automated, this part is entirely manual.

Why? Well, when I started thinking about the design of this framework I knew I needed something at the “highest level” that would execute an entire daily ETL process, or a modified ETL process that only loads specific data during the day. I wanted to maximize the flexibility of the framework, and that either meant adding another level to the metadata structure or creating this layer of pipelines that sit at the top. I opted for the second, because I did not feel it was worth the complexity of adding another layer into the metadata structure. That being said, it doesn’t mean it cannot or shouldn’t be done…it was a personal choice I made to keep things as simple as I could.

Read on to learn more about what the controller should look like and how the other pieces fit in.

Leave a Comment