Press "Enter" to skip to content

Curated SQL Posts

Plotting Decision Trees in R

Steven Sanderson builds a tree:

Decision trees are a powerful machine learning algorithm that can be used for both classification and regression tasks. They are easy to understand and interpret, and they can be used to build complex models without the need for feature engineering.

Once you have trained a decision tree model, you can use it to make predictions on new data. However, it can also be helpful to plot the decision tree to better understand how it works and to identify any potential problems.

In this blog post, we will show you how to plot decision trees in R using the rpart and rpart.plot packages. We will also provide an extensive example using the iris data set and explain the code blocks in simple to use terms.

Read on to see an example of how to do this.

Comments closed

Data Center Staffing Disasters

Steve Jones reads an after-action report:

There was a failure recently at an Azure data center in Australia when a utility power sag caused equipment to trip offline at one of the Azure data centers in Australia. You can read about it here, but essentially the headline is that there were only three people on site when the incident occurred, and that caused them to be unable to restart the equipment in time before an outage occurred.

Read on to learn more about why this failed and what Steve has seen in the wild.

Comments closed

Options for Running Jobs against Azure SQL DB

Anthony Norwood replaces on-prem SQL Agent jobs:

Both SQL Server on Azure VM and Azure SQL Managed Instance provide you with SQL Server Agent and therefore the capability to run scheduled tasks against your databases, so when we’re talking about being able to run jobs we’re only considering Azure SQL Database as needing guidance – some of the suggestions  in the following paragraphs can also apply to all these options of SQL Server, but perhaps not as necessary.

We’re going to provide you with four options for how you might be able to still run your favourite SQL Agent Jobs against an Azure SQL Database, each of which come with their own advantages and disadvantages – one not mentioned is Data Factory, sometimes referred to as SSIS in the cloud, and this is because we’re trying to focus on some options that may be more comfortable to people who have never built an SSIS package before.

Read on for the four options Anthony has for us.

Comments closed

MERGE is (Kinda) Okay

Hugo Kornelis performs a survey:

The MERGE statement compares source and target data, and then inserts into, updates, and deletes from the target table, all in a single statement. This statement was introduced in SQL Server 2008. I liked it, because it allows you to replace a set of multiple queries with just one single query. And while a statement with that many options necessarily has a more complex syntax, I still believe that, in most cases, a single MERGE statement is easier to read, write, and maintain, than a combination of at least an INSERT and an UPDATE, often a DELETE, and sometimes first a SELECT into a temporary table if the source is complex.

Click through for a review of a variety of problems people have had in the past. It surprised me a bit when I learned how few of these issues were still active problems caused by MERGE.

Comments closed

Tiered Storage in Apache Kafka

Matthew de Detrich explains the value behind tiered storage in Apache Kafka:

Tiered Storage is arguably one of the most sought-after features of Kafka 3.6, allowing Kafka’s core data to be stored in other locations, such as object storage, in addition to hard disks in a transparent manner, without any changes to Kafka’s producers or consumers. The Kafka brokers control whether the data is stored on local disks, fast but expensive and limited, or in alternative storage places, such as Amazon S3. When Tiered Storage is properly configured, it means you can have the best of both worlds: recent data is stored on local fast disks (as is currently), and older, less frequently accessed data can be stored elsewhere where it’s cheaper and space requirements are less of a concern (sometimes unlimited!) 

Read on to learn more about the official version of tiered storage, as well as a forward port of two prior implementation attempts to Kafka version 3.3.

Comments closed

Choosing the Correct Enterprise Edition for SQL on Linux

Amit Khandelwal disambiguates two options:

Recently, in one of the meetings with our CSS team (support) we realized that our customers might not be selecting the right Enterprise edition for their SQL Servers deployed on Linux. To learn more about different SQL Server editions please refer: Editions and supported features of SQL Server 2022 – SQL Server | Microsoft Learn.

Thus, through this blog we would like to provide guidance on how you can identify and update the SQL Server to use the right Enterprise edition which should show up in the error log as : “Enterprise Edition: Core-based Licensing (64-bit)” and not “Enterprise Edition (64-bit)” which is only needed in case you need to follow historical licensing agreements that limits to a maximum 20 cores per SQL Server instance.

Read on to learn how you can change the edition if you did indeed goof on the installation.

Comments closed

Automated Power BI Visual Testing with PBI Inspector

Chris Webb phones a friend:

This week, one of my colleagues at Microsoft, Nat Van Gulck, showed me a cool new open-source tool he’s been working on to make VisOps for Power BI much easier: PBI Inspector. What is VisOps? I’ll admit I didn’t really know either, so being lazy I asked Nat to write a few paragraphs describing the project and why it will be useful:

Read on for Nat’s description and an example of PBI Inspector in action.

Comments closed

Controlling Duplicates in T-SQL

Itzik Ben-Gan lays out some fundamentals:

When people start learning a new field, for example T-SQL, it’s tempting to spend very little time trying to understand the fundamentals of the field so that you can quickly get to the advanced parts. You might think that you already understand what certain ideas, concepts and terms mean, so you don’t necessarily see the value in dwelling on them. That’s often the case with newcomers to T-SQL, especially because soon after you start learning the language, you can already write queries that return results, giving you a false impression that it’s a simple or easy language. However, without a good understanding of the foundations and roots of the language, you’re bound to end up writing code that doesn’t mean what you think it means. To be able to write robust and correct T-SQL code, you really want to spend a lot of energy on making sure that you have an in-depth understanding of the fundamentals.

No matter your knowledge level, there’s a really good chance you’ll learn at least one new thing in this article.

Comments closed