Press "Enter" to skip to content

Curated SQL Posts

Defending Kubernetes

Joey D’Antoni defends the defensible:

I’ve seen a couple of posts (of course they were chock full of AI slop images) on LinkedIn in the last couple of weeks, talking about how challenging it is to implement Kubernetes. In the most recent post I saw, it stated that “it took 5 months for our CEO to implement Kubernetes for our app”, to which I would ask, why the hell is your CEO configuring your clusters. I designed, and implemented the Kubernetes infrastructure on my current project, and I’ve worked on for a while, so of course, I felt the need to share my opinions on the matter.

As far as Kubernetes on-premises goes, there are quite valid reasons to run it on-prem. Yeah, it’s easier to host in AKS or EKS, but that’s not always possible. But regardless of whether you’re hosting on-prem or in a cloud provider, Kubernetes requires solid knowledge across several areas, including networking, storage administration, systems administration, and CI/CD, not to mention the development skills needed for containerization.

I think Joey downplays the skill level required, but I don’t want to err in the opposite direction by overstating the challenge. But if you want anything beyond the bog-standard deployment of AKS/EKS, the “You must be this tall to ride the ride” sign is significantly higher than using other containerized solutions like Azure Application Services/Container Apps or Elastic Container Service.

Leave a Comment

Building Storage Tiers with Pure Storage in Powershell

Anthony Nocentino creates a medallion storage layout:

In modern IT environments, not all workloads require the same level of storage performance, protection, or cost. Some applications need high performance with aggressive data protection, while others are perfectly fine with lower performance in exchange for cost savings. This tiered approach to storage service delivery is fundamental to efficient infrastructure management.

In my previous post on Fusion, I took an application-centric approach, showing how to deploy SQL Servers using Fusion. Let’s switch gears now and learn how to define a storage service catalog. In this post, I’ll demonstrate how to build a complete storage service catalog using Pure Storage Fusion Presets, offering Bronze, Silver, and Gold tiers with optional replication. We’ll see how to leverage different array types (FlashArray //X and FlashArray //C) to optimize both performance and cost across your fleet.

Read on for a link to the code, as well as more information on how it works.

Leave a Comment

Join Strategies in Apache Spark

Ram Ghadiyaram looks at three join strategies in Apache Spark:

In this article, we are going to discuss three essential joins of Apache Spark.

The data frame or table join operation is most commonly used for data transformations in Apache Spark. With Apache Spark, a developer can use joins to merge two or more data frames according to specific (sortable) keys. Writing a join operation has a straightforward syntax, but occasionally the inner workings are obscured. Apache Spark internal API suggests several algorithms for joins and selects one. A basic join operation could become costly if you do not know what these core algorithms are or which one Spark uses.

This is not a comprehensive list, but it does cover three of the more common strategies when dealing with larger datasets.

Leave a Comment

What’s New in Microsoft Fabric Data Warehouse

Sowmya Sivaraman has an update:

Welcome to the August 2025 edition of What’s New in Fabric Warehouse. As summer winds down, despite August being a slower month, our team continued to deliver meaningful updates. We shipped several new features focused on enhancing data ingestion, improving the data management, and streamlining security. At the same time, much of our energy is going into preparing exciting announcements for FabCon Vienna — stay tuned for what’s coming next. Whether you’re optimizing workloads, building with SQL, or exploring new integrations, this roundup highlights improvements we think you’ll find valuable.

Click through for a list of changes.

Leave a Comment

Oracle’s LOGMINER and STREAMS Tools in the Modern Era

David Fitzjarrell looks at two classic tools:

Change is good, and occasionally Oracle changes utilities to make them easier to implement. Over the years a tool called LOGMINER has been available for various replication tasks, such as logical standby databases and an older product called STREAMS as well as updated tools such as Golden Gate. Let’s look into this topic again, with versions from 19c onward.

Click through for a bit of history on both tools, as well as where they’re at today.

Leave a Comment

Splitting to a Table via Regular Expression

Louis Davidson creates a table:

Continuing on with the REGEXP_ functions series, the next one I want to cover is the table valued function REGEXP_SPLIT_TO_TABLE. This function is definitely one of the ones you probably ought to know, especially if you are ever tasked to pull some data out of a data structure.

This function is a lot like the STRING_SPLIT function, and unlike things like the REGEXP_LIKE function, you can basically use the same main parameters as you used in STRING_SPLIT for simple cases, but from there the possibilities are a lot more endless because you can define almost any delimiters you want. It isn’t perfect, because of a few things, but we will discuss that more later on.

Read on to see how it works, including one major caveat.

Leave a Comment

Making XGBoost Run Faster

Ivan Palomares Carrascosa shares a few tips:

Extreme gradient boosting (XGBoost) is one of the most prominent machine learning techniques used not only for experimentation and analysis but also in deployed predictive solutions in industry. An XGBoost ensemble combines multiple models to address a predictive task like classification, regression, or forecasting. It trains a set of decision trees sequentially, gradually improving the quality of predictions by correcting the errors made by previous trees in the pipeline.

In a recent article, we explored the importance and ways to interpret predictions made by XGBoost models (note we use the term ‘model’ here for simplicity, even though XGBoost is an ensemble of models). This article takes another practical dive into XGBoost, this time by illustrating three strategies to speed up and improve its performance.

Read on for two tips to reduce operational load and one to offload it to faster hardware (when possible).

Leave a Comment

Regaining Access to sa on SQL Server

Garry Bargsley has a method:

Have you ever inherited a SQL Server instance or been called in to troubleshoot, only to discover that no one has SysAdmin access? It happens more often than you’d think. Clients reach out, needing urgent work done, but the SA password is long forgotten, and no other account has elevated permissions.

Unfortunately, SQL Server doesn’t offer a “reset on next login” option for SQL-authenticated accounts. So what can you do?

Read on for one technique. I also covered a similar method from Tim Radney, so you can see the ‘raw’ way (Tim) or the dbatools way (Garry).

Leave a Comment

Comparing Microsoft Fabric Consumption for Notebooks and Warehouse SQL Queries

Gilbert Quevauvilliers performs a comparison:

I saw that there was an update where it is now possible to use the Microsoft Fabric Warehouse to copy data directly from OneLake into the Warehouse.

This got me thinking, which would consume more capacity to get the data into the Warehouse table. As well as which one would be faster.

To do this I am going to be running a SQL query in the Warehouse.

Next, I will use a Notebook to copy the data from the OneLake files section to a Warehouse table.

Gilbert’s specific query involves loading data from a variety of CSV files into a lakehouse via notebook, and then into a warehouse table. Read on for the results.

Leave a Comment