Press "Enter" to skip to content

Curated SQL Posts

Building Storage Tiers with Pure Storage in Powershell

Anthony Nocentino creates a medallion storage layout:

In modern IT environments, not all workloads require the same level of storage performance, protection, or cost. Some applications need high performance with aggressive data protection, while others are perfectly fine with lower performance in exchange for cost savings. This tiered approach to storage service delivery is fundamental to efficient infrastructure management.

In my previous post on Fusion, I took an application-centric approach, showing how to deploy SQL Servers using Fusion. Let’s switch gears now and learn how to define a storage service catalog. In this post, I’ll demonstrate how to build a complete storage service catalog using Pure Storage Fusion Presets, offering Bronze, Silver, and Gold tiers with optional replication. We’ll see how to leverage different array types (FlashArray //X and FlashArray //C) to optimize both performance and cost across your fleet.

Read on for a link to the code, as well as more information on how it works.

Comments closed

Join Strategies in Apache Spark

Ram Ghadiyaram looks at three join strategies in Apache Spark:

In this article, we are going to discuss three essential joins of Apache Spark.

The data frame or table join operation is most commonly used for data transformations in Apache Spark. With Apache Spark, a developer can use joins to merge two or more data frames according to specific (sortable) keys. Writing a join operation has a straightforward syntax, but occasionally the inner workings are obscured. Apache Spark internal API suggests several algorithms for joins and selects one. A basic join operation could become costly if you do not know what these core algorithms are or which one Spark uses.

This is not a comprehensive list, but it does cover three of the more common strategies when dealing with larger datasets.

Comments closed

What’s New in Microsoft Fabric Data Warehouse

Sowmya Sivaraman has an update:

Welcome to the August 2025 edition of What’s New in Fabric Warehouse. As summer winds down, despite August being a slower month, our team continued to deliver meaningful updates. We shipped several new features focused on enhancing data ingestion, improving the data management, and streamlining security. At the same time, much of our energy is going into preparing exciting announcements for FabCon Vienna — stay tuned for what’s coming next. Whether you’re optimizing workloads, building with SQL, or exploring new integrations, this roundup highlights improvements we think you’ll find valuable.

Click through for a list of changes.

Comments closed

Oracle’s LOGMINER and STREAMS Tools in the Modern Era

David Fitzjarrell looks at two classic tools:

Change is good, and occasionally Oracle changes utilities to make them easier to implement. Over the years a tool called LOGMINER has been available for various replication tasks, such as logical standby databases and an older product called STREAMS as well as updated tools such as Golden Gate. Let’s look into this topic again, with versions from 19c onward.

Click through for a bit of history on both tools, as well as where they’re at today.

Comments closed

Splitting to a Table via Regular Expression

Louis Davidson creates a table:

Continuing on with the REGEXP_ functions series, the next one I want to cover is the table valued function REGEXP_SPLIT_TO_TABLE. This function is definitely one of the ones you probably ought to know, especially if you are ever tasked to pull some data out of a data structure.

This function is a lot like the STRING_SPLIT function, and unlike things like the REGEXP_LIKE function, you can basically use the same main parameters as you used in STRING_SPLIT for simple cases, but from there the possibilities are a lot more endless because you can define almost any delimiters you want. It isn’t perfect, because of a few things, but we will discuss that more later on.

Read on to see how it works, including one major caveat.

Comments closed

Making XGBoost Run Faster

Ivan Palomares Carrascosa shares a few tips:

Extreme gradient boosting (XGBoost) is one of the most prominent machine learning techniques used not only for experimentation and analysis but also in deployed predictive solutions in industry. An XGBoost ensemble combines multiple models to address a predictive task like classification, regression, or forecasting. It trains a set of decision trees sequentially, gradually improving the quality of predictions by correcting the errors made by previous trees in the pipeline.

In a recent article, we explored the importance and ways to interpret predictions made by XGBoost models (note we use the term ‘model’ here for simplicity, even though XGBoost is an ensemble of models). This article takes another practical dive into XGBoost, this time by illustrating three strategies to speed up and improve its performance.

Read on for two tips to reduce operational load and one to offload it to faster hardware (when possible).

Comments closed

Regaining Access to sa on SQL Server

Garry Bargsley has a method:

Have you ever inherited a SQL Server instance or been called in to troubleshoot, only to discover that no one has SysAdmin access? It happens more often than you’d think. Clients reach out, needing urgent work done, but the SA password is long forgotten, and no other account has elevated permissions.

Unfortunately, SQL Server doesn’t offer a “reset on next login” option for SQL-authenticated accounts. So what can you do?

Read on for one technique. I also covered a similar method from Tim Radney, so you can see the ‘raw’ way (Tim) or the dbatools way (Garry).

Comments closed

Comparing Microsoft Fabric Consumption for Notebooks and Warehouse SQL Queries

Gilbert Quevauvilliers performs a comparison:

I saw that there was an update where it is now possible to use the Microsoft Fabric Warehouse to copy data directly from OneLake into the Warehouse.

This got me thinking, which would consume more capacity to get the data into the Warehouse table. As well as which one would be faster.

To do this I am going to be running a SQL query in the Warehouse.

Next, I will use a Notebook to copy the data from the OneLake files section to a Warehouse table.

Gilbert’s specific query involves loading data from a variety of CSV files into a lakehouse via notebook, and then into a warehouse table. Read on for the results.

Comments closed

Dataflows Gen2 Tips and Tricks

Jon Vöge provides advice on the least beloved ELT process:

Dataflows Gen2 are frequently (and often rightfully so) bashed for their performance inefficiencies. Especially in comparison with other ingestion and transformation tools in Fabric (Notebooks, Pipelines, Copy Jobs, SPROCs).

The fact remains however, that in the hands of a self-service developer, they are an incredibly powerful tool – if you can spare the compute on your capacity.

In this article, I will highlight tips and tricks to make the most of working with Dataflow Gen2 in Fabric. The list is by no means exhaustive, but simply consists of a bunch of tips which I found useful in the past year, including new and overlooked features, as well as old best practices:

Read on for some things that are new to Dataflows Gen2, working with SharePoint, and making data loads not quite as slow.

Comments closed