Press "Enter" to skip to content

Author: Kevin Feasel

On-Premises Scale-Out Post-Big Data Clusters

Chris Adkin looks at alternatives to SQL Server 2019 Big Data Clusters:

This post assumes that for reasons relating to data sovereignty, fiduciary or regulatory reasons in general that the:

– analytics platform will be underpinned by something which is cloud and on premises infrastructure agnostic, Kubernetes in other words.

– focal points of the Data Lake processing element will be Python and open source tools

– SQL Server 2022 S3 object virtualisation is the preferred technology for querying the Data Lake via a T-SQL surface area

– S3 is the preferred technology for storing the data in our Data Lake.

Read on for the high-level solution and stay tuned for more detailed answers.

Comments closed

Zero Records but Lots of Space Used

Jeff Iannucci solves a riddle:

Anyhow, it’s worthwhile to occasionally review the tables in a database to see which ones are growing every day, using the most space.

But what if during a review you see the largest table looks like this?

That’s around 24 GB of sweet drive space allocated for 0 records. But…how?

Let me show you how.

Click through to see how. My initial thought was LOB craziness but Jeff’s example doesn’t even need that.

Comments closed

Choosing a Cost Threshold for Parallelism

Andrea Allred doesn’t want to do things by the numbers:

Cost Threshold for Parallelism (CTfP) is one of my favorite server level settings in SQL Server. I remember the first time I heard this setting mentioned by Grant Fritchey. I quickly hopped on my servers and found them all set at the default (5) and adjusted them to 50 for the non SSRS servers and 30 for the SSRS ones. That was many years ago, but I had kept those numbers in my head because I didn’t know a better way.

Read on for a better way.

Comments closed

Deleting an RDS Instance

Chad Callihan takes out the trash:

We’ve created an AWS RDS instance and logged into it successfully. One thing to remember when creating test instances is when to them when you’re finished. While a lot of test instances I’ve created have been free tier, it’s still good to clean up rather than leave instances lingering. Today, let’s clean up a test instance.

Click through for the step-by-step on how to do this.

Comments closed

All about Boxplots

Amy Esselman explains what a boxplot is:

The “box” part of a boxplot outlines the lower and upper quartiles. Inside the box is a line that indicates the median value. There are lines that extend outside the box—known as the whiskers—to depict the range of values in a given dataset. If there are outliers, then individual dots in line with the whiskers are plotted to denote the extreme values. 

Click through for a depiction of the plot as well as several alternative depictions which can include more information at the cost of added complexity.

Comments closed

Creating an SSIS Integration Runtime in Synapse

Andy Leonard shows one way to create an Azure * SQL Server Integration Services integration runtime for Azure Synapse Analytics:

On 17 Feb 2022, I first saw the Microsoft announcement of the public preview of Azure-SSIS integration runtimes in Azure Synapse Analytics. I blogged about the announcement in a post titled Azure-SSIS Integration Runtime now available in Azure Synapse Analytics.

I am excited to share one way for you to provision an Azure-SSIS IR in Synapse Analytics, following these steps. To start provisioning a shiny new Azure Synapse Analytics Azure-SSIS integration runtime, open Synapse Studio:

Read on for the step-by-step guide.

Comments closed

Not Optimizing for Ad Hoc Workloads

Erik Darling continues a thread:

A few weeks back, my friend Randolph (B|T) and I collaborated a bit on a blog post about Optimize For Ad Hoc Workloads.

Now that I’m a Microsoft Approved Blogger™️ and you all have to take me Very Seriously™️, I’d like to make a few of the points here in my own words, though I highly suggest reading Randolph’s post because it’s much better.

Randolph writes for a living. I’m just a consultant with some free time.

Click through for Erik’s thoughts. Before reading Randolph and Erik’s posts, my figuring on it was that it didn’t hurt and could help, so “on” was a good default. These posts lay out a good reason why the former isn’t true and the latter is less likely than it seems.

Comments closed

SQL Server Analytics Updates

The SQL Server team drops bad news on a Friday:

Today, we are announcing the retirement of PolyBase scale-out groups in Microsoft SQL Server. Scale-out group functionality will be removed from the product in SQL Server 2022. In-market SQL Server 2019, 2017, and 2016 will continue to support the functionality to the end of support for those products.

In addition to killing Big Data Clusters, they’re also killing the Java connector in PolyBase and scale-out groups. I have a blog post coming up today on the topic with my full set of thoughts. The short version is, “Mostly not bad, though losing scale-out groups sucks.”

Comments closed

Goodbye, Big Data Clusters

Mohammad Darab has a few thoughts:

I wanted to publish a quick note about something a little near and dear to me. As of today, Feb 25th, 2022, Microsoft made the official announcement that they are retiring SQL Server Big Data Clusters. You can read the full statement here.

It is a bit disappointing to see the lack of take-up here. Personally, I fought to get this set up in my environment but wasn’t able to convince management. I’m guessing there was a fair amount of that which led to a lack of take-up.

Comments closed