June 2024 – Page 4 – Curated SQL

Parallel Index Builds in SQL Server

Published 2024-06-24 by Kevin Feasel

SQL Server doesn’t support parallel modifications to a b-tree index.

That might sound surprising. After all, you can certainly write to the same b-tree index from multiple sessions concurrently. For example, two sessions can happily write alternating odd and even numbers to the same integer b-tree index. So long as both sessions execute on different schedulers and take row locks, there will be no blocking and you’ll get true concurrency.

No, what I mean is: A single session can’t write to a b-tree index using more than one thread. No parallel plan modifications of a b-tree index, in other words. It’s a bit like the lack of parallel backward ordered scans. There’s no reason it couldn’t be implemented, but it hasn’t been so far.

Click through for a link to the full article. Or click the link I just added, your choice.

Comments closed

Power BI Theme Color Choices

Published 2024-06-24 by Kevin Feasel

Meagan Longoria explains some of what you get with themes in Power BI:

Power BI reports have a theme that specifies the default colors, fonts, and visual styles. In Power BI Desktop, you can choose to use a built-in theme, start with a built-in theme and customize it, or create your own theme.

Creating your own theme involves specifying formatting options in a JSON file and importing it into your report. This post will focus on the theme colors, but there are lots of other options that can be specified in a theme, including structural colors, fonts, and page and visual formatting options.

Read on to learn more about the three primary sets of colors you can specify.

Comments closed

SELECT FOR UPDATE in Postgres

Published 2024-06-24 by Kevin Feasel

Semab Tariq notes some syntax to assist with performing updates in a MVCC world:

In critical environments like banking, healthcare, and online retail, ensuring safe data modifications is crucial to prevent data corruption and maintain system integrity. PostgreSQL offers a robust solution for this with its row-level locking mechanism, which ensures that the data being modified is protected from concurrent changes. One key feature of PostgreSQL is the SELECT FOR UPDATE clause, which locks the selected rows against concurrent updates. In this blog, we will explore how to implement the SELECT FOR UPDATE clause in PostgreSQL and discuss its real-world use cases.

Read on to learn more about how it works.

Comments closed

Practical healthyR.ts Examples

Published 2024-06-21 by Kevin Feasel

Steven Sanderson provides some examples:

Today I am going to go over some quick yet practical examples of ways that you can use the healthyR.ts package. This package is designed to help you analyze time series data in a more efficient and effective manner.

Let’s just jump right into it!

Read on for a few common time series activities, such as testing for stationarity, extracting tends from noise, and performing lagged correlation.

Comments closed

Performing a Ping Sweep in Powershell

Published 2024-06-21 by Kevin Feasel

Vlad Drumea goes poking around:

This is a brief post containing a a piece of code that I use to do a ping sweep and resolve host names in PowerShell on a /24 subnet.

A /24 subnet refers to the last octet (segment of numbers) in an IP, and it ranges from 1 to 254.
This means that if you provide 100.100.100 to the $FirstThreeOctets variable, you’ll end up pinging every IP between 100.100.100.1 and 100.100.100.254.

This is where I say “Hey, go check out nmap.” I also say “Hey, don’t install nmap on your work machine unless you have explicit approval, so that you don’t get an unexpected visit from security.” Which is something I saw once and decided that wouldn’t be the life for me. But seriously, nmap is an extremely powerful network discovery tool.

Comments closed

Understanding Worktables in SQL Server

Published 2024-06-21 by Kevin Feasel

Steve Stedman takes a peek at tempdb:

A worktable in SQL Server is a temporary structure that the SQL Server Database Engine uses to process certain types of queries. These tables are not explicitly created by users but are generated by SQL Server internally to handle specific operations that cannot be managed directly within memory. Worktables are stored in the tempdb database and are crucial for facilitating complex query execution plans.

Read on for examples of when SQL Server will use worktables and some good ideas when you spot worktables in the wild. They’re not inherently bad, but there are some performance problems you could experience around them.

Comments closed

Don’t Enable TRUSTWORTHY on SQL Server

Published 2024-06-21 by Kevin Feasel

Jeff Iannucci shares good advice:

If you have ever used our free tool to check SQL Server security, you may have seen the check for the “TRUSTWORTHY database owned by sysadmin” show up as one of the highest of priority items, requiring action. When we started reviewing the security permissions and configurations for our clients’ instances, we didn’t expect to find it very often since TRUSTWORTHY database setting is off by default.

Unfortunately, this has been discovered with some frequency, and when combined with a few other common practices, it presents a tremendous vulnerability to escalate privileges for both authorized users and hackers.

Read on to learn more about this. And to supplement, I will once again link Solomon Rutzky’s outstanding guide on the topic.

Comments closed

Approximate Percentiles in SQL Server 2022

Published 2024-06-21 by Kevin Feasel

Chad Callihan tries out a big improvement:

How do you go about finding the median percentile of a data set? What if you need the top x percentile? Both the APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC functions can be used to solve these questions.

Let’s look at how we can use each and what makes them unique.

The approximate percentiles are guaranteed to be accurate to within a certain percentage, something like 3-5%, if I remember correctly–it’s higher than HyperLogLog’s ~2.5% but not so large as to be of low value. If you’ve ever tried to calculate a median or other percentile like the 75th or 95th percentile, you might have used PERCENTILE_CONT() in the past. At least until you get a few million rows in the table, at which point you stopped using it. My joke is, once you reach a certain table size, PERCENTILE_CONT() becomes so slow that it’s faster to install and configure SQL Server ML Services, learn R or Python, and send in the data to calculate a percentile than to wait for PERCENTILE_CONT() to complete.

The APPROX_PERCENTILE_* series is way, way faster. On reasonable-sized test cases of a couple million rows or so, my recollection is two orders of magnitude better performance, so long as you can deal with being off by a few percentage points. One of the best scenarios for something like this is calculating 95th percentile response times. Does it really matter that the actual response time was 187.5ms and SQL Server said 192.6 or 181.4? Probably not—you get a good idea of the magnitude, and that’s the important part here.

Comments closed

Model Deployment with vetiver

Published 2024-06-20 by Kevin Feasel

Colin Gillespie continues a series on vetiver:

In our previous blog, we provided an overview of MLOps and the {vetiver} package, creating and deploying a simple model locally. In this post, we’ll show you how to deploy a model to production using Posit Connect, SageMaker, and Docker.

Read on for those instructions and a demonstration.

Comments closed

An Overview of Spark in Microsoft Fabric

Published 2024-06-20 by Kevin Feasel

Reza Rad gives people a primer on Apache Spark:

Microsoft Fabric runs some workloads under the Spark engine, but what is it really? In this article, I’ll take you through the question of what Spark is, What benefits it has, how it is associated with Fabric, what configurations you have, and other things you need to know about it.

Reza talks a bit about history, interaction with languages, etc. As a quick addition to the languages list, you can use .NET languages like F# and C# with Spark, though it does involve setting up dotnet/spark and there are some open questions about its future. And I’m not even sure you could get it to work with Microsoft Fabric.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Month: June 2024