
Curated SQL Posts

Managed Private Endpoints and Trusted Workspace Access for All

Wolfgang Strasser is very pleased with a recent announcement:

In a time of data breaches and millions of customer records exposed, the security of your data platform is one of the things you need to consider upfront and, preferably, in all your data solutions.

When Microsoft Fabric was announced, connecting to other parts of your already-secured Azure data platform was not possible. The options to (securely) connect Fabric to the rest of your Azure environment were not available initially.

Read on to learn more about Managed Private Endpoints and Trusted Workspace Access, the initial problem with both, and how Microsoft has definitely improved things recently.

Comments closed

Filesystem Access for Database Restoration via dbatools

Andy Levy shares a lesson learned:

While performing an instance migration this spring, I happened upon something I didn’t expect in [dbatools](https://dbatools.io/). It should have been a simple backup/restore copy of the databases, with the backup files residing on a fileshare on the destination server after being copied there. I kept getting a warning that the backup files I was attempting to restore couldn’t be read, and the restores (via Restore-DbaDatabase) wouldn’t execute.

I checked permissions on the server over and over again. Both on the filesystem and for the share that I was attempting to read from. Even more curious, if I executed the restore database statements directly from within Management Studio, the databases restored without issue.

After doing quite a bit of digging, I managed to find the reason.

Read on to learn more about the necessary permissions, the issue Andy hit, and the solution.

Comments closed

Reading a Lakehouse Table from another Microsoft Fabric Workspace

Gilbert Quevauvilliers spans the gap:

I was doing some work recently for a customer and they had data stored in different Lakehouses, which were in a different App Workspace.

I was pleasantly surprised that this can be quite easy to do.

In my example below I am going to show you how, in my notebook, I can read a Lakehouse table even when the notebook is not attached to any Lakehouse.

It’s good that this is so easy to do, considering that current advice leans toward having multiple workspaces and not cramming everything into one.
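For a sense of what the cross-workspace pattern looks like, here is a hedged sketch of building the full OneLake ABFS path for a Lakehouse table, which a Fabric notebook can then read directly. The workspace, lakehouse, and table names below are placeholders for illustration, not Gilbert's actual names:

```python
workspace = "SalesWorkspace"   # hypothetical workspace name
lakehouse = "SalesLakehouse"   # hypothetical lakehouse name
table = "dim_customer"         # hypothetical table name

# OneLake ABFS path format for a Lakehouse table, addressable from any workspace
path = (f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}")

# Inside a Fabric notebook (where a Spark session is provided), the read
# would then be something like:
# df = spark.read.format("delta").load(path)
print(path)
```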

Comments closed

Systematic Sampling in R

Steven Sanderson continues a series on sampling:

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every k-th element from a list or dataset. This method is straightforward and useful when you want a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

In very technical circles, this is also known as the “eenie-meenie-meiney-moe technique” and is very similar to the “duck-duck-goose” algorithm, though that has an additional stochastic input.
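Steven's post uses base R; as a language-neutral illustration of the same idea (not his code), a systematic sample just takes every k-th element after a random starting offset, sketched here in Python:

```python
import random

def systematic_sample(population, k):
    """Select every k-th element, starting from a random offset in [0, k)."""
    start = random.randrange(k)
    return population[start::k]

# toy example: sample roughly 1 in 10 from 100 observations
data = list(range(1, 101))
sample = systematic_sample(data, k=10)
print(sample)  # 10 evenly spaced values, e.g. [4, 14, 24, ...]
```

The random offset is what keeps the sample unbiased; without it, element 1 would appear in every sample.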

Comments closed

Fun(?) with CPU Affinity

Rod Edwards plays with fire:

Here we have the options to set the processor affinity mask for all processes, and also the I/O affinity mask. The default is to let SQL Server handle this automatically, as above, since that provides better performance for most workloads.

On a basic level, these settings restrict which CPUs SQL Server can use for processing and/or I/O operations.

There are very specific use cases in which setting CPU affinity makes sense. Rod does a really good job of showing just why mucking with CPU affinity is not for the faint of heart.

Comments closed

Instrumenting Postgres

Grant Fritchey goes looking:

I’m still learning PostgreSQL and one of the things I’ve been looking at a lot lately is instrumentation. What do I mean? Well, if you’re in SQL Server, think, Dynamic Management Views (DMV), Extended Events, Query Store, <hack, spit> Trace <spit>. How do we know how long a query took to run? PostgreSQL can tell you, but, and this is one of those wild, cool, but, honestly, slightly frustrating things about PostgreSQL, not natively.

Read on for what Grant’s seen.

Comments closed

Columnstore Key Lookups are Bad News

Forrest McDaniel does not want to perform that key lookup:

I’ve read it repeatedly: columnstore key lookups are extra slow. The question, of course, is why?

In my mental model, it makes sense. A normal key lookup adds about 3 reads, while a columnstore lookup should add at least a read per column, since each column lives in its own segments.

But it turns out that it’s not a read per column, oh no. Columnstore indexes are amazing for large-scale aggregations and awful for individual lookups.

Comments closed

Temporal Table History Cleanup

Chad Baldwin reads the docs:

I recently built a system for collecting index usage statistics utilizing temporal tables, clustered columnstore indexes (CCIs) and a temporal table data retention policy. The basic idea behind the system is that it collects various stats about indexes and updates this stats table. However, because it’s a temporal table, all changes are logged to the underlying history table.

My history table is built using a clustered columnstore index and has a data retention policy set up for the temporal table, like so:

Read on to see the problem Chad ran into and why it turned out not to be an actual problem (except maybe of the PEBKAC variety). In fairness, I would have made the same mistake.

Comments closed

Cluster Sampling in R

Steven Sanderson shows us one sampling technique:

Cluster sampling is a useful technique when dealing with large datasets spread across different groups or clusters. It involves dividing the population into clusters, randomly selecting some clusters, and then sampling all or some members from these selected clusters. This method can save time and resources compared to simple random sampling.

In this post, we’ll walk through how to perform cluster sampling in R. We’ll use a sample dataset and break down the code step-by-step. By the end, you’ll have a clear understanding of how to implement cluster sampling in your projects.

Read on for the scenario and sample code.
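The post's code is in R; as a hedged sketch of the same technique in a different language (the store/sales data and function name here are invented for illustration), one-stage cluster sampling groups rows by cluster, picks a few clusters at random, and keeps every member of the chosen clusters:

```python
import random
from collections import defaultdict

def cluster_sample(rows, cluster_key, n_clusters, seed=None):
    """One-stage cluster sampling: randomly choose n_clusters clusters,
    then keep every row belonging to the chosen clusters."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for row in rows:
        clusters[cluster_key(row)].append(row)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [row for c in chosen for row in clusters[c]]

# toy data: (store_id, sale_amount) for 8 stores with 25 sales each
sales = [(store, round(random.uniform(5, 50), 2))
         for store in range(1, 9) for _ in range(25)]
sampled = cluster_sample(sales, cluster_key=lambda r: r[0], n_clusters=3, seed=42)
print(len(sampled), sorted({r[0] for r in sampled}))  # 75 rows from 3 stores
```

The key trade-off versus simple random sampling: you only need to visit the chosen clusters, but estimates get noisier when clusters differ a lot from one another.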

Comments closed