
Day: August 6, 2024

Systematic Sampling in R

Steven Sanderson continues a series on sampling:

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every k-th element from a list or dataset. This method is straightforward and useful when you want a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

In very technical circles, this is also known as the “eenie-meenie-meiney-moe technique” and is very similar to the “duck-duck-goose” algorithm, though that has an additional stochastic input.
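To make the idea concrete, here is a minimal base R sketch of the technique (not Steven's exact code; the population and sample size are made up):

# Systematic sampling in base R: take every k-th element after a random start.
set.seed(123)                                  # reproducible example
population <- 1:100                            # toy population (hypothetical)
n <- 10                                        # desired sample size
k <- floor(length(population) / n)             # sampling interval
start <- sample(1:k, 1)                        # random start within the first interval
idx <- seq(from = start, by = k, length.out = n)
systematic_sample <- population[idx]
systematic_sample

The random starting point keeps the sample from always beginning at the first element, which is what makes the result a proper sample rather than a fixed slice.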


Fun(?) with CPU Affinity

Rod Edwards plays with fire:

Here we have the options to set the processor affinity mask for all processors, as well as the I/O affinity mask. The default is to let SQL Server handle this automatically, as above, since that provides better performance for most workloads.

At a basic level, these settings restrict which CPUs SQL Server can use for query processing and/or I/O operations.

There are very specific use cases in which setting CPU affinity makes sense. Rod does a really good job of showing just why mucking with CPU affinity is not for the faint of heart.
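For reference, the knobs involved look roughly like this; the CPU numbers below are purely illustrative, and the AUTO default is almost always the right answer:

-- Default: let SQL Server manage processor affinity automatically.
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = AUTO;

-- Pin the SQL Server process to specific CPUs (illustrative only).
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 3;

-- The I/O affinity mask is still an sp_configure setting (a CPU bitmask;
-- 15 = CPUs 0 through 3) and requires an instance restart to take effect.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'affinity I/O mask', 15;
RECONFIGURE;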


Instrumenting Postgres

Grant Fritchey goes looking:

I’m still learning PostgreSQL and one of the things I’ve been looking at a lot lately is instrumentation. What do I mean? Well, if you’re in SQL Server, think, Dynamic Management Views (DMV), Extended Events, Query Store, <hack, spit> Trace <spit>. How do we know how long a query took to run? PostgreSQL can tell you, but, and this is one of those wild, cool, but, honestly, slightly frustrating things about PostgreSQL, not natively.

Read on for what Grant’s seen.
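For context, one common route to per-query timing in PostgreSQL, which may or may not be where Grant lands, is the pg_stat_statements extension; a minimal sketch:

-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
-- and a server restart before the extension can collect anything.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Cumulative timing per normalized query (column names as of PostgreSQL 13+).
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;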


Columnstore Key Lookups are Bad News

Forrest McDaniel does not want to perform that key lookup:

I’ve read it repeatedly: columnstore key lookups are extra slow. The question, of course, is why?

In my mental model, it makes sense. A normal key lookup adds about 3 reads.

A columnstore lookup, meanwhile, should add at least a read per column, since each column lives in its own segments.

But it turns out that it’s not a read per column, oh no.

Columnstore indexes are amazing for large-scale aggregations and awful for individual lookups.
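If you want to see the effect yourself, the general shape looks something like this; the table and index names are made up, and whether the optimizer actually chooses a lookup depends on the data and the plan:

-- A clustered columnstore table with a narrow nonclustered rowstore index.
CREATE TABLE dbo.Sales
(
    SaleID     INT IDENTITY  NOT NULL,
    CustomerID INT           NOT NULL,
    SaleDate   DATE          NOT NULL,
    Amount     DECIMAL(10,2) NOT NULL,
    Notes      NVARCHAR(200) NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.Sales;

CREATE NONCLUSTERED INDEX IX_Sales_CustomerID ON dbo.Sales (CustomerID);

-- Asking for columns the nonclustered index does not cover can force a
-- lookup back into the columnstore for every matching row; compare the
-- logical reads against a query that selects CustomerID alone.
SET STATISTICS IO ON;

SELECT CustomerID, SaleDate, Amount, Notes
FROM dbo.Sales
WHERE CustomerID = 42;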


Temporal Table History Cleanup

Chad Baldwin reads the docs:

I recently built a system for collecting index usage statistics utilizing temporal tables, clustered columnstore indexes (CCIs) and a temporal table data retention policy. The basic idea behind the system is that it collects various stats about indexes and updates this stats table. However, because it’s a temporal table, all changes are logged to the underlying history table.

My history table is built using a clustered columnstore index and has a data retention policy set up for the temporal table, like so:
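The excerpt stops before the definition itself; as a rough sketch of what that kind of setup looks like (table and column names here are hypothetical, not Chad's actual schema):

-- History table created up front with a clustered columnstore index.
CREATE TABLE dbo.IndexStatsHistory
(
    DatabaseName SYSNAME   NOT NULL,
    IndexName    SYSNAME   NOT NULL,
    UserSeeks    BIGINT    NOT NULL,
    ValidFrom    DATETIME2 NOT NULL,
    ValidTo      DATETIME2 NOT NULL,
    INDEX CCI_IndexStatsHistory CLUSTERED COLUMNSTORE
);

-- Current table, system-versioned against that history table, with a
-- retention policy so old history rows age out automatically.
CREATE TABLE dbo.IndexStats
(
    DatabaseName SYSNAME   NOT NULL,
    IndexName    SYSNAME   NOT NULL,
    UserSeeks    BIGINT    NOT NULL,
    ValidFrom    DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo      DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo),
    CONSTRAINT PK_IndexStats PRIMARY KEY (DatabaseName, IndexName)
)
WITH (SYSTEM_VERSIONING = ON (
        HISTORY_TABLE = dbo.IndexStatsHistory,
        HISTORY_RETENTION_PERIOD = 7 DAYS));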

Read on to see the problem Chad ran into and why it turned out not to be an actual problem (except maybe of the PEBKAC variety). In fairness, I would have made the same mistake.
