Day: August 19, 2024

Analyzing the Game Wingspan

Dan Oehm builds a meta:

Wingspan is a great game even though I’ve only played it a few times. The mechanics are great, there are lots of bird variations, and a bunch of different strategies to try. There are 170 birds, and I’ve probably only seen 30 of them. So, true to form, I’ve dabbled in a bit of data analysis to get a view of all the different types of cards in the game.

Open source wins again since the {wingspan} R package exists. It contains the details of each bird in the core, European, Oceania, and swift start sets. I’ll only be using the core set for this analysis since that’s the only one I’m semi familiar with.

Although I haven’t played the game before, Dan’s visuals drew me in. There’s also a regression analysis and discussion of the trade-off between in-game power and victory points. H/T R-Bloggers.

A Primer on One-Hot Encoding

Vinod Chugani does a bit of data modeling:

Preparing categorical data correctly is a fundamental step in machine learning, particularly when using linear models. One Hot Encoding stands out as a key technique, enabling the transformation of categorical variables into a machine-understandable format. This post tells you why you cannot use a categorical variable directly and demonstrates the use of One Hot Encoding in our search for the most predictive categorical features for linear regression.

Read the whole thing.
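
For a quick feel of the idea before clicking through, here is a minimal sketch in plain Python. It is not Vinod's code; the `one_hot_encode` helper and the neighborhood labels are made up for illustration. The point is that a linear model needs numbers, and assigning integer codes to labels would imply an ordering that isn't there, so each category gets its own 0/1 column instead.

```python
def one_hot_encode(values, drop_first=False):
    """Turn a list of category labels into 0/1 indicator columns.

    Hypothetical helper for illustration only. Returns (column_names, rows),
    where each row has one entry per column.
    """
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(values))
    if drop_first:
        # Dropping one level avoids perfect collinearity with an intercept
        # term in a linear regression (the "dummy variable trap").
        categories = categories[1:]
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Made-up sample data.
neighborhoods = ["North", "South", "East", "South"]
columns, encoded = one_hot_encode(neighborhoods)
print(columns)
for label, row in zip(neighborhoods, encoded):
    print(label, row)
```

In practice you would reach for a library encoder rather than rolling your own, but the output has the same shape: one indicator column per category level.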

T-SQL Tuesday 177 Roundup

Mala Mahadevan gives us the low-down on database code management:

I was privileged to host yet another T-SQL Tuesday, for the month of August, 2024. My topic was on Managing database code. I was worried about getting responses, given that most of the community is no longer on X/Twitter and my invite didn’t seem to get much attention. I was wrong. People seem to keep track of this by other means and the response was excellent. Summary as below.

Read on for 11 takes on the topic.

Table Partitioning in PostgreSQL

Lee Asher talks partitions:

Partitioning in PostgreSQL was for many years a feature almost entirely ignored, and even today is often thought of only as a means to increase performance on large tables. While it’s certainly useful for this, it’s also an invaluable tool for solving common data maintenance problems. In this article, we’ll examine the feature from this perspective.

Let’s consider a common situation: a table continually ingesting time series data ordered chronologically: purchase orders, forum posts, time-series data, whatever. Most systems contain more than one table like this. And because we live in a world of Big Data, these tables can contain hundreds of millions, even billions of rows. Business rules require the data to remain active for some fixed period of time, then be deleted (or archived first, then deleted). Commonly this is a monthly process: every month copy data that has “aged out” to another location, then delete the rows from the live table.

It’s interesting that people consider partitioning a performance-improving technique in the PostgreSQL world, but in the SQL Server world, partitioning is primarily for ease of maintenance and we warn against thinking about it as a performance-improving technique.
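
To make the maintenance angle concrete, here is a rough sketch of the monthly age-out process expressed as partition operations rather than a big DELETE. This is my own illustration, not code from Lee's article. It assumes a hypothetical parent table named `purchase_orders` that was created with `PARTITION BY RANGE` on a date column, and the helper names are invented. The Python only builds the PostgreSQL statements as strings; it does not connect to a database.

```python
from datetime import date


def month_bounds(year, month):
    """Return the [start, end) dates covering one calendar month."""
    start = date(year, month, 1)
    # Roll over to January of the next year after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_name(parent, year, month):
    """Hypothetical naming convention: <parent>_YYYY_MM."""
    return f"{parent}_{year:04d}_{month:02d}"


def create_partition_sql(parent, year, month):
    """DDL to add next month's partition ahead of time."""
    start, end = month_bounds(year, month)
    child = partition_name(parent, year, month)
    return (
        f"CREATE TABLE {child} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )


def retire_partition_sql(parent, year, month):
    """DDL to age out a month: detach it (so it can be archived), then drop it."""
    child = partition_name(parent, year, month)
    return [
        f"ALTER TABLE {parent} DETACH PARTITION {child};",
        f"DROP TABLE {child};",
    ]


# Assumed table name, for illustration only.
print(create_partition_sql("purchase_orders", 2024, 9))
for statement in retire_partition_sql("purchase_orders", 2023, 8):
    print(statement)
```

The appeal is that detaching and dropping a partition is a metadata operation on a whole child table, so the aged-out rows never have to be deleted one by one from the live table.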

Checking SQL Server Backup Status with sp_CheckBackup

Jeff Iannucci announces a new tool:

Whether you are backing up your SQL Server databases with Ola Hallengren’s scripts, Maintenance Plans, or some third-party software, your backups are your lifeline for recovering your data in case of a disaster.

So…are your databases recoverable? Do you know WHERE your most recent backups were written? Do you know WHEN they were written? Did you verify the backups for recoverability? Have you backed up any encryption certificates associated with your backups?

And if you think you know the answers to these questions…are you sure? And can you confirm your Recovery Point Objective (RPO)?

Read on to learn more about the procedure and grab a copy from the GitHub repo.
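
As a small illustration of the RPO question (and only that; this is not how sp_CheckBackup works internally, which I am not describing here), the arithmetic boils down to comparing the age of the most recent backup against the amount of data loss you have agreed to tolerate. The function name and timestamps below are made up.

```python
from datetime import datetime, timedelta


def rpo_exposure(last_backup_finished, now, rpo):
    """Return (exposure, within_rpo) for a single database.

    exposure is how much data you could lose if disaster struck right now;
    within_rpo is True when that exposure does not exceed the objective.
    Hypothetical helper for illustration only.
    """
    exposure = now - last_backup_finished
    return exposure, exposure <= rpo


# Made-up example: 15-minute RPO, last log backup finished 20 minutes ago.
last_backup = datetime(2024, 8, 19, 1, 0)
current_time = datetime(2024, 8, 19, 1, 20)
exposure, ok = rpo_exposure(last_backup, current_time, timedelta(minutes=15))
print(f"Exposure: {exposure}, within RPO: {ok}")
```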

SQL Server on Linux for Disaster Recovery

Mika Sutinen shares a use case:

SQL Server on Linux has been an option for quite a while already, but I have to admit that personally, I have previously not been very excited about that idea. This is mostly due to some complexities and limitations of SQL Server on Linux. However, the recent CrowdStrike incident made it quite obvious that there’s a great use case for it. And that use case is Disaster Recovery.

The idea of having some operating system variety makes sense, though read the whole thing, as some companies will use components that SQL Server doesn’t offer for Linux.

And if you want a deep dive into using SQL Server on Linux, I’ll do a bit of self-promotion and reference my video series on the topic.

Reclaiming Space after a DELETE Operation

Andy Yun checks disk usage:

In my current role at Pure Storage, I have the privilege of working with two amazingly smart, awesome SQL Server nerds: Andrew Pruski and Anthony Nocentino. We often find ourselves facing interesting questions about SQL Server and storage, and today was no exception.

Andrew had a customer who wanted to know what happens on our FlashArray, from a space usage perspective, when they first delete a large volume of data in a database’s data file, then subsequently shrink the database’s data file.

Read on for that answer, which applies to other storage solutions as well.

Building a SQL Server Failover Cluster in Google Cloud Platform

Dave Bermingham talks about availability:

Building a SQL Server Failover Cluster in Google Cloud Platform (GCP) is a powerful way to ensure your databases remain highly available, even in the face of unexpected failures. High Availability (HA) is crucial for any business-critical application. Downtime can mean lost revenue, decreased productivity, and even damage to your company’s reputation. However, creating HA clusters in the cloud, especially in GCP, presents unique challenges—most notably, the lack of shared storage, which has traditionally been a key component of SQL Server Failover Clustering.

Click through for an overview and a whitepaper on the topic.
