Press "Enter" to skip to content

Author: Kevin Feasel

NYC Open Data R Package

Antoine Soetewey announces a package:

I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from the New York City Open Data platform.

The package is designed as part of an open-science and reproducible-research effort, with the goal of lowering the friction between public data and statistical analysis—especially for teaching, exploratory research, and applied civic work.

It is available on CRAN, so it should be easy to grab. H/T R-Bloggers.

Leave a Comment

SIDs and Distributed Availability Groups

Evan Corbett troubleshoots an issue:

After building a contained availability group in SQL Server, a customer was experiencing intermittent issues connecting to their primary database. Our investigation revealed that the SQL Authentication login being used had been created both within the context of the contained AG as well as directly on the primary node but had different SIDs in each location.  

This is a pretty common issue when using SQL authentication, and it always seems to bite at the least opportune times.

Leave a Comment

TO_CHAR() in Oracle vs Postgres

Deepak Mahto diagnoses a tricky difference in behavior:

You migrate a perfectly stable Oracle application to PostgreSQL.

  • The SQL runs
  • The tests pass
  • The syntax looks correct
  • Nothing crashes

And yet… the numbers or query calculations are wrong.

Not obviously wrong. Not broken. Just different.
Those are the worst bugs the ones that quietly ship to production. This is a story about one such bug, hiding behind familiar operators, clean-looking conversions, and false confidence.

Read on for the story.

Leave a Comment

Building an Ubuntu VM from Powershell Script

Vlad Drumea has a script:

I needed a fresh Ubuntu VM in VirtualBox this weekend so I figured I can take this opportunity to refresh my PowerShell based process.

Prerequisites

For this VM I’m using the following:

  • Oracle VirtualBox 7.2.6 r172322 running on a Windows host.
  • The installation media (ISO file) for Ubuntu 24.04.3 LTS.
  • 51GB of available space on the drive where the VM will live.
    Note that the default Ubuntu install is ~10GB, but I want the extra space to be able to install other stuff later on.

Click through for the script and explanation of the process. It’s not entirely automated, but Vlad does get to the Ubuntu installation point via Powershell and then takes it up from there.

Leave a Comment

Columnstore Storage Structures

Hugo Kornelis continues a series on storage structures:

In the first part of this series, I described the storage structure and access patterns for SQL Server’s traditional storage structure: on-disk rowstore indexes (heaps and B-trees).

Columnstore indexes were introduced in SQL Server 2012. In that version only nonclustered columnstore indexes were supported (so they stored a copy of the data in the included columns, with the actual table data still stored in an on-disk rowstore heap or clustered index. And they made the table read only! That restriction was lifted in SQL Server 2014, when clustered columnstore indexes were also added. SQL Server 2016 then added the option to create additional nonclustered (on-disk rowstore) indexes on a clustered columnstore. And, also since SQL Server 2016, we now have ordered columnstore indexes – in my opinion a somewhat misleading name.

Read on to see how SQL Server manages this data.

Leave a Comment

Local Admin and SQL Server

Rebecca Lewis follows Betteridge’s Law of Headlines:

“Does a DBA need local administrator membership to manage SQL Server?”

The answer is simple: Local admin group membership is not required. In fact, best practices dictate that Database Administrators (DBAs) and SQL service accounts should not have local administrator rights on the host server.  This change was introduced as far back as SQL Server 2008 with a ‘secure by design, secure by default, and secure in deployment’ strategy.

Read on for the caveat of what happens when you don’t provide local admin access to your DBAs.

Leave a Comment

Automatic Stats Updates and Plan Cache Invalidation

Brent Ozar threads the needle:

Normally, when SQL Server updates statistics on an object, it invalidates the cached plans that rely on that statistic as well. That’s why you’ll see recompiles happen after stats updates: SQL Server knows the stats have changed, so it’s a good time to build new execution plans based on the changes in the data.

However, updates to system-created stats don’t necessarily cause plan recompiles.

Read on for the rare situation in which invalidation doesn’t happen.

Also, Brent has me wondering if the lemon popcorn is more sweet/tart (like a lemon bar) or savory (like lemon pepper) and I may have to try both styles.

Leave a Comment

Management Plane and Data Plane Permissions

Rebecca Lewis continues a series on cloud data platform security:

My last post ended with a promise to explain the management plane vs. data plane split in practical terms. The short story? There are two separate permission systems — one for managing resources, one for accessing data — and they don’t talk to each other. The good news: you already understand this model. You just don’t recognize it yet.

Click through for the explanation and a simile involving SQL Server’s security model.

Leave a Comment

Testing the Performance of Direct Lake vs Import Mode for Semantic Models

Gilbert Quevauvilliers performs some testing:

In this blog post I am going to show you how I completed the automated testing and then the results where I am going to compare Direct Lake, Import and DirectQuery and which one appears to be the best.

As always, your testing may very or be different to my tests below.

I would highly recommend that you use the method I have used and apply this testing technique to your data to understand and validate which semantic model would be best for your capacity.

Click through for details on the tests, query durations, and how the three major modes of data loading into Microsoft Fabric semantic models (Import, Direct Lake, Direct Query) fare.

Leave a Comment

Defining Technical Debt

Louis Davdison takes a favorite phrase of many an IT person:

Ah, the term “technical debt.” The implication of it is that you have this wonderful idea, this glorious design, and for time/money reasons, you said “we can’t achieve this.” I am not sure there has ever been a project that didn’t have technical debt. It happens in software, it happens in the real world. You probably have technical debt in your house, and huge companies like Disney make these glorious plans that never quite get finished.

Click through for a link to Louis’s video. As for my own definition of technical debt, I wrote a blog post about it a while back. As of this moment, the only part I might debate myself on is whether “It was a good decision at the time, but times have changed” is really technical debt or if it’s something else. From an ontological perspective, it’s probably a different category of thing. But from the standpoint of a practitioner with a code base or infrastructure as it is, I don’t know that it matters all that much whether we call it “technical debt” or “the ever-changing sands of time ruining all that is great.” Though definitely pull out the latter in a meeting when trying to explain to a PM why you need 40 hours of dev time to rewrite some code.

Leave a Comment