Press "Enter" to skip to content

Curated SQL Posts

Loading Multiple CSV Files in R

Stephen Royle has a set of challenges:

In a previous post, I described how to run a session to teach R to cell biologists. In this post we’ll look in a bit more detail at one of the steps: how to load data into R.

As a reminder, a typical analysis task in cell biology follows these steps:

  • do the experiment(s)
  • collect the data – e.g. microscopy images
  • analyse and get a plain text (csv) output – e.g. using Fiji
  • load the data into R
  • crunch some numbers and plot

Click through for some explanation, as well as a few exercises (and solutions) for people trying to learn the language. H/T R-Bloggers.

Leave a Comment

Transparent Column Encryption in PostgreSQL

Vibhor Kumar releases an extension:

This extension provides transparent column-level encryption using custom PostgreSQL datatypes so developers can read and write encrypted columns without changing their SQL queries.

And perhaps the most human part of this project is this:

the idea for this project started back in 2016.

It stayed with me for years as one of those engineering ideas that never quite leaves your mind — the thought that PostgreSQL itself could enforce encryption at the column level.

The diagram does look a bit like what you’d see for SQL Server’s Transparent Data Encryption, though the internal mechanics are a good bit different.

Leave a Comment

Ontology Rules in Fabric Activator

Ansley Yeo creates some rules:

Ontology Rules let you define conditions and actions on top of your business entities, rather than on raw tables or telemetry streams.

These rules are evaluated using Fabric Activator, which monitors and triggers actions when conditions are met. The unique value is that the rule logic is expressed in the language of your business, using ontology entities and properties.

Ontologies are a thing I’m not quite sold on yet, whether in Microsoft Fabric or elsewhere. I get the concept of what they do and the concept that this is business logic that the business side could theoretically do. What I have trouble with is seeing the practical benefits. Any time I see “Your business users can…” I immediately add in my mind, “But they won’t.” It feels like people getting giddy over object-oriented development over the data.

That said, I am actively learning about the topic, so maybe I’ll change my mind as I learn more.

Leave a Comment

Disclosing Testing Machines

Louis Davidson lays out an argument:

Something that every writer needs to be careful of is doing too much benchmarking-type work. In many of the software licensing agreements you have signed, you promise not to do that. But at the same time, you can generally give out performance numbers if you aren’t making claims about particular software, especially compared to another.

So, if you come up with an algorithm to do something in a better way than you have seen, it is nice to show the software, give the reader access to the code you are showing the performance of, and include the computer you are running it on.

Louis is referring to the DeWitt Clause, a fairly common clause in commercial database products that came about because Oracle was angry that David DeWitt made them look bad by providing a fair comparison to other platforms.

Leave a Comment

Certificate Validation in Power BI Report Server

Deepthi Goguri notes a change:

When trying to connect to a SQL database within Power BI Desktop January 2026 met with certificate chain trust error when trying to connect to the SQL Database using database DNS. Below is the error:

Microsoft SQL: A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 – The certificate chain was issued by an authority that is not trusted.)”

The workaround for this is a bit weird, but Deepthi provides a solid explanation.

Leave a Comment

When Order Matters

The Rtask people tell a story:

You have inherited (or written) a data pipeline originally coded in SAS. It processes administrative billing records: matching line items against reference tables, applying time-varying coefficients, deduplicating based on business identifiers, computing running counters. Classic ETL work.

The migration to R goes well. You use {DBI} to open a DuckDB connection, load your source files as lazy tables via {arrow} or dplyr::tbl(), build the transformations with {dbplyr}, and collect the result at the very end. Your code is readable, your tests compare the R output to the SAS reference, and they pass (maybe using {datadiff}).

Then you run the pipeline again.

The numbers are different.

Give yourself 100 points if you answered “Because you need an ORDER BY clause” during the explanation. They also cover a few other places where DuckDB interactions in R can cause issues. Most of this is straightforward for data platform people, but can cause consternation for developers. H/T R-Bloggers.

Leave a Comment

Microsoft Fabric Mirroring and SQL Server 2025

Meagan Longoria takes a peek at mirroring in Microsoft Fabric:

Mirroring of SQL Server databases in Microsoft Fabric was first released in public preview in March 2024. Mirrored databases promise near-real-time replication without the need to manage and orchestrate pipelines, copy jobs, or notebooks. John Sterrett blogged about them last year here. But since that initial release, the mechanism under the hood has evolved significantly.

Read on to see how this behaves for versions of SQL Server prior to 2025, and how it changes in 2025.

Leave a Comment

Making Row-Level Security Faster

Brent Ozar speeds up some operations:

The official Azure SQL Dev’s Corner blog recently wrote about how to enable soft deletes in Azure SQL using row-level security, and it’s a nice, clean, short tutorial. I like posts like that because the feature is pretty cool and accomplishes a real business goal. It’s always tough deciding where to draw the line on how much to include in a blog post, so I forgive them for not including one vital caveat with this feature.

Click through for that caveat, as well as how you can mitigate its performance impact.

Leave a Comment

Testing Implicit Conversion and Performance in SQL Server

Louis Davidson runs some tests:

If you have ever done any performance tuning of queries in SQL Server, no doubt one of the first thing you have heard is that your search argument data types need to match the columns that you are querying. Not one thing in this blog is going to dispute that. Again, the BEST case is that if your column is an nvarchar, your search string matches that column datatype. But why is this? I will do my best to make this pretty clear, especially why it doesn’t always matter.

Read on as Louis lays out the explanation.

Leave a Comment