Press "Enter" to skip to content

Day: March 19, 2024

Taking a Billion Taxi Rides with DuckDB

Mark Litwintschik tries out DuckDB:

DuckDB is an in-process database. Rather than relying on a server of its own, it’s used as a client. The client can work with data in memory, within DuckDB’s internal file format, database servers from other software developers and cloud storage services such as AWS S3.

This choice to not centralise DuckDB’s data within its own server, paired with being distributed as a single binary, makes installing and working with DuckDB much less complex than say, standing up a Hadoop Cluster.

The project isn’t aimed at very large datasets. Despite this, its ergonomics are enticing enough and it does so much to reduce engineering time that workarounds are worth considering. The rising popularity of analysis-ready, cloud-optimised Parquet files is removing the need for substantial hardware when dealing with datasets in the 100s of GBs or larger.

Read on to learn more about DuckDB, how it differs from SQLite, and a bit of nuttiness around how far you can push an in-memory database.

Comments closed

Restorable Dropped Databases Naming in Azure SQL DB

Tanayankar Chakraborty asks, what’s in a name?:

An issue was reported recently where the customer complained that in their cost analysis report of their Azure SQL DBs, the db name appears appended with a comma(,) and a number. While they agreed with the DB name in the report, they didn’t understand the number after the comma and its significance. This is how the cost analysis report looks like:

Click through for a redacted version of the report, showing an example of the database in question, as well as an explanation of what this number means.

Comments closed

Creating Dynamic Moving Averages with Visual Calcs and Numeric Parameters

Erik Svensen builds a dynamic moving average:

With the introduction of visual calculation in the February 2024 release of Power BI desktop (https://powerbi.microsoft.com/en-us/blog/visual-calculations-preview/) – this gives us some new possibilities to add calculations on the individual visual and some new functions gives us some exiciting options.

One example could be to use the MOVINGAVERAGE function (link) to and combine it with numeric range parameter to make it dynamic.

Click through for a video and a description of how to do it.

Comments closed

A Dive into Direct Lake

Nikola Ilic digs into Direct Lake:

The most common question I’m hearing these days from clients is – how can we refresh the Direct Lake semantic model? It’s a fair question. Since they have been relying on Import mode for years, and Direct Lake promises an “import mode-like performance”…So, there has to be a similar process in place to keep your data up to date, right?

Read on to learn the answer to this question and quite a few more.

Comments closed

Thoughts on Common Table Expressions

Erik Darling has opinions:

Much like joins and Venn diagrams, anyone who thinks they have some advanced hoodoo to teach you about common table expressions is a charlatan or a simpleton. They are one of the least advanced constructs in T-SQL, and are no better or worse than any other abstraction layer, with the minor exception that common table expressions can be used to build recursive queries.

As I read through the post, I kept wanting to disagree with Erik more than I do. My short form is, I aesthetically prefer common table expressions to subqueries. But that doesn’t make CTEs faster.

Comments closed

Support for Iterators in SQL Server 2022 Password Validation

Michael Howard has an update for us:

We all know that as security threats evolve, we must update our defenses to mitigate newer threats. Over the last few months, some customers have asked us to strengthen the way we secure passwords in SQL Server. The most often-cited reference by customers is to comply with NIST SP 800-63b.

Currently supported versions of SQL Server and Azure SQL DB use a SHA-512 hash with a 32-bit random and unique salt. It is statistically infeasible for an attacker to deduce the password knowing just the hash and the salt. It is considerably easier for an attacker to hunt for insecure storage of database connection strings that contain database credentials than it is to break the password verifier (also called a password authenticator) used by SQL Server and Azure SQL DB. But that’s a discussion for another day and is the main reason we highly recommend using Entra ID authentication rather than using uid/pwd-based connections because Entra ID authentication manages credentials and supports access policies.

Michael gives you an idea as well of why iterators are valuable and how you can tell if your SQL authenticated accounts are using this new setup.

I’m curious how much of an effect this will have on login time. Typically, using PBKDF (or PBKDF2) does increase login time by a couple hundred milliseconds—which is the entire point of the exercise, ensuring that an attacker cannot feasibly brute force passwords in reasonable time—so I would not expect the difference to be too noticeable. For a really busy system, however, I could see some small impact.

Comments closed