Press "Enter" to skip to content

Author: Kevin Feasel

LakeBench Now Available

Miles Cole makes an announcement:

I’m excited to formally announce LakeBench, now in version v0.3, the first Python-based multi-modal benchmarking library that supports multiple data processing engines on multiple benchmarks. You can find it on GitHub and PyPi.

Traditional benchmarks like TPC-DS and TPC-H focus heavily on analytical queries, but they miss the reality of modern data engineering: building complex ELT pipelines. LakeBench bridges this gap by introducing novel benchmarks that measure not just query performance, but also data loading, transformation, incremental processing, and maintenance operations. The first of such benchmarks is called ELTBench and is initially available in light mode.

Click through to see how it works and grab a copy if you’re interested.

Leave a Comment

Extending Ola’s Backup Scripts for Scale

Aaron Bertrand looks at some extensions to a venerable solution:

Where I work, we have hundreds of mission-critical databases. We use Ola Hallengren’s SQL Server backup solution, but, out of necessity, have wrapped it with several enhancements to help us accomplish goals around retention, SQL Server backup frequency, and recovery point objectives. I touched on some of this last year when I talked about automating some of our restore testing, in It’s a recovery strategy, not a backup strategy. Today, we do a lot more than that, and we have to be more flexible about it.

Click through to see what Aaron’s environment looks like and some of the things they’ve done to make recovery work better.

Leave a Comment

The Limits of Transparent Data Encryption

Steve Stedman breaks down some myths around Transparent Data Encryption in SQL Server:

Transparent Data Encryption (TDE) in SQL Server is a powerful feature designed to protect data at rest by encrypting database files. While TDE is an excellent tool for securing data stored on disk, it’s often misunderstood, leading to assumptions about its capabilities. In this blog post, we’ll clarify what TDE does not do to help you better understand its scope and limitations, ensuring you implement the right security measures for your SQL Server environment.

Click through for the list.

Leave a Comment

Looking at PostgreSQL Internals with pg_visibility

Ian Parker introduces us to an extension:

When most people think about PostgreSQL internals, they picture tables, indexes, and perhaps the VACUUM process. Tucked away in the contrib extensions, however, is a tool that exposes what really sits on disk: the pg_visibility extension. By querying it, you see how PostgreSQL tracks visibility and freezing at the page level—information that directly affects vacuum efficiency and index-only scans.

Read on to see how this extension works.

Leave a Comment

Not Patching: the False Sense of Security

Mike Walsh explains why it’s important to keep up to date on patches:

At first, I was a bit shocked, but then I realized it actually makes sense and it matches a regret I hear far too often from folks who call us too late“If we don’t change too much, we’re safe.”

Sadly, nothing can be further from the truth. When a software vendor releases a fix for a vulnerability, it certainly could be caused by a recent patch or update – but more often than not that fix is for a longstanding issue that has been until recently unknown to the world.

Read on for Mike’s argument. This also applies to people running versions of SQL Server no longer receiving patches (hint: 2016 SP3 and earlier). Just because there isn’t a patch doesn’t mean there isn’t a vulnerability.

Leave a Comment

Decision Trees and Non-Tabular Data

Ivan Palomares Carrascosa explains that you can use more than standard structured data against decision trees:

Versatile, interpretable, and effective for a variety of use cases, decision trees have been among the most well-established machine learning techniques for decades, widely used for classification and regression tasks. Yet, they are still widely used — whether as standalone models or as components of more powerful ensemble methods like random forests and gradient boosting machines.

And there is one more attractive feature that pushes the boundaries of their versatility even further: they can accommodate data in diverse formats, beyond just fully structured, tabular data. This article examines this facet of decision trees from a balanced theoretical and practical approach.

Click through for an example.

Leave a Comment

Event Notification via LISTEN/NOTIFY in PostgreSQL Doesn’t Scale

Elliot Levin takes us through a performance issue:

We love Postgres and it lives at the heart of our service! But this extremely concurrent, write-heavy workload resulted in a stalled-out Postgres. This is the story of what happened, how we ended up discovering a bottleneck in the LISTEN/NOTIFY feature of Postgres (the event notifier that runs based on triggers when something changes in a row), and what we ended up doing about it.

Click through for details, as well as what the team there did to migrate away from this feature.

Leave a Comment

Replication Types and Modes in PostgreSQL

Semab Tariq lays out three types of replication available in PostgreSQL:

PostgreSQL also supports replication to keep standby servers in sync with the primary server using Write-Ahead Log (WAL) files. Every change made to the database is first recorded in these WAL files on the primary server. These logs are then continuously streamed to the standby server, which applies them to stay up to date. This method ensures that all standby servers stay in sync with the primary and are ready to be promoted in case the primary server fails.

In this blog, we will explore the different types and modes of replication available in PostgreSQL to help you understand which option best fits your business needs.

Click through for the list, as well as an overview of each replication type.

Leave a Comment

Capture Long-Running Queries via Extended Events

Tom Collins has another extended events session for us:

A SQL Server Extended Event to track SQL queries taking longer than 100 seconds to complete. Adjust accoring to your requriements.

There is also a query below to extract the column details from the xel file 

Click through for the code. This kind of extended events session is rather useful for performance tuning and finding issues before customers e-mail.

Leave a Comment