
Author: Kevin Feasel

Physical Backups in Postgres via pg_basebackup

Muhammad Ali takes a backup:

The security and integrity of your company’s data are crucial in today’s data-driven environment. You must have a reliable backup plan in place to protect your PostgreSQL databases against unplanned calamities. In this article, we’ll examine how to create physical backups using the PostgreSQL tool pg_basebackup. We’ll talk about client needs, business use cases, backup space complexity, disaster recovery, point-in-time recovery (PITR), and how to use PostgreSQL to put these strategies into practice.

Read on for the instructions. Of particular importance is the point-in-time recovery piece.
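
For a quick taste of the tool before you dive into the article, a single command is enough to pull a full physical copy of a running cluster. This is a minimal sketch rather than Muhammad's exact setup: the host, user, and backup directory are placeholders, and the connecting role needs the REPLICATION attribute plus a matching pg_hba.conf entry.

```sh
# Take a compressed, tar-format base backup of a running cluster and stream
# the WAL needed to make it consistent. Connection details are placeholders.
pg_basebackup \
  --host=db.example.com \
  --username=backup_user \
  --pgdata=/backups/base_$(date +%Y%m%d) \
  --format=tar \
  --gzip \
  --wal-method=stream \
  --checkpoint=fast \
  --progress
```

Pair a base backup like this with continuous WAL archiving (archive_command or pg_receivewal) and you have the ingredients for point-in-time recovery.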


Trace Flag 460 and String Truncation

Chad Callihan enables a trace flag:

In SQL Server 2016 and 2017, trace flag 460 can be used to gather additional details about string truncation errors. You may be familiar with the “String or binary data would be truncated” error message. Have you been left wondering what would be truncated? This is where trace flag 460 comes in. When enabled, the error message will include details on where exactly the potential truncation is taking place.

Read on to see if this is something you might benefit from enabling.
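
If you want to see the difference for yourself, here is a small sketch (the table and value are made up for the demo). Without the flag, you get the old message 8152; with it, the failure is reported as message 2628, which names the table, column, and offending value.

```sql
-- Demo table, purely illustrative.
CREATE TABLE dbo.TruncationDemo (ShortName varchar(5));

-- Without the trace flag: msg 8152, "String or binary data would be truncated."
INSERT INTO dbo.TruncationDemo (ShortName) VALUES ('Far too long a value');

-- Enable trace flag 460 for this session (-1 would enable it instance-wide).
DBCC TRACEON (460);

-- The same statement now fails with msg 2628, which includes the table name,
-- column name, and the value that would be truncated.
INSERT INTO dbo.TruncationDemo (ShortName) VALUES ('Far too long a value');

DBCC TRACEOFF (460);
DROP TABLE dbo.TruncationDemo;
```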


Last Page Insert Contention in SQL Server

Eitan Blumin spots a wild Latch Convoy:

The “Last Page Insert Contention” in SQL Server, also known as “Latch Convoy Problem”, also known as “PageLatchEx Contention”, is one of those extremely rare use cases that are very difficult to see in real-world scenarios.

Evidently, it was impactful enough that Microsoft implemented a solution for this problem back in SQL Server 2019 in the form of the new OPTIMIZE_FOR_SEQUENTIAL_KEY index option, which reportedly fixes it.

Click through to learn more about a scenario in which Eitan saw this in the wild. In fairness, I’m not sure I’d do any better at realizing that this was a last page insert contention problem.
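
For reference, turning on the fix is a one-line index option. The table below is a made-up stand-in for the classic trouble case: many concurrent sessions inserting into an ever-increasing key and queuing up on PAGELATCH_EX waits for the last page.

```sql
-- Hypothetical hot-insert table with a monotonically increasing key.
CREATE TABLE dbo.OrderEvents
(
    OrderEventId bigint IDENTITY(1, 1) NOT NULL,
    Payload      nvarchar(400) NOT NULL,
    CONSTRAINT PK_OrderEvents PRIMARY KEY CLUSTERED (OrderEventId)
        WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);

-- For an existing index, the option can be set in place with no rebuild.
ALTER INDEX PK_OrderEvents ON dbo.OrderEvents
    SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
```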


A Primer on Databricks Unity Catalog

Beginner’s Hadoop gives us an overview:

The Databricks Unity Catalog is a feature provided by Databricks Unified Data Analytics Platform that allows you to organize and manage metadata about your data assets, such as tables, databases, and views. It provides a centralized metadata repository that enables users to discover, understand, and collaborate on data assets within a Databricks environment. The Unity Catalog integrates with various data sources and supports different metadata management capabilities.

Read on for an overview of what it does.
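
The piece that tends to click fastest is the three-level namespace (catalog.schema.table) with grants managed in the same place. Here's a rough sketch, assuming a Unity Catalog-enabled workspace; all of the object and group names are purely illustrative.

```sql
-- Objects live under catalog.schema.table; names here are illustrative.
CREATE CATALOG IF NOT EXISTS sales_dev;
CREATE SCHEMA IF NOT EXISTS sales_dev.bronze;

CREATE TABLE IF NOT EXISTS sales_dev.bronze.orders
(
    order_id   BIGINT,
    order_date DATE,
    amount     DECIMAL(10, 2)
);

-- Access control is handled centrally through the catalog as well.
GRANT SELECT ON TABLE sales_dev.bronze.orders TO `data_analysts`;
```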


Listing Available Properties in Azure Data Factory

Andy Leonard builds a list:

Did you know Azure Data Factory (ADF) will actually list available properties? It will. One of the things I cover in my ADF training titled Master the Fundamentals of Azure Data Factory is this handy troubleshooting tip.

Read on to see how, though I'd personally like something a bit faster than waiting for the thing to execute just to find out what my choices are.


On-Demand Loading and Direct Lake in Power BI

Chris Webb gives us the beginnings of an origin story:

For any Power BI person, Direct Lake mode is the killer feature of Fabric. Import mode report performance (or near enough) direct on data from the lake, with none of the waiting around for data to refresh! It seems too good to be true. How can it be possible?

The full answer, going into detail about how data from Delta tables is transcoded to Power BI’s in-memory format, is too long for one blog post. But in part it is possible through something that existed before Fabric but which didn’t gain much attention: on-demand loading. 

Click through for another blog post on the topic and an idea of how these tie together.


An Overview of Data Modeling

Nikola Ilic provides an overview of data modeling:

In recent years, I’ve done dozens of training on various data platform topics, for all kinds of audiences. When teaching various data platform concepts and techniques, I find one of the concepts particularly intimidating for many business analysts, especially those who are just starting their journey. And, that is the concept of data modeling.

This is a good introduction and does a particularly good job of explaining why we have logical and physical data models. I have one medium-sized quibble with an otherwise-great article: 3rd Normal Form is nowhere near sufficient for a logical data model, and I’d make the strong case (in fact, I do make that case) that 5th Normal Form should be the standard and that 3NF is an anachronism which you should entirely replace with Boyce-Codd Normal Form.
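
To make that quibble concrete with the textbook example (mine, not from the article): suppose each instructor teaches exactly one course, while a student takes a given course from exactly one instructor. The enrollment table below satisfies 3NF, yet the instructor-to-course pairing repeats on every enrollment row; Boyce-Codd Normal Form is what flushes that redundancy out.

```sql
-- Illustrative only. Assumed dependencies:
--   (Student, Course) -> Instructor
--   Instructor -> Course   (non-key determinant: legal in 3NF, a BCNF violation)
CREATE TABLE Enrollment
(
    Student    varchar(50) NOT NULL,
    Course     varchar(50) NOT NULL,
    Instructor varchar(50) NOT NULL,
    PRIMARY KEY (Student, Course)
);

-- A BCNF decomposition stores the instructor/course pairing exactly once.
CREATE TABLE InstructorCourse
(
    Instructor varchar(50) NOT NULL PRIMARY KEY,
    Course     varchar(50) NOT NULL
);

CREATE TABLE StudentInstructor
(
    Student    varchar(50) NOT NULL,
    Instructor varchar(50) NOT NULL,
    PRIMARY KEY (Student, Instructor)
);
```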


Loading Data from On-Premises SQL Server into Microsoft Fabric

Reitse Eskens spends an hour or so:

In my previous blogs, I’ve written about Fabric and all the cool things it can do. Thing is, my load tests were based on files. Either CSV or Delta. But in reality, a lot of data comes from an on-premises database server. In reality, you might connect to a SQL 2008 instance or maybe even older. Truth be told, I haven’t got an instance in that version/edition around anymore. So I had to use SQL Server 2019, a version I’m encountering more often nowadays.

For this blog, it won’t make much sense to create a humongous database and try to get all the data in. Fabric will cope, the major issue (in my experience) is the internet connection between my local database and the Fabric environment. One thing I’m really curious about is if Fabric will have the Link capability that was introduced for Synapse Analytics and SQL Server 2022.

There’s no Link capability currently available, so Reitse does the next-best thing and uses Fabric pipelines.


Events in Apache Kafka

Lucia Cerchie explains the key metaphor in an event-driven system:

We can speak broadly, maybe even a little philosophically, about what events are. Events are “things that happen,” or sometimes, they are otherwise defined as representations of facts. All data is, in a way, a result of humans trying to grok events. At the same time, I honestly don’t find the definition helpful if we leave it at this level. Do we ever design apps around things that don’t happen? (Don’t think about that too hard.) 

Let’s get concrete: events that might affect real-time data pipelines and applications include things like Pinterest saves, USPS address changes, ship coordinate updates, and credit card transactions.

Read on for more about events and how they drive the design of Kafka topics and the applications which use them.
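
To put a hypothetical shape on one of those examples, a "ship coordinate update" might land on a topic as a key/value record along these lines, where the key identifies the thing the event happened to and the value records the fact itself (field names are made up for illustration):

```json
{
  "key": "vessel-9042",
  "value": {
    "event_type": "coordinate_update",
    "latitude": 47.6062,
    "longitude": -122.3321,
    "recorded_at": "2023-07-14T10:32:05Z"
  }
}
```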


Rolling Correlation in R

Steven Sanderson tries out a function:

In the world of data analysis, time-series data is a common sight. Whether it’s stock prices, weather patterns, or website traffic, understanding the relationship between variables over time is crucial. One valuable technique in this domain is calculating rolling correlation, which allows us to examine the evolving correlation between two variables as our data moves through time. In this blog post, we will delve into the rollapply function and its capabilities, exploring its applications through a series of practical examples. So, let’s get started!

Click through for an example of how it works.
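
For a quick flavor of the technique, here's a minimal sketch using zoo::rollapply with simulated series standing in for real data, so treat it as an illustration rather than Steven's exact example.

```r
# Rolling correlation over a 30-observation window with zoo::rollapply.
# The two series are simulated purely for illustration.
library(zoo)

set.seed(123)
n <- 200
x <- cumsum(rnorm(n))            # stand-in for one time series
y <- 0.6 * x + cumsum(rnorm(n))  # a related series

z <- zoo(cbind(x, y))

roll_cor <- rollapply(
  z,
  width = 30,
  FUN = function(w) cor(w[, 1], w[, 2]),
  by.column = FALSE,
  align = "right",
  fill = NA
)

tail(roll_cor)
```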
