Press "Enter" to skip to content

Author: Kevin Feasel

Notes on Postgres Backups

Muhammad Ali hits us with it:

Backing up your PostgreSQL database is a critical task for ensuring the safety and availability of your data. In the event of a hardware failure, software error, or other disaster, having a recent backup of your database can mean the difference between a brief outage and a catastrophic data loss. In this blog post, we’ll cover best practices for backing up PostgreSQL database.

Click through for some notes on various backup utilities (pg_dump, pg_dumpall, pg_basebackup), when you might want to use each, and a few more topics.

Comments closed

Building Your First Spark SQL Application

Dustin Vannoy has a new video for us:

Get hands on with Spark SQL (no Python or Scala) to build your first data pipeline. In this video I walk you through how to read, transform, and write the NYC Taxi dataset with Spark SQL. This dataset can be found on Databricks, Azure Synapse, or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed along with this tutorial, go find a free dataset and try to write your own application with Apache Spark.

Click through for the video and sample code.

Comments closed

Date Handling in Excel and R

Amieroh Abrahams continues a series comparing Excel and R:

Here we will explore the various ways to handle dates in Excel and R. Dates are a crucial part of data analysis and are used in various fields such as biology, healthcare, and social sciences. However, working with dates can be challenging, especially when dealing with large datasets or multiple formats.

In Excel, there are several functions available to handle dates, such as DATEYEARMONTH, and DAY. Excel also provides various formatting options to customise the display of dates. However, Excel has some limitations when it comes to complex date calculations, and it can be time-consuming to work with dates in large datasets.

In contrast, R has a robust set of tools for handling dates, including the {lubridate} package, which simplifies the manipulation of dates and times. Additionally, R allows for efficient handling of dates in large datasets, making it a powerful tool for time-series analysis. Whether you are working with dates in Excel or R, this blog will provide you with the basic tools and techniques to handle dates efficiently and accurately. So let’s get started!

Read on for the comparison.

Comments closed

Backup Encryption Performance

Matthew McGiffen runs some tests:

Unlike TDE, there is some extra CPU overhead when you take an encrypted backup as the data has to be encrypted before being written to disk – whereas with TDE the data is already encrypted. Backup times however are unlikely to be affected significantly as the bottleneck is usually going to be the time it takes to physically write the data to disk. The CPU processing should take a fraction of that time.

Matthew’s tests are on a ~9GB database, which is large enough to estimate differences without having us wait all day to compare.

Comments closed

Could Not Update the Metadata that Indicates Database Enabled for CDC

Jose Manuel Jurado Diaz troubleshoots an issue in Azure SQL DB:

Today, we got a error message while trying to enable cdc for a database using the sqladmin user. Our customer got the error message: Msg 22830, Level 16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line 283 [Batch Start Line 0]
Could not update the metadata that indicates database XYZ is enabled for Change Data Capture. The failure occurred when executing the command ‘SetCDCTracked(Value = 1)’. The error returned was 33171: ‘Only active directory users can impersonate other active directory users.’. Use the action and error to determine the cause of the failure and resubmit the request.

Read on to understand what the problem is and how you can resolve it.

Comments closed

SQL Agent and Memory Consumption

Sean Gallardy performs some troubleshooting:

I was asked if I knew any reason why SQL Agent would be using “a bunch” of memory and more cpu than normal. You and I, reader, now have the same information to go on. What do you do? Think about for a minute or two if you want before reading on. I’m not saying the way I did it was the way to do it or the only way, but I gave an action plan for data capture and once data was in hand, was solved in a few minutes. There are all sorts of things that can cause this in a program, but Agent typically doesn’t use a whole lot, unless it’s executing many concurrent T-SQL jobs (which, it really shouldn’t be, get enterprise level scheduling) and even then, the memory should deallocate.

My first inclination turned out to be right: I figured it had to do with a job running. The specifics, that was something I wasn’t sure about, but Sean takes us through the troubleshooting process.

Comments closed

Analyzing Big-O Notation in Polyglot Notebooks

Matt Eland brings me back to college:

Polyglot Notebooks is a great way of running interactive code experiments mixed together with rich markdown documentation.

In this short article I want to introduce you to the #!time magic command and show you how you can easily measure the execution time of a block of code.

This can be helpful for understanding the rough performance characteristics of a block of code inside of your Polyglot Notebook.

In fact, we’ll use this to explore the programming concepts behind Big O notation and how code performance changes based on the number of items.

I like this for two reasons. First, because a visual indicator of Big-O notation is helpful for students learning about the topic. Second, because that’s not the only thing you can do with the #time magic.

Comments closed

Managing Database Privileges in Postgres

Ryan Booz limits database access:

We discussed how this can be done manually with a GRANT command each time an object is created, however, that is time consuming to manage and easy to miss a detail.

Instead, PostgreSQL provides a method for setting default privileges which are granted on behalf of the object owner as database objects are created. Using default privileges, a role can prepare the database ahead of time to ensure that consistent access privileges are applied while easing the management burden over time.

But how do you go about creating a set of roles and default privileges that will provide the right level of control and access? Let’s dig a little deeper.

RBAC on groups is definitely the way to go, and Ryan shows us how.

Comments closed

ALL and ALLSelected in DAX

Reza Rad compares and contrasts:

These two functions in DAX are often used instead of each other; ALL and ALLSELECTED. Their behavior can be similar in some contexts, but it can also be different in other contexts. In this article and video, I’ll explain the difference between these two functions and when to use each in DAX for Power BI, Analysis Services, or Power Pivot.

Reza has a video as well as a blog post to describe the differences.

Comments closed