Press "Enter" to skip to content

Day: May 22, 2023

Date Handling in Excel and R

Amieroh Abrahams continues a series comparing Excel and R:

Here we will explore the various ways to handle dates in Excel and R. Dates are a crucial part of data analysis and are used in various fields such as biology, healthcare, and social sciences. However, working with dates can be challenging, especially when dealing with large datasets or multiple formats.

In Excel, there are several functions available to handle dates, such as DATEYEARMONTH, and DAY. Excel also provides various formatting options to customise the display of dates. However, Excel has some limitations when it comes to complex date calculations, and it can be time-consuming to work with dates in large datasets.

In contrast, R has a robust set of tools for handling dates, including the {lubridate} package, which simplifies the manipulation of dates and times. Additionally, R allows for efficient handling of dates in large datasets, making it a powerful tool for time-series analysis. Whether you are working with dates in Excel or R, this blog will provide you with the basic tools and techniques to handle dates efficiently and accurately. So let’s get started!

Read on for the comparison.

Comments closed

Building Your First Spark SQL Application

Dustin Vannoy has a new video for us:

Get hands on with Spark SQL (no Python or Scala) to build your first data pipeline. In this video I walk you through how to read, transform, and write the NYC Taxi dataset with Spark SQL. This dataset can be found on Databricks, Azure Synapse, or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed along with this tutorial, go find a free dataset and try to write your own application with Apache Spark.

Click through for the video and sample code.

Comments closed

Backup Encryption Performance

Matthew McGiffen runs some tests:

Unlike TDE, there is some extra CPU overhead when you take an encrypted backup as the data has to be encrypted before being written to disk – whereas with TDE the data is already encrypted. Backup times however are unlikely to be affected significantly as the bottleneck is usually going to be the time it takes to physically write the data to disk. The CPU processing should take a fraction of that time.

Matthew’s tests are on a ~9GB database, which is large enough to estimate differences without having us wait all day to compare.

Comments closed

SQL Agent and Memory Consumption

Sean Gallardy performs some troubleshooting:

I was asked if I knew any reason why SQL Agent would be using “a bunch” of memory and more cpu than normal. You and I, reader, now have the same information to go on. What do you do? Think about for a minute or two if you want before reading on. I’m not saying the way I did it was the way to do it or the only way, but I gave an action plan for data capture and once data was in hand, was solved in a few minutes. There are all sorts of things that can cause this in a program, but Agent typically doesn’t use a whole lot, unless it’s executing many concurrent T-SQL jobs (which, it really shouldn’t be, get enterprise level scheduling) and even then, the memory should deallocate.

My first inclination turned out to be right: I figured it had to do with a job running. The specifics, that was something I wasn’t sure about, but Sean takes us through the troubleshooting process.

Comments closed

Could Not Update the Metadata that Indicates Database Enabled for CDC

Jose Manuel Jurado Diaz troubleshoots an issue in Azure SQL DB:

Today, we got a error message while trying to enable cdc for a database using the sqladmin user. Our customer got the error message: Msg 22830, Level 16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line 283 [Batch Start Line 0]
Could not update the metadata that indicates database XYZ is enabled for Change Data Capture. The failure occurred when executing the command ‘SetCDCTracked(Value = 1)’. The error returned was 33171: ‘Only active directory users can impersonate other active directory users.’. Use the action and error to determine the cause of the failure and resubmit the request.

Read on to understand what the problem is and how you can resolve it.

Comments closed