
Month: January 2024

TidyDensity 1.3.0 Released

Steven Sanderson has an update to the TidyDensity package:

The latest release of the TidyDensity R package brings some major changes and improvements that open up new possibilities for statistical analysis and data visualization. Version 1.3.0 includes breaking changes, new features, and a host of minor fixes and improvements that enhance performance and usability. Let’s dive into what’s new!

Read on for that change list and how you can get a copy of the TidyDensity R package.


SSIS on Linux

I am not amused:

In this video, we bang our heads against the wall repeatedly with respect to SQL Server Integration Services. I spend a lot more time than I want to but we do get a mostly-functional product mostly working.

This was a frustrating video to make, but I think it was important to make it clear just what SSIS on Linux can and cannot do.


Switching between Active Relationships in Power BI Models

Meagan Longoria solves a head-scratcher:

A couple of weeks ago, I encountered a DAX question that I had not previously considered. They had a situation where there were two paths between two tables: one direct between a fact and dimension and another that went through a different dimension and a bridge table.

Click through for several examples of when this might come up, as well as how to solve the problem.


Updates to SQL Server Troubleshooting Stored Procedures

Erik Darling shares some updates:

I’ve been doing a lot of work on all of my free SQL Server troubleshooting stored procedures lately.

If you haven’t used them, or haven’t even heard of them, now’s a good time to talk about what they are, what they do, and some of the newer features and functionality.

Read on to see what’s new. If you haven’t used any of Erik’s procedures, I highly recommend them.


Thinking about Scale Up-Front

Andy Brownsword shares a warning:

A point of sale system being rolled out across hundreds of physical locations. Transaction data collected each night to be batch processed into a warehouse for usual types of analysis. Our integration preference was SSIS internally. A solution was deployed in preparation.

Rolling out of the new system started with a handful of locations which steadily increased as confidence grew. On the back of this, the data hitting our solution was increasing too. With a trickle of data early on there were no issues, as expected. A small volume of data from a small number of stores. The process flew. We left it doing its thing.

Read on to see the story take a darker turn and the importance of planning for scale.


2024 Data Professional Salary Survey Results

Brent Ozar counts the cash:

This is the 8th year now that we’ve been running our annual Data Professional Salary Survey, and I was really curious to see what the results would hold this year. How would inflation and layoffs impact the database world? Download the raw data here and slice & dice it to see what’s important to you. Here’s what I found.

Read on for the results and Brent’s analysis.


Aggregating by Month and Year in R

Steven Sanderson groups by month and year:

Taming the beast of daily data can be daunting. While it captures every detail, sometimes you need a bird’s-eye view. Enter aggregation, your secret weapon for transforming daily data into monthly and yearly insights. In this post, we’ll dive into the world of R, where you’ll wield powerful tools like dplyr and lubridate to master this data wrangling art.

Click through for examples of summarizing daily data into monthly and annual data. One thing to keep in mind, however, is that the monthly aggregation in these examples groups on month alone, so if you have July 2023 and July 2024 data, you’ll get a single combined row back for July. It’s all about understanding what the grain of your data is, as well as your desired grain.
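To make the grain point concrete, here is a minimal sketch. It is written in Python using only the standard library rather than the R, dplyr, and lubridate code in Steven’s post, and the sample values and the `total_by` helper are made up for illustration. It sums the same daily rows twice, once keyed on month alone and once keyed on year and month together.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily observations: (date, value) pairs spanning two years.
daily = [
    (date(2023, 7, 1), 10),
    (date(2023, 7, 15), 20),
    (date(2023, 8, 3), 5),
    (date(2024, 7, 2), 100),
    (date(2024, 7, 20), 200),
]


def total_by(rows, key):
    """Sum values after mapping each date to a grouping key."""
    totals = defaultdict(int)
    for day, value in rows:
        totals[key(day)] += value
    return dict(totals)


# Grain = month only: July 2023 and July 2024 collapse into one row.
by_month = total_by(daily, lambda d: d.month)

# Grain = (year, month): each calendar month stays separate.
by_year_month = total_by(daily, lambda d: (d.year, d.month))
```

With the month-only key, both Julys land in the same bucket. Adding the year to the key is what keeps July 2023 and July 2024 apart.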


Goodbye Aurora Serverless v1

Alex Woodie breaks the news:

AWS has notified customers of its Amazon Aurora Serverless v1 service that it will cease supporting the offering at the end of 2024. Replacing v1 in the Aurora Serverless range, which supports Postgres and MySQL databases, will be v2, which offers some advantages but also one big disadvantage: It doesn’t scale all the way down to zero.

Click through for more information.


Generating Test Data with ChatGPT

Daniel Janik builds fake data:

Have you ever been tasked with creating test data for an application and then run into performance problems once the application moves to production?

Many of us manage databases or applications that contain regulated data that can’t leave a production environment. This means that we need to “clean” the data if it’s going to be used in QA or development work. One common way to de-identify the data is to simply update columns like firstname and lastname with a simple format, “firstname” + counter; however, this results in all the data being unique and sequential: firstname1, firstname2, firstname3, …

This isn’t good for getting like-for-like results with a production database and can lead to questions we’ve heard before in the workplace like “Why didn’t we catch this in QA?”

Generating the data with ChatGPT works reasonably well, though you’d want to be sure to seed in edge cases and the like. But if you just need to generate some realistic-ish data pretty quickly, this is one option that can work.
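As a rough illustration of why the “firstname” + counter approach differs from production-like data, the sketch below compares it with names drawn from a weighted pool. This is not Daniel’s method and it does not involve ChatGPT. The name pool, the weights, and both helper functions are hypothetical, and it assumes only Python’s standard `random` module.

```python
import random

# Hypothetical name pool with made-up weights; a real pool would come from
# a public name-frequency list, not from production data.
FIRST_NAMES = ["James", "Mary", "Robert", "Patricia", "Zebulon"]
WEIGHTS = [40, 30, 15, 14, 1]


def sequential_names(n):
    """The 'firstname' + counter approach: every value is unique."""
    return [f"firstname{i}" for i in range(1, n + 1)]


def skewed_names(n, seed=42):
    """Draw names with replacement so common values repeat."""
    rng = random.Random(seed)  # fixed seed keeps test runs repeatable
    return rng.choices(FIRST_NAMES, weights=WEIGHTS, k=n)


sequential = sequential_names(1000)
skewed = skewed_names(1000)
```

The sequential list has 1,000 distinct values, while the weighted list has at most five, with some far more common than others. That kind of repetition and skew is closer to what a production table tends to look like, which is why it matters for like-for-like performance testing.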
