
Day: January 9, 2024

Aggregating by Month and Year in R

Steven Sanderson groups by month and year:

Taming the beast of daily data can be daunting. While it captures every detail, sometimes you need a bird’s-eye view. Enter aggregation, your secret weapon for transforming daily data into monthly and yearly insights. In this post, we’ll dive into the world of R, where you’ll wield powerful tools like dplyr and lubridate to master this data wrangling art.

Click through for examples of summarizing daily data into monthly and annual data. One thing to keep in mind, however, is that the monthly aggregation in these examples groups by month alone, so if you have both July 2023 and July 2024 data, you’ll get back a single row for July. It’s all about understanding the grain of your data, as well as your desired grain.
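To see the grain issue in code, here is a minimal pandas sketch of the same idea (the post itself uses dplyr and lubridate; the date and value columns here are illustrative): grouping on month alone merges July 2023 and July 2024, while grouping on year and month keeps them apart.

```python
import pandas as pd

# Illustrative daily data: one row per day, spanning two Julys
df = pd.DataFrame({"date": pd.date_range("2023-07-01", "2024-07-31", freq="D")})
df["value"] = 1

# Month-only grain: July 2023 and July 2024 collapse into one bucket
by_month = df.groupby(df["date"].dt.month)["value"].sum()

# Year-plus-month grain: July 2023 and July 2024 stay separate
by_year_month = df.groupby(df["date"].dt.to_period("M"))["value"].sum()

# Yearly grain
by_year = df.groupby(df["date"].dt.year)["value"].sum()
```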


Goodbye Aurora Serverless v1

Alex Woodie breaks the news:

AWS has notified customers of its Amazon Aurora Serverless v1 service that it will cease supporting the offering at the end of 2024. Replacing v1 in the Aurora Serverless range, which supports Postgres and MySQL databases, will be v2, which offers some advantages but also one big disadvantage: It doesn’t scale all the way down to zero.

Click through for more information.


Generating Test Data with ChatGPT

Daniel Janik builds fake data:

Have you ever been tasked with creating test data for an application and then run into performance problems once the application moves to production?

Many of us manage databases or applications that contain regulated data that can’t leave a production environment. This means we need to “clean” the data if it’s going to be used in QA or development work, and one common way to de-identify the data is to simply update columns like firstname and lastname with a simple format: “firstname” + counter. However, this results in all the data being unique and sequential: Firstname1, Firstname2, Firstname3, …
This isn’t good for getting like-for-like results with a production database and can lead to questions we’ve heard before in the workplace, like “Why didn’t we catch this in QA?”

This works reasonably well, though you’d want to be sure to seed in edge cases and the like. But if you just need to generate some realistic-ish data pretty quickly, this is one option that can work.
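To make the contrast concrete, here is a small Python sketch, with an invented name pool and weights, showing why sequential de-identified values behave so differently from sampled ones:

```python
import random

# The sequential approach the quote describes: every value unique and
# perfectly ordered, which looks nothing like a production name column.
sequential = [f"firstname{i}" for i in range(1, 10_001)]

# Sampling with replacement from a skewed pool produces duplicates and an
# uneven distribution, much closer to real data. Names and weights are
# invented for illustration.
pool = ["Alice", "Bob", "Carmen", "Dmitri", "Eve", "Fatima", "Grace", "Hiro"]
weights = [30, 25, 15, 10, 8, 6, 4, 2]
realistic = random.choices(pool, weights=weights, k=10_000)
```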


Visualizing Power BI Import Dependencies as a Graph

Chris Webb builds graphs, but not those types of graphs; rather, the other type of graphs:

A few years ago a new pair of Profiler events was added for Power BI Import mode datasets (and indeed AAS models): the Job Graph events. I blogged about them here but they never got used by anyone because it was extremely difficult to extract useful data from them – you had to run a Profiler trace, save the trace file, run a Python script to generate a .dgml file, then open that file in Visual Studio – which was a shame because they contain a lot of really interesting, useful information. The good news is that with the release of Semantic Link in Fabric and the ability to run Profiler traces from a Fabric notebook it’s now much easier to access Job Graph data and in this blog post I’ll show you how.

Read on to see an example of it in action.
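As a hypothetical illustration of the final step: once you have reduced the Job Graph events to predecessor/successor pairs (the extraction via a Semantic Link trace is what Chris’s post covers), a few lines of networkx will draw the dependency graph. The edge list below is invented:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical (predecessor, successor) pairs pulled from Job Graph events
edges = [
    ("Process Partition A", "Build Relationship R1"),
    ("Process Partition B", "Build Relationship R1"),
    ("Build Relationship R1", "Process Hierarchy H1"),
]

g = nx.DiGraph(edges)
pos = nx.spring_layout(g, seed=42)  # fixed seed for a repeatable layout
nx.draw_networkx(g, pos, node_size=1500, font_size=8, arrows=True)
plt.axis("off")
plt.show()
```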


Optimized Locking in Azure SQL DB

Aaron Bertrand tries out a new feature:

In a sentence: Instead of locking individual rows and pages for the life of the transaction, a single lock is held at the transaction level, and row and page locks are taken and released as needed.

This is made possible by previous investments in Accelerated Database Recovery and its persistent version store. A modification can evaluate the predicate against the latest committed version, bypassing the need for a lock until it is ready to update (this is called lock after qualification, or LAQ). There’s a lot more to it than that, and I’m not going to dive deep today, but the result is simple: long-running transactions will lead to fewer lock escalations and will do a lot less standing in the way of the rest of your workload. Locks held for shorter periods of time will naturally help reduce blocking, update conflicts, and deadlocks. And with fewer locks being held at any given time, this will help improve concurrency and reduce overall lock memory.

Read on to learn more about how it works and Aaron’s initial thoughts on the feature.
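If you want to see whether a given Azure SQL database has the feature on, both it and its Accelerated Database Recovery prerequisite are queryable. Here is a quick Python sketch using pyodbc; the connection string is a placeholder:

```python
import pyodbc

# Placeholder connection string -- substitute your own server and auth details
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-server.database.windows.net;"
    "Database=your-database;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Documented check for the feature itself (1 = on, 0 = off)
cursor.execute("SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsOptimizedLockingOn')")
print("Optimized locking:", cursor.fetchone()[0])

# Optimized locking depends on Accelerated Database Recovery
cursor.execute(
    "SELECT is_accelerated_database_recovery_on "
    "FROM sys.databases WHERE name = DB_NAME()"
)
print("ADR enabled:", cursor.fetchone()[0])
```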
