Press "Enter" to skip to content

Day: March 14, 2024

tidyAML 0.0.5 Now Available

Steven Sanderson has an announcement:

I’m thrilled to announce the latest release of tidyAML, version 0.0.5, now available for download on CRAN or GitHub!

In this release, we’ve introduced some fantastic new features and made minor fixes and improvements to enhance your experience with tidyAML.

Click through to see what’s new in this version.

Comments closed

Retrieving Spark Session Config Variables from Microsoft Fabric

Koen Verbeeck gets some settings:

I was trying some stuff out in a notebook on top of a Microsoft Fabric Lakehouse. I was wondering what some of the default values are of the configuration variables, and if there’s an easy way to retrieve them all. Luckily there is. In the code, I’m using Scala because it has a nice GetAll() function.

Click through for an example of how to use this. And bonus points for using Scala instead of Python here.

Comments closed

Postgres Data Extraction with LATERAL joins and More

Ryan Booz extracts some data:

In our data hungry world, knowing how to effectively load and transform data from various sources is a highly valued skill. Over the last couple of years, I’ve learned how useful many of the data manipulation functions in PostgreSQL can supercharge your data transformation and analysis process, using just PostgreSQL and SQL.

For the last couple of decades, “Extract Transform Load” (ETL) has been the primary method for manipulating and analyzing the results. In most cases, ETL relies on an external toolset to help acquire different forms of data, slicing and dicing it into a form suitable for relational databases, and then inserting the results into your database of choice. Once it’s in the destination table with a relational schema, querying and analyzing it is much easier.

I call out CROSS JOIN LATERAL (or any kind of lateral join) here because it’s the ANSI equivalent of T-SQL’s APPLY operator, and I’ve already pointed out once today that I’m a huge fan of APPLY.

Comments closed

Overloading Power BI in Microsoft Fabric

Reitse Eskens pushes the envelope:

In my previous blog on Fabric and loadtesting, I ended with not really knowing how PowerBI would respond to all these rows. After creating and presenting a session on this subject, it’s time to dig into this part of Fabric as well. There were questions and I made promises. So here goes! This blog will only show the F2 experience as that’s where things went off the road. And, as I’ve shown in the previous blog, the CU count doesn’t change between SKU’s, only the amount of SKU’s available changes.
This blog isn’t meant to scold Fabric or make it look silly, I’m the one who’s silly. The goal is to show some limitations, a way you can do some load testing and help you find your way in the available metrics.

Read on to see what Reitse has gotten into.

Comments closed

Using the APPLY Operator

Erik Darling gets an auto-link for talking about my favorite operator:

I end up converting a lot of derived joins, particularly those that use windowing functions, to use the apply syntax. Sometimes good indexes are in place to support that, other times they need to be created to avoid an Eager Index Spool.

One of the most common questions I get is when developers should consider using apply over other join syntax.

The short answer is that I start mentally picturing the apply syntax being useful when:

To learn when, you’re going to have to read the whole thing. And, if you want to learn even more about it, I have a talk on the topic that might be of interest.

Comments closed

Postgres and NUMA

Annie Ghazali follows up on a Chris Travers webinar:

Q1. At what point we need to focus on ensuring huge_pages in PostgreSQL?

There are a couple of factors here. The first is, that if you’re able to show that you have multiple NUMA domains, it will almost always be a win performance-wise. But it becomes critical at the point where you start seeing that the checkpointer is running at 100 percent CPU load, and none of your queries are running at 100 percent CPU load, especially if you don’t have a lot of IO weight. That’s a really good indication that you’ve hit a point where it’s now a heavy bottleneck, and that’s a point where it’s starting to become something where you’re going to see a very large win out of it. 

Read on to see this full answer, as well as answers to questions around why you might not want to disable NUMA support and what NUMA does to swap space recommendations.

Comments closed