Press "Enter" to skip to content

Day: March 27, 2024

Using dtplyr to Process Large Datasets

Dario Radecic takes us through an interesting library:

In a world where compute time is billed by the second, make every one of them count. There are zero valid reasons to utilize a quarter of your CPU and memory, but achieving complete resource utilization isn’t always a straightforward task. That is if you don’t know about R dtplyr.

One option is to use dplyr. It’s simple to use and has intuitive syntax. But it’s slow. The other option is to use data.table. It’s lightning-fast but has a steep learning curve and syntax that’s not too friendly to follow. The third – and your best option – is to combine the simplicity of dplyr with efficiency of data.table. And that’s where R dtplyr chimes in!

Today you’ll learn just how easy it is to switch from dplyr to dtplyr, and you’ll see hands-on the performance differences between the two. Let’s dig in!

I love the performance of the data.table library but strongly prefer the Tidyverse for the sake of convenience. I like that this bridges the gap, at least for dplyr style processing. H/T R-Bloggers.

Comments closed

Going with the Flow: GitHub Edition

I have a new video:

In this video, cover the GitHub flow. We talk a bit about branching strategies and how GitHub development ought to look in a multi-developer situation.

The GitHub flow is a minor variant from the classical Git flow, but one that works well with the ethos of development specifically in GitHub. It’s often overkill for a single-developer repo, but once you have a team working on a problem, this is a much more efficient approach.

Comments closed

EXPAND and COLLAPSE in Power BI Visual Calculations

Marco Russo and Alberto Ferrari build on a foundation:

In a previous article, we introduced VISUAL SHAPE, the table modifier that adds a hierarchical structure to a table, which is needed to implement visual calculations. In this article, we introduce the concept of visual context, the virtual table lattice, and the two main operators to navigate in the visual context: EXPAND and COLLAPSE.

Click through to learn more about these operators and how they fit into the rubric of visual calculations.

Comments closed

UNION and UNION ALL

Erik Darling hires the Pinkertons:

UNION and UNION ALL seem to get used with the same lack of discretion and testing as several other things in T-SQL: CTEs vs temp tables, temp tables vs. table variables, etc.

There are many times I’ve seen developers use UNION when result sets have no chance of being non-unique anyway, and many times I’ve seen them use UNION ALL when there would be a great benefit to discarding unnecessary duplicates.

Erik’s explanation goes about three steps beyond “UNION is bad so always use UNION ALL.” It’s a must-read for anybody who regularly writes T-SQL queries.

Comments closed

Correlated Subqueries in SQL

Joseph Yeates classifies subqueries:

I’ve recently been brushing up on my SQL skills, as I’ve used the language for a while but less so recently. Through this process, I’ve found that I’m comfortable with the topics of complex joins, Common Table Expressions (CTEs), and nested subqueries. However, it was my deep dive into subqueries where I found something new: correlated subqueries. In this post, we’ll explore the intricacies of subqueries, with a spotlight on the often overlooked (at least for me) correlated subqueries.

Click through to understand what correlated subqueries are, in contrast to other forms of subquery.

Comments closed

Parameterizing Dynamic SQL the Right Way

Andy Brownsword does things right, after a fashion:

When building dynamic SQL, safety is crucial. As we saw last week, we have the QUOTENAME function which can help when referencing object names.

Another aspect to consider is use of parameters. Integrating them incorrectly can leave us vulnerable to SQL injection attacks. Let’s take a look at how to handle them the wrong way, followed by the right way.

Why have the wrong way and then the right way? The answer is simple. It is a truth universally acknowledged, that a post with a Gallant must be in want of a Goofus.

Comments closed