
Author: Kevin Feasel

Reasons to Prefer apply() over For Loops in R

Hugo Gruson draws some comparisons:

The debate regarding the use of for loops versus the apply() function family (apply(), lapply(), vapply(), etc., along with their purrr counterparts: map(), map2(), map_lgl(), map_chr(), etc.) has been a longstanding one in the R community.

While you may occasionally hear that for loops are slower, this notion has already been debunked in other posts. When utilized correctly, a for loop can achieve performance on par with apply() functions.

However, there are still lesser-known reasons to prefer apply() functions over for loops, which we will explore in this post.

Read on for an important caveat, and then several reasons to prefer apply() (or purrr’s counterparts). H/T R-Bloggers.


Calculating Current-Period Month, Quarter, and Year-to-Date in Power BI

Marco Russo and Alberto Ferrari show off their intelligence:

Time intelligence functions such as month-to-date (MTD), quarter-to-date (QTD), and year-to-date (YTD) in DAX operate relative to the current filter context. Their outcome depends on the filter applied, making them both adaptable for various periods and useful for comparisons. But, if you wish to showcase the most recent data – for the “current” period – there is a complication: without the proper filter, you may not get the data you aim for.

Read on to see what they mean and how you can avoid this issue.


The Query that Wouldn’t Go Parallel

Reitse Eskens was living in a black-and-white world, smoking at a dilapidated desk in a run-down office in a beat-up city, when she came through the door:

So what’s up this time? Our client has moved to Azure in a classic lift-and-shift scenario. Well, almost. They’ve deployed new VMs and installed SQL Server 2019 Standard in a nice DTAP setting. The VMs are the standard E16-4as-v4 SKU: 4 vCPUs and 128 GB of memory. The disks are Premium SSD LRS ones with 2300 max IOPS.

Their on-premises environment was SQL Server 2016 Standard Edition running on a virtualisation layer with 128 GB of memory and 8 cores.

In both cases there are two NUMA nodes dividing the cores between them.

Read on to learn more about the problem and what Reitse & co did to resolve it. Also check out the comments—Daniel Hutmacher, in particular, I think has the reason nailed.


Bring-Your-Own-Key in Azure SQL Database

Rod Edwards shares some hard-earned guidance:

Some organisations are more strict on security than others. That’s just the way of the world, whether it be local policy, industry policy, paranoia or worryingly…just not considering it a priority.

This is why Microsoft have to offer BYOK. No, not the famous Icelandic singer from the ’90s and beyond. I’m (very) tenuously referring to “Bring Your Own Key”, which lets customers create and use an encryption key of their own rather than have Microsoft manage it for them.

Read on to learn more about how it works, as well as a couple of important warnings you should keep in mind.


From Probabilities to Odds

Bryan Shalloway explains how odds and probabilities intertwine:

However, human understanding of odds predates our formal understanding of probability. You can find references to odds dating back to Shakespeare:

Knew that we ventured on such dangerous seas
That if we wrought out life ’twas ten to one;
– Shakespeare’s Henry IV, Part II, 1597

Yet, in most common settings, modern society has largely supplanted odds with probabilities. You can imagine if Shakespeare were writing today the line might end “’twas ten out of eleven.”
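To make the "ten to one" versus "ten out of eleven" arithmetic concrete, here is a minimal Python sketch of the conversion in both directions. It is not taken from Bryan's post; the function names are hypothetical, and exact fractions are used only to keep the numbers tidy.

```python
from fractions import Fraction


def odds_to_probability(in_favor, against):
    """Convert odds of `in_favor` to `against` into a probability."""
    return Fraction(in_favor, in_favor + against)


def probability_to_odds(p):
    """Convert a probability p (0 <= p < 1) into odds in favor, p / (1 - p)."""
    p = Fraction(p)
    return p / (1 - p)


# Shakespeare's "ten to one" expressed as a probability: 10 / (10 + 1)
ten_to_one = odds_to_probability(10, 1)

# Converting back recovers the original odds: (10/11) / (1/11)
back_again = probability_to_odds(ten_to_one)
```

Here `ten_to_one` is the fraction 10/11 and `back_again` is 10, that is, odds of ten to one.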

Read the whole thing.


Creating and Connecting to an Azure Postgres Cluster

Louis Davidson shares some notes:

As I have dealt with other platforms, PostgreSQL has stood out to me as the one I am most interested in because it feels the most competitive with SQL Server’s platform. (Oracle is out there too, as are MySQL and many others, but PostgreSQL feels like the right balance of affordability and features, and it has a similar enough feel to get started.)

There are a few high-level differences that can be confusing. First, a cluster is really just a server (or in SQL Server, an instance). Second, the way you execute a batch of code is very different, and sometimes this is based on the tool you are using. As you dig into how PostgreSQL works, some things will feel really normal, and some stuff will be very different from the other servers you have used.

Read on for the first post in the series, covering setup and connection.


Fitting Distributions to Datasets in R

Steven Sanderson tests a distribution fit:

There are two main ways to fit a gamma distribution to a dataset in R:

  1. Maximum likelihood estimation (MLE): This method estimates the parameters of the gamma distribution that are most likely to have produced the observed data.
  2. Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.
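As a rough illustration of the second approach, the method of moments has a closed form for the gamma distribution: with mean kθ and variance kθ², the shape is mean² / variance and the scale is variance / mean. The sketch below is in Python rather than R and is not Steven's code; the function name `fit_gamma_moments` and the simulated sample are assumptions made for the example.

```python
import random
import statistics


def fit_gamma_moments(data):
    """Estimate gamma (shape, scale) by matching the sample mean and variance."""
    mean = statistics.mean(data)
    var = statistics.variance(data)  # sample variance (n - 1 denominator)
    shape = mean ** 2 / var          # from mean = k * theta, var = k * theta^2
    scale = var / mean
    return shape, scale


# Simulate data from a gamma distribution with known parameters
# (shape 2, scale 3), then check that the estimates land close to them.
random.seed(42)
sample = [random.gammavariate(2.0, 3.0) for _ in range(20_000)]
shape_hat, scale_hat = fit_gamma_moments(sample)
```

Maximum likelihood has no closed-form solution for the gamma shape parameter, so it generally relies on a numerical optimizer, which is why it is not sketched here.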

Click through to see which technique Steven uses and an example of how it all works.
