
Day: November 1, 2022

Difficulties around A/B Testing

John Cook asks which is clearer, 1 or 2? 3 or 4? 4 or 6?

One problem with A/B testing is that your results may depend on the order of your tests.

Suppose you’re testing three options: X, Y, and Z. Let’s say you have three market segments, equal in size, each with the following preferences.

This is known as the Condorcet paradox of voting.
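To make the ordering problem concrete, here is a minimal Python sketch. The segment rankings are the textbook Condorcet cycle, assumed for illustration rather than taken from John's post, and the helper names `ab_test` and `run_sequence` are hypothetical.

```python
# Hypothetical rankings (best to worst) for three equal-sized segments.
# This is the textbook Condorcet cycle, not data from the original post.
segments = [
    ["X", "Y", "Z"],
    ["Y", "Z", "X"],
    ["Z", "X", "Y"],
]


def ab_test(a, b, rankings):
    """Return whichever of a or b is ranked higher by a majority of segments."""
    votes_for_a = sum(r.index(a) < r.index(b) for r in rankings)
    return a if votes_for_a * 2 > len(rankings) else b


def run_sequence(order, rankings):
    """Test the first two options, then test the winner against the third."""
    winner = ab_test(order[0], order[1], rankings)
    return ab_test(winner, order[2], rankings)


# Run the same two-step A/B process with three different test orders.
orders = [("X", "Y", "Z"), ("Y", "Z", "X"), ("Z", "X", "Y")]
results = {order: run_sequence(order, segments) for order in orders}

for order, winner in results.items():
    print(" then ".join(order), "->", winner)
```

Under these assumed rankings, each test order produces a different overall winner, so the "best" option depends entirely on which pair you happened to test first.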

John also introduces the problem of interaction effects:

Suppose you’re debating between putting a photo of a car or a truck on your web site, and you’re debating between whether the vehicle should be red or blue. You decide to use A/B testing, so you test whether customers prefer a red truck or a blue truck. They prefer the blue truck. Then you test whether customers prefer a blue truck or a blue car. They prefer the blue truck.

Maybe customers would prefer a red car best of all, but you didn’t test that option. By testing vehicle type and color separately, you didn’t learn about the interaction of vehicle type and color. 
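A small Python sketch of that scenario, assuming made-up conversion rates chosen purely to exhibit an interaction (none of these numbers come from John's post, and the `better` helper is hypothetical):

```python
from itertools import product

# Hypothetical conversion rates for each (color, vehicle) combination.
# The numbers are invented to illustrate an interaction effect.
conversion = {
    ("red", "truck"): 0.030,
    ("blue", "truck"): 0.040,
    ("blue", "car"): 0.035,
    ("red", "car"): 0.050,
}


def better(a, b):
    """Return the variant with the higher (assumed) conversion rate."""
    return a if conversion[a] >= conversion[b] else b


# Sequential A/B tests: settle the color first, then the vehicle type.
step1 = better(("red", "truck"), ("blue", "truck"))
sequential_winner = better(step1, (step1[0], "car"))

# Full factorial test: compare all four combinations directly.
variants = product(["red", "blue"], ["truck", "car"])
factorial_winner = max(variants, key=conversion.get)

print("Sequential winner:", sequential_winner)
print("Factorial winner:", factorial_winner)
```

With these assumed rates, the sequential tests settle on the blue truck while the factorial comparison picks the red car, which the sequential path never tested.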

Click through for both posts as well as some good insights.


Working with strcat in KQL

Robert Cain has a post dedicated to the strcat() function in KQL:

The strcat function has been shown in previous articles, but it’s so useful it deserves a post all of its own.

As usual, the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

Read on to (re-)learn the power of string concatenation, in Kusto form.


Parallel Loading of Tables in Power BI Dataset Refresh

Chris Webb hits the turbo button:

Do you have a large dataset in Power BI Premium or Premium Per User? Do you have more than six tables that take a significant amount of time to refresh? If so, you may be able to speed up the performance of your dataset’s refresh by increasing the number of tables that are refreshed in parallel, using a feature that was released in August 2022 but which you may have missed.

Click through for that tip.


Azure Synapse Analytics R Language Support

Ryan Majidimehr has a short list of updates for Azure Synapse Analytics, but it includes a big one:

Azure Synapse Analytics provides built-in R support for Apache Spark. As part of this, data scientists can leverage Azure Synapse Analytics notebooks to write and run their R code. This also includes support for SparkR and SparklyR, which allows users to interact with Spark using familiar Spark or R interfaces. To learn more read the official how-to Use R for Apache Spark with Azure Synapse Analytics (Preview).

That it took this long for R support was a bit weird, but I’m glad it’s there now.


One Repo for Every Environment

Meagan Longoria explains an important part of source control repositories:

I’ve seen a few people start Azure Data Factory (ADF) projects assuming that we would have one source control repo per environment, meaning that you would attach a Git repo to Dev, and another Git repo to Test and another to Prod.

Microsoft recommends against this, saying:

Read on for the citation as well as the practical reason why we don’t want multiple repos. This is true not only for Azure Data Factory but for every development project. You have one repository with branches. Certain branches represent checkpoints where code goes out to a specific environment through a release tool (e.g., Azure DevOps release pipelines, GitHub Actions, etc.).
