Day: May 9, 2024

Counting NA Values in R

Steven Sanderson counts what doesn’t exist:

Welcome back, R enthusiasts! Today, we’re going to explore a fundamental task in data analysis: counting the number of missing (NA) values in each column of a dataset. This might seem straightforward, but there are different ways to achieve this using different packages and methods in R.

Let’s dive right in and compare how to accomplish this task using base R, dplyr, and data.table. Each method has its own strengths and can cater to different preferences and data handling scenarios.

Read on for 3 1/2 separate methods.
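
For a sense of what the task looks like, here is a minimal base R sketch. It is not taken from the linked post; the data frame and its column names are hypothetical, and the linked article may approach the base R case differently.

```r
# Hypothetical example data; the column names are made up for illustration.
df <- data.frame(
  id    = c(1, 2, 3, 4),
  score = c(10.5, NA, 7.2, NA),
  label = c("a", NA, "c", "d"),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix of the same shape;
# colSums() then counts the TRUE entries (the NAs) in each column.
na_counts <- colSums(is.na(df))
print(na_counts)
```

The result is a named numeric vector with one entry per column, so it can be filtered or sorted like any other vector, for example `na_counts[na_counts > 0]` to keep only the columns that have missing values.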

Building Workers in Azure Data Factory

Martin Schoombee continues a series on orchestration in Azure Data Factory:

We’re finally ready to dive into the Data Factory components that form part of the framework, and we’re going to work our way from the bottom up. To paraphrase the previous blog post, worker pipelines perform the actual work of either moving data (from source to staging) or executing a stored procedure that will load a dimension/fact table.

Although worker pipelines can contain any number of tasks you may need, my worker pipelines that move data from a source system into the staging area follow a similar pattern with at least the following activities:

Click through for that list, as well as more information.

Editing the JSON of a Microsoft Fabric Pipeline

Dennes Torres makes a change:

A Fabric Pipeline uses JSON as source code. They are also saved in repositories as JSON.

The first idea we get is editing the pipeline in JSON format. We can copy the JSON and create new pipelines with small variations, making changes directly on the JSON.

However, at first sight we get disappointed, because the pipeline doesn’t allow the JSON to be edited. We have the option to view the JSON, but nothing else.

Read on to see how to tell the Fabric pipeline who’s boss.

Avoid OPENJSON in Cursor Declarations

Joe Obbish has a public service announcement:

I found a few stored procedures with single plan cache entries over 500 MB of memory each according to sys.dm_exec_cached_plans while optimizing the plan cache in production. The stored procedures were relatively simple and both execution plans were less than 1 MB when saved as sqlplan files. I was confused, offended, and confused.

Read on to learn how this happened, see an example of it in action, and find out how you can avoid it in your own environment.
