Press "Enter" to skip to content

Month: April 2024

Removing Rows with Missing Data in R

Steven Sanderson shows us three ways:

Handling missing values is a crucial aspect of data preprocessing in R. Often, datasets contain missing values, which can adversely affect the analysis or modeling process. One common task is to remove rows containing missing values entirely. In this tutorial, we’ll explore different methods to accomplish this task in R, catering to scenarios where we want to remove rows with either some or all missing values.

Click through for three ways to do this.

Comments closed

CI/CD in GitHub

I have a new video:

In this video, I explain what continuous integration (CI) is, disambiguate continuous delivery from continuous deployment (CD), and see how you can perform CI/CD operations using GitHub Actions.

Read on to see what these terms mean and an example of how it all works with .NET projects.

Comments closed

Searching for Files in a Blob Storage Container

Andy Brownsword hits one of my bugbears:

Shifting from handling data on premises to Azure has been a real change of mindset. Whilst what I want to build may be similar, the how part is completely different. There’s a learning curve not just to the tooling but how you use it too.

This is one of those instances.

I had a storage container with files which had a date in their name. I wanted to perform a wildcard search to select some of them. That sounds straight forward, right?

This is unnecessarily painful, especially if you’re trying to find the right full backup in a container filled with full and transaction log backup files. Andy’s solution does work but also requires a full scan of keys. And I don’t think there’s a better way to do it.

Comments closed

Database Delegation with Power Apps

Deborah Melkin shares a note about Power Apps:

When I write a SQL Query, I never have to worry if the way I’m writing a query will automatically create a limit on the number of records being returned if I’m not specifying TOP or OFFSET…FETCH. If my query wants to return 1 million records, it will return 1 million records.

PowerApps doesn’t work that way.

Click through to see how it does work, including some of the limitations around it.

Comments closed

The Importance of Orchestration in E(L)TL Processes

Martin Schoombee begins a new series:

In the context of what we’re talking about throughout this series – facilitating the execution of an ETL process in a platform like Azure Data Factory – orchestration means that we’re using the ETL tool primarily for the “E” (Extract) part of the process. In addition to that, most people I know would also use the ETL tool to facilitate the workflow, in other words the order of execution and any constraints that go along with that.

In what I’d like to call the “traditional” approach for lack of a better term, all parts of the ETL process are performed natively by the tool (image below), using whatever built-in tasks are available and of course accounting for any nuances. With this approach, transformations are typically performed in transit and in memory.

Read on to see how the Orchestration approach differs from the traditional ETL approach.

Comments closed

Creating DAX Measures Directly in the Power BI Service

Gilbert Quevauvilliers skips the desktop step:

There have been some updates to Power BI with Visual Calculations, and as of the March 2024 release you can now create and edit visual calculations in the Power BI Service.

Here are details from the blog post: Power BI March 2024 Feature Summary | Visual Calculations Update

I then saw an email which made me think, could I then use the visual calculations to create a DAX measure directly in the Power BI Service on an existing Power BI report.

The answer, it turns out, is yes. Read on to see how.

Comments closed

Trying out Microsoft Fabric Mirroring of Cosmos DB

Kevin Chant gives it the ol’ college try:

In this post I cover some initial testing of Mirroring Azure Cosmos DB Databases in Microsoft Fabric that I performed.

I wanted to do this post for various reasons. Including the fact that it was announced during the Microsoft Fabric Community Conference that Mirroring is now in Public Preview.

Which means that you can now mirror data from Azure SQL Database, Azure Cosmos DB and Snowflake into your own Microsoft Fabric tenant. Even trial tenants.

Kevin takes us through the process and gives it a try, sharing with us the results of some testing, including a test insertion of 100 million rows.

Comments closed

EXPAND() and COLLAPSE() in Power BI Visual Calculations

Marco Russo and Alberto Ferrari play the accordion:

In previous articles, we introduced the concepts of visual context, the visual lattice, and the two visual context navigation operations: EXPAND and COLLAPSE. In this article, we build on that knowledge to first compute a simple percentage over the parent: a simple calculation to consolidate the understanding of EXPAND and COLLAPSE. Next, we move on to a much more complex scenario, computing the inflation-adjusted sales using visual calculations.

Read on to see what the two functions do, using a pair of examples.

Comments closed

Common Table Expressions in Postgres

Ryan Booz builds a CTE:

In the first article in this transforming data series, I discussed how powerful PostgreSQL can be in ingesting and transforming data for analysis. Over the last few decades, this was traditionally done with a methodology called Extract-Transform-Load (ETL) which usually requires external tools. The goal of ETL is to do the transformation work outside of the database and only import the final form of data that is needed for further analysis and reporting.

However, as databases have improved and matured, there are more capabilities to do much of the raw data transformation inside of the database. In doing so, we flip the process just slightly so that we Extract-Load-Transform (ELT), focusing on getting the raw data into the database and transforming it internally. In many circumstances this can dramatically improve the iteration of development because we can use SQL rather than external tools.

Read on for the scenario and how you can use common table expressions in Postgres. They’re very similar to what we have in SQL Server, though a few differences do exist.

Comments closed