
Author: Kevin Feasel

An Overview of Data Lake Operations with Apache NiFi

Lav Kumar gives us a 10,000-foot view:

In the world of data-driven decision-making, ETL (Extract, Transform, Load) processes play a pivotal role. The effective management and transformation of data are essential to ensure that businesses can make informed choices based on accurate and relevant information. Data lakes have emerged as a powerful way to store and analyze massive amounts of data, and Apache NiFi is a robust tool for streamlining ETL processes in a data lake environment.

Read on for a brief primer on NiFi and how some of its capabilities can assist in ETL and ELT processing.


Moving VMs and Disks between Azure Tenants

Dennes Torres makes a move:

Moving objects on Azure is not simple. Moving between tenants is extremely difficult or not possible. I recently faced the challenge of moving a virtual machine and disks between tenants and found a solution.

Some years ago, I wrote an article about the Azure Resource Mover when it was still being created. Today the Resource Mover is integrated with the entire Azure portal, although there are still many limitations in relation to moving resources. Anyway, this will not affect us in this blog post.

Click through for the step-by-step, as well as a few gotchas you might run into along the way.


The Benefits of Checklists

Aaron Bertrand checks a box:

If there has been one constant throughout my career, it’s change. As applications become more complex and we continue improving reliability, there will always be the next patch, upgrade, new replica, new cluster, and even new cloud region – or moving to the cloud in general. For complex architectures, multiple teams are often actively involved, and even more who want to be “in the know” during any changes.

We use tickets (JIRA) to track and document the work, and incidents (FireHydrant) to expose the status to internal and external customers. But these are complex systems to keep current in real-time. And while nearly everything we do is scripted, broad audiences can’t consume code – even when saturated with comments. Since multiple teams are involved, the code is scattered across disparate things like runbooks, which are not easy or desirable to combine. How can a wide range of people stay coordinated during a major change?

For more complicated tasks, I’m all-in on creating either checklists or dedicated runbooks. I have a client that uses merge replication, and every once in a while, we need to rebuild replication. In that case, we have a more detailed runbook with step-by-step instructions, but checklists are great for keeping track of complex processes, whether or not they go cross-team.

Also, callout to the greatest Site Reliability Engineer ever to play the game, Mario Lemieux.


An Overview of 4th Normal Form

I continue a series on database normalization:

In this video, [I] explain what Fourth Normal Form (4NF) is and why I consider 5NF to be significantly more important. Even so, 4NF does make it easy to explain a certain common class of problem, allowing it to provide some measure of utility.

4th Normal Form is a special case of the much more exciting 5th Normal Form, but I do have a bit of a soft spot for it.


Uniform Random Number Generation in R

Steven Sanderson digs into the uniform distribution:

Randomness is an essential part of many statistical and machine learning tasks. In R, there are a number of functions that can be used to generate random numbers, but the runif() function is the most commonly used.

Something mildly embarrassing for me is that it took me a while to figure out why they call the command runif(). That’s because, at first, I didn’t pronounce it r unif but rather run if.

In reality, *unif() means “uniform distribution” and r stands for “random number.” There are several other functions based on the uniform distribution and Steven looks at those as well in this post.
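R pairs each distribution with four prefixed functions: r for random draws, d for density, p for cumulative probability, and q for quantiles. As a rough illustration of what those four do for the uniform distribution, here is a minimal Python sketch using only the standard library. The Python function names simply mirror R's for readability; they are assumptions of this sketch, not part of any Python package, and this is not how R implements them.

```python
import random


def runif(n, lo=0.0, hi=1.0, rng=random):
    """Draw n values uniformly between lo and hi (mirrors the idea of R's runif)."""
    return [rng.uniform(lo, hi) for _ in range(n)]


def dunif(x, lo=0.0, hi=1.0):
    """Density: a constant 1 / (hi - lo) inside the interval, 0 outside it."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def punif(q, lo=0.0, hi=1.0):
    """Cumulative probability P(X <= q) for X uniform on [lo, hi]."""
    if q <= lo:
        return 0.0
    if q >= hi:
        return 1.0
    return (q - lo) / (hi - lo)


def qunif(p, lo=0.0, hi=1.0):
    """Quantile: the value below which a fraction p of the distribution lies."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return lo + p * (hi - lo)
```

For example, on the interval from 2 to 4, `punif(3.0, 2.0, 4.0)` gives 0.5 and `qunif(0.25, 2.0, 4.0)` gives 2.5. Passing a seeded `random.Random` instance as `rng` makes the draws repeatable.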


Formatting DAX Expressions with Python

Sandeep Pawar makes the code a bit more readable:

There is an old Italian saying: “If it’s not formatted, it is not DAX.”

When you get the list of measures from SemPy, it’s not formatted and is hard to read and understand. Thankfully, the SQLBI team has made the DAX parser and the formatter available via an API. I wrote a quick function to return the formatted DAX expression of a measure. You can either pass a DAX expression or the FabricDataFrame returned by fabric.list_measures().

Click through for the process, including the Python code to do the work.


Pulling XMLA-Modified Power BI Datasets into Source Control

Marc Lelijveld has a fix:

Have you ever found yourself stuck with a modified Power BI dataset, thanks to those well-intentioned but troublesome changes you made through the XMLA endpoint? Does that sound familiar to you? What seemed like a convenient solution quickly turned into a frustrating challenge when you encountered the error message in the Power BI Service.

You wanted to seamlessly continue your development journey in Power BI Desktop, avoiding the need for a full data refresh or just quickly making that one small change, but you hit a roadblock when trying to download the PBIX file. The error message declared that your data model had been modified with the XMLA endpoint. But now, with Git integration, you can overcome this challenge!

Read on to see how.


An Overview of the Current State of Microsoft Fabric

Paul Andrew pulls no punches:

Despite playing with different parts of the Fabric ecosystem for a long time, nothing ever prepares you for the challenges and “quirks” faced when building a solution for real. In this post I’ll call out some of the pain points we’ve faced and features of the product still requiring improvement, excluding some of the obvious gaps in the product, like security, that we know to be coming.

Read on for Paul’s analysis of what Fabric is currently missing, but as you read it, keep in mind that Fabric is still in public preview and that, even after it goes GA, Microsoft will continue development on it.


An Analysis of Goal Line Runs out of Shotgun

I decided to test a common narrative:

A common theme among Buffalo Bills fans is the idea that the Bills run too many plays out of shotgun near the opposing team’s goal line, and this is hampering their ability to score points. Instead, these fans argue, they should run from under center, either a direct handoff or a quarterback sneak. If you were to press fans on this, I believe you’d also hear that the Bills are unique, or at least uniquely bad, at running such plays.

I’m going to use the nflfastR package to analyze play-by-play data and see just how well this bit of fan wisdom holds up.

Spoiler alert: it doesn’t.
