Day: August 24, 2023

Adding Mean to Box Plots in R

Steven Sanderson tracks the sixth number of a five-number summary:

Data visualization is a powerful tool for understanding and interpreting data. In this blog post, we will explore how to create box plots with mean values using both base R and ggplot2. We will use the famous iris dataset as an example. So, grab your coding tools and let’s dive into the world of box plots!

Note that these visuals show the mean in addition to the median; the mean does not replace it.
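
To make the "sixth number" concrete, here is a minimal sketch in standard-library Python rather than R. It is not taken from Steven's post: the function name `six_number_summary` and the sample values are invented for illustration, and the quartiles use the "inclusive" method, which is only one of several common conventions.

```python
import statistics


def six_number_summary(values):
    """Return the five-number summary plus the mean for a list of numbers."""
    data = sorted(values)
    # The "inclusive" method keeps the quartiles inside the observed range.
    # Other tools may use a different quartile convention.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "mean": statistics.fmean(data),  # the extra, sixth number
    }


# Hypothetical sample with one large outlier.
sample = [1, 2, 3, 4, 100]
summary = six_number_summary(sample)
print(summary)
```

With an outlier like the one in this sample, the mean lands far from the median, which is the sort of gap that marking both on a box plot makes visible.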

A Brief Overview of 21 ETL Tools in Python

Adron Hall makes a list:

Here are summaries of each of the tools you’ve mentioned along with examples of how to implement the ETL (Extract, Transform, Load) process using each tool within a Python workflow:

  1. Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:

Read on for summaries and samples for each of the 21 options.
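
For readers who want the general shape of an ETL job before comparing tools, the following is a rough sketch using only the Python standard library. It does not come from Adron's post and uses none of the 21 tools; the CSV content, the `payments` table, and the function names are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real job would read a file, an API, or a queue.
RAW_CSV = "name,amount\nalice,10\nbob,not_a_number\ncarol,32\n"


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and drop rows whose amount is not an integer."""
    cleaned = []
    for row in rows:
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # skip rows that fail validation
        cleaned.append((row["name"].title(), amount))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)"
    )
    connection.executemany("INSERT INTO payments VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
total = connection.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions is a design choice for readability here; dedicated ETL tools typically add scheduling, retries, and parallelism on top of this basic pattern.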

Power BI Licensing Calculator

Stephanie Bruno figures out the cost of Power BI using a Power BI report:

Figuring out what licensing model your organization should choose for your Power BI needs has always been a bit of a challenge, and now with Fabric it’s getting even more complicated. Choosing between using all Pro licenses, Premium Per User, or a dedicated capacity, based on factors like the number of developers and the number of content consumers, dataset sizes, etc. can result in some messy spreadsheets. To try and simplify these calculations, we’ve created the Power BI Licensing Calculator. Just enter the inputs and you’ll be provided with a licensing recommendation.

Click through for a link to the calculator.

Storing Log Analytics Queries in Azure Blob Storage

Gilbert Quevauvilliers wants some long-term storage:

Following on in my series, in this blog post I am going to demonstrate how to store Log Analytics Queries in Blob Storage.

This allows me to store the Power BI Queries externally from Log Analytics and to have an easy way to get the data into my Fabric Lakehouse in later steps. To do this I am going to use a Logic App in Azure.

In this series I am going to show you all the steps I did to have the successful outcome I had with my client.

Read on to see what Gilbert used for the task.

Contrasting Azure Synapse Analytics and Microsoft Fabric

Warner Chaves explains the difference:

In the modern era of data-driven decision-making, businesses rely heavily on robust and efficient data platforms to process, analyze, and derive insights from their vast amounts of data. Since 2019, Azure Synapse Analytics has been Microsoft’s main contender in this space, offering powerful capabilities to handle complex data workloads.

Now, Microsoft has announced a new data platform called Microsoft Fabric, an evolution of the data platform built with a modified philosophy. It is a similar product but with enough differences to make them not interchangeable and so it’s very important to understand how they both compare and contrast if you’re planning a new data platform deployment. Microsoft wanted a product that was even simpler to deploy and operate and could function well outside of an Azure cloud environment as a full standalone Software As a Service offering.

In this blog post, we’ll compare Synapse Analytics and Fabric, highlighting their features, strengths, and considerations to help you make an informed decision for your organization’s data needs.

Warner has seven main areas of comparison, so click through to see how the two products stack up.

Delete Empty Folders with PowerShell

Patrick Gruenauer tidies up:

Big Data? Pain? Looking for empty folders and want to delete them? In this post I show you how to proceed to find and delete empty folders.

Open PowerShell, ISE or VS Code.

Caution: If you proceed, all empty folders will be deleted without any warning.

It is kind of funny to warn people that, if they run the script to delete all of these empty folders, they will delete all of these empty folders. But hey, better safe than sorry.
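
In the spirit of that warning, here is a hedged sketch of the same idea in standard-library Python rather than PowerShell. It is not Patrick's script: the function `delete_empty_folders` and its `dry_run` parameter are assumptions added for illustration, so that the list of folders can be previewed before anything is removed.

```python
import os
import tempfile


def delete_empty_folders(root, dry_run=True):
    """Find empty folders under root, deepest first; delete them unless dry_run."""
    root = os.path.abspath(root)
    removed = []
    # topdown=False visits children before parents, so a folder that only
    # contained empty folders is itself treated as empty.
    for current, _subdirs, _files in os.walk(root, topdown=False):
        if current == root:
            continue  # never remove the starting folder itself
        # Re-list the folder now; in a dry run, ignore children that were
        # already marked for removal earlier in this walk.
        remaining = [
            entry
            for entry in os.listdir(current)
            if os.path.join(current, entry) not in removed
        ]
        if not remaining:
            if not dry_run:
                os.rmdir(current)  # only succeeds on a truly empty folder
            removed.append(current)
    return removed


# Demonstration in a throwaway directory with a hypothetical layout.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "empty", "nested"))
    os.makedirs(os.path.join(base, "keep"))
    with open(os.path.join(base, "keep", "file.txt"), "w") as handle:
        handle.write("data")

    preview = delete_empty_folders(base, dry_run=True)   # report only
    deleted = delete_empty_folders(base, dry_run=False)  # actually delete
    remaining = sorted(os.listdir(base))
```

Running with `dry_run=True` first gives the kind of warning the original caution is about: the returned list shows exactly which folders would go.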

TaskFactory Activation on an Azure-SSIS Integration Runtime

Andy Leonard does some sleuthing:

I regularly help customers migrate SSIS to Azure-SSIS integration runtimes, a nifty component of Azure Data Factory. I was recently stumped by an error activating TaskFactory (Task Factory for the search engines…) on an Azure-SSIS IR node. The error was:

“The system cannot find the file specified.”

Read on to figure out where the file is and how to fix this error.

Persisting Data for SQL Server on Docker Swarm

Andrew Pruski saves the day, or at least the data:

In my last couple of blog posts (here and here) I talked about how to get SQL Server running in Docker Swarm. But there is one big (and show-stopping) issue that I have not covered. How do we persist data for SQL Server in Docker Swarm?

Docker Swarm, like Kubernetes, has no native method to persist data across nodes…so we need another option and one of the options available to us is Portworx.

So how can we use Portworx to persist SQL Server databases in the event of a node failure in Docker Swarm?

Read on to find out how.
