Month: August 2023

Visualizing Univariate Data Distributions in R

Steven Sanderson reviews the shape of the data:

Understanding the distribution of your data is a fundamental step in any data analysis process. It gives you insights into the spread, central tendency, and overall shape of your data. In this blog post, we’ll explore two popular functions in R for visualizing data distribution: density() and hist(). We’ll use the classic Iris dataset for our examples. Additionally, we will introduce the {TidyDensity} library and show how it can be used to create distribution plots.

Click through for three different functions for visualizing the density of a variable.
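
For readers curious about what hist() and density() compute under the hood, here is a minimal sketch in Python rather than R, so that all examples here share one language. It counts values into equal-width bins and evaluates a Gaussian kernel density estimate. The bandwidth rule is modeled on R's default `bw.nrd0`. Unlike hist(), which picks "pretty" breakpoints via Sturges' rule by default, this sketch uses a fixed bin count. The function names and the sample values are hypothetical and are not taken from the iris dataset.

```python
import math
import statistics


def histogram(values, bins):
    """Count values in `bins` equal-width bins (a rough analog of R's hist())."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero if all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def rule_of_thumb_bandwidth(values):
    """Bandwidth modeled on R's bw.nrd0: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(values)
    # method="inclusive" interpolates quartiles the same way as R's default (type 7).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if the IQR is zero
    return 0.9 * spread * len(values) ** -0.2


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each point of `grid`."""
    h = bandwidth or rule_of_thumb_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]


# Hypothetical sample values (illustration only).
sample = [4.9, 5.0, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.2]
edges, counts = histogram(sample, bins=4)
density = gaussian_kde(sample, [4.5 + 0.1 * i for i in range(31)])
```

The histogram depends heavily on where the bin edges fall, while the density curve depends on the bandwidth. That is why it is worth looking at both views of the same variable.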

Loading OpenStreetMap Data in Postgres

Ryan Lambert gets just the right amount of data:

Populating a PostGIS database with OpenStreetMap data is my favorite way to start a new geospatial project. Loading a region of OpenStreetMap data gives you data ranging from roads, buildings, water features, amenities, and so much more! The breadth and bulk of data is great, but it can turn into a hindrance, especially for projects focused on smaller regions. This post explores how to use PgOSM Flex with custom layersets, multiple schemas, and osmium. The goal is to load limited data for a larger region, while loading detailed data for a smaller, target region.

The larger region for this post will be the Colorado extract from Geofabrik. The smaller region will be the Fort Collins area, extracted from the Colorado file. The following image shows the data loaded in this post with two maps side-by-side. The minimal data loaded for all of Colorado is shown on the left and the full details of Fort Collins is on the right.

Click through for more details on these two examples.

Advanced Scenarios for Private Endpoints to Azure SQL MI

Zoran Rilak digs in:

In the previous installment of this mini-series, we covered basic scenarios involving private endpoints. If you aren’t familiar with private endpoints and Private Link in general, it might be a good idea to quickly review them to get a feel for how they apply when Azure SQL Managed Instance is in the mix.

In this article, we’ll dive into more involved scenarios that build on those from last week:

  5. Hub and spoke topology
  6. Partner or ISV giving access to their customers
  7. Two SQLs talking to each other: linked server, transactional replication
  8. Failover group listener using private endpoints

Read on for architecture diagrams and descriptions for each of these scenarios.

Testing Row-Level Security in Power BI

Wolfgang Strasser puts on the Mission Impossible face replacement mask:

Long time, no Power BI blog post from my side. But today I found out that testing your row-level security (RLS) logic in the Power BI service has changed “a little bit” since I last used it.

Whenever you want to test your RLS logic, you can do this in Power BI Desktop (Manage Roles for definition, “View as” for testing).

Click through for an example of how this works. I like this approach a lot because the people who are developing these reports usually have access to everything, so it’s hard to ensure that you got everything right until people start complaining.
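
To make the "view as" idea concrete, here is a minimal Python sketch that simulates RLS outside Power BI. In Power BI itself, roles are defined with DAX filter expressions. Here each hypothetical role is simply a row predicate, and membership in several roles is combined additively, which is how Power BI treats users in more than one role. The table, role names, and `view_as` helper are all hypothetical.

```python
# Hypothetical sales rows; in Power BI these would be a table in the model.
ROWS = [
    {"region": "East", "amount": 120},
    {"region": "West", "amount": 80},
    {"region": "East", "amount": 45},
    {"region": "North", "amount": 60},
]

# Hypothetical roles: each one is a row-level predicate, similar in spirit
# to a DAX role filter such as [region] = "East".
ROLES = {
    "EastManagers": lambda row: row["region"] == "East",
    "WestManagers": lambda row: row["region"] == "West",
}


def view_as(roles, rows=ROWS):
    """Return the rows visible to a user who belongs to `roles`.

    Roles are additive: a row is visible if ANY of the user's roles allows it.
    A user in no roles sees nothing in this sketch.
    """
    predicates = [ROLES[name] for name in roles]
    return [row for row in rows if any(p(row) for p in predicates)]


for role in ROLES:
    print(role, [r["amount"] for r in view_as([role])])
```

Checking every role like this, rather than only looking at the report as its all-seeing author, is exactly the kind of test that catches mistakes before users do.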

Git Integration for Power BI Reports in Microsoft Fabric

Kevin Chant gives Git integration a try:

To manage expectations, this post covers:

  1. Brief overview of Microsoft Fabric Git integration.
  2. How I converted a Power BI report to a Power BI Desktop project containing metadata files.
  3. Converting the folder that contains the Desktop project into a Git repository.
  4. Synchronizing the Git repository with Azure DevOps.
  5. Setting up Microsoft Fabric Git integration.
  6. Initial tests.
  7. Interesting workaround to deploy a second Power BI report using metadata.

Read on for Kevin’s thoughts.

CPU Threads in SQL Server Backups

Andy Yun dives in:

Welcome back to Part 3 of my SQL Server Backup Internals Series.

In Part 1, I introduced the “parts” of a BACKUP Operation and in Part 2, we delved into Backup Buffers. Today, we’re going to talk about what manipulates those Backup Buffers: CPU Threads. This’ll be a longer blog, so go refill your coffee now.

Andy did an outstanding job explaining what reader and writer threads do and how SQL Server picks the number of each.
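
As a loose mental model only (this is a generic producer/consumer pattern in Python, not SQL Server's actual implementation), the sketch below has reader threads fill buffers from a fixed pool and a writer thread drain them to a stand-in "backup device". The `buffer_count` parameter plays a role loosely similar to the BACKUP option BUFFERCOUNT. The function name and the one-reader-per-source layout are hypothetical simplifications.

```python
import queue
import threading


def backup_pipeline(sources, buffer_count=4, buffer_size=8):
    """Copy byte strings from several 'data files' into one 'backup device'.

    A fixed pool of buffers is shared: readers take free buffers and fill them,
    and the writer empties full buffers and returns them to the pool.
    """
    free = queue.Queue()   # empty buffers waiting to be filled
    full = queue.Queue()   # filled buffers waiting to be written
    for _ in range(buffer_count):
        free.put(bytearray(buffer_size))
    device = []            # stands in for the backup file
    done = object()        # sentinel a reader sends when its source is exhausted

    def reader(data):
        for start in range(0, len(data), buffer_size):
            buf = free.get()  # blocks until the writer recycles a buffer
            chunk = data[start:start + buffer_size]
            buf[:len(chunk)] = chunk
            full.put((buf, len(chunk)))
        full.put((done, 0))

    def writer(n_readers):
        finished = 0
        while finished < n_readers:
            buf, length = full.get()
            if buf is done:
                finished += 1
                continue
            device.append(bytes(buf[:length]))  # copy out before recycling
            free.put(buf)

    readers = [threading.Thread(target=reader, args=(s,)) for s in sources]
    writer_thread = threading.Thread(target=writer, args=(len(readers),))
    for t in readers + [writer_thread]:
        t.start()
    for t in readers + [writer_thread]:
        t.join()
    return device


chunks = backup_pipeline([b"a" * 20, b"b" * 30], buffer_count=3)
print(sum(len(c) for c in chunks), "bytes written in", len(chunks), "chunks")
```

If the writer is slow, readers stall waiting for free buffers. If readers are slow, the writer idles. That interplay is why thread counts and buffer counts matter together.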

Adding Mean to Box Plots in R

Steven Sanderson tracks the sixth number of a five-number summary:

Data visualization is a powerful tool for understanding and interpreting data. In this blog post, we will explore how to create box plots with mean values using both base R and ggplot2. We will use the famous iris dataset as an example. So, grab your coding tools and let’s dive into the world of box plots!

Note that these visuals show the mean in addition to the median, not in place of it.
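
Here is a minimal Python sketch of the numbers behind such a plot. It computes Tukey's five-number summary using the same hinge rule as R's `fivenum()`, which base R box plots build on, and then adds the mean as the "sixth number". The function names are hypothetical.

```python
import math
import statistics


def fivenum(values):
    """Tukey's five-number summary, using the same hinge rule as R's fivenum()."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based, possibly half-integer positions: min, lower hinge, median, upper hinge, max.
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]


def box_summary(values):
    """Five-number summary plus the mean, i.e. what a box plot with a mean marker shows."""
    lo, lower_hinge, median, upper_hinge, hi = fivenum(values)
    return {
        "min": lo,
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        "max": hi,
        "mean": statistics.fmean(values),
    }


print(box_summary([1, 2, 3, 4, 100]))
```

With a skewed sample like `[1, 2, 3, 4, 100]`, the median stays at 3 while the mean jumps to 22. That gap is exactly what drawing the mean alongside the median makes visible.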

A Brief Overview of 21 ETL Tools in Python

Adron Hall makes a list:

Here are summaries of each of the tools you’ve mentioned along with examples of how to implement the ETL (Extract, Transform, Load) process using each tool within a Python workflow:

  1. Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:

Read on for summaries and samples for each of the 21 options.
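
For contrast with the 21 tools, here is the bare extract-transform-load shape in plain Python, with no framework at all. It uses only the standard library (`csv`, `sqlite3`) and is not one of Adron's examples. The input data, table name, and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical input; a real pipeline would read a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,5.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize types and casing."""
    records = []
    for row in rows:
        if not row["amount"]:
            continue
        records.append((int(row["order_id"]), row["customer"].title(), float(row["amount"])))
    return records


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
# [('Alice', 19.99), ('Carol', 5.5)]
```

The tools in the list mostly differ in how they scale, schedule, and distribute these three steps, not in the steps themselves.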

Power BI Licensing Calculator

Stephanie Bruno figures out the cost of Power BI using a Power BI report:

Figuring out what licensing model your organization should choose for your Power BI needs has always been a bit of a challenge, and now with Fabric it’s getting even more complicated. Choosing between using all Pro licenses, Premium Per User, or a dedicated capacity, based on factors like the number of developers and the number of content consumers, dataset sizes, etc. can result in some messy spreadsheets. To try and simplify these calculations, we’ve created the Power BI Licensing Calculator. Just enter the inputs and you’ll be provided with a licensing recommendation.

Click through for a link to the calculator.
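
To show the kind of arithmetic such a calculator automates, here is a deliberately simplified Python sketch. It is not the logic of Stephanie's calculator. It compares monthly cost for all-Pro, Premium Per User, and a dedicated Premium capacity, using two rules: Pro caps a single dataset at 1 GB, and viewers of content in a Premium (P SKU) capacity don't need their own Pro licenses. The prices are illustrative placeholders, not quotes, and Fabric F SKUs are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Illustrative monthly prices: placeholders only, check current pricing."""
    pro_per_user: float = 10.0
    ppu_per_user: float = 20.0
    capacity: float = 4995.0


PRO_MAX_DATASET_GB = 1.0  # per-dataset size limit under Pro


def monthly_costs(developers, consumers, largest_dataset_gb, prices=None):
    """Return {option: monthly cost} for the options this sketch considers."""
    prices = prices or Prices()
    users = developers + consumers
    costs = {
        # Everyone, viewers included, needs a PPU license to use PPU content.
        "Premium Per User": users * prices.ppu_per_user,
        # Developers still need Pro; consumers view capacity content for free.
        "Dedicated capacity": prices.capacity + developers * prices.pro_per_user,
    }
    if largest_dataset_gb <= PRO_MAX_DATASET_GB:
        costs["All Pro"] = users * prices.pro_per_user
    return costs


def recommend(developers, consumers, largest_dataset_gb, prices=None):
    """Pick the cheapest option that satisfies the dataset-size constraint."""
    costs = monthly_costs(developers, consumers, largest_dataset_gb, prices)
    return min(costs, key=costs.get), costs


print(recommend(10, 1000, 5))
```

Even this toy version shows why spreadsheets get messy. The answer flips as the ratio of consumers to developers grows, and it also flips as soon as a dataset outgrows Pro.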

Storing Log Analytics Queries in Azure Blob Storage

Gilbert Quevauvilliers wants some long-term storage:

Following on in my series, in this blog post I am going to demonstrate how to store Log Analytics Queries in Blob Storage.

This allows me to store the Power BI queries outside of Log Analytics and gives me an easy way to get the data into my Fabric Lakehouse in later steps. To do this, I am going to use a Logic App in Azure.

In this series I am going to show you all the steps I did to have the successful outcome I had with my client.

Read on to see what Gilbert used for the task.
