Day: April 29, 2025

Creating Error Bars in ggplot2

Zhenguo Zhang draws a chart:

Sometimes you may want to create a plot with the following features:

  • a point to indicate the mean of a group
  • error bars to indicate the standard deviation of the group
  • and each group may have subgroups, which are represented by different colors.

In this post, I will show you how to create such a plot using the ggplot2 package in R.

Read on for the demonstration, as well as a fix for the common problem of overlapping data points. H/T R-Bloggers.
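
The linked post works in R with ggplot2. As a rough, language-neutral illustration of the numbers behind such a plot, here is a minimal Python sketch using only the standard library. The data, the function names `summarize` and `dodge_offsets`, and the `width` parameter are all made up for illustration and do not come from the post. It computes the mean and sample standard deviation per group and subgroup, then shifts each subgroup horizontally so points do not overlap, which is similar in spirit to dodging in ggplot2.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows):
    """Return {(group, subgroup): (mean, sd)} for (group, subgroup, value) rows."""
    buckets = defaultdict(list)
    for group, subgroup, value in rows:
        buckets[(group, subgroup)].append(value)
    # stdev is the sample standard deviation; it needs at least two values per bucket.
    return {key: (mean(vals), stdev(vals)) for key, vals in buckets.items()}


def dodge_offsets(subgroups, width=0.4):
    """Spread subgroups evenly across `width`, centered on the group's x position."""
    n = len(subgroups)
    if n == 1:
        return {subgroups[0]: 0.0}
    step = width / (n - 1)
    return {s: -width / 2 + i * step for i, s in enumerate(subgroups)}


# Made-up illustration data: (group, subgroup, value).
rows = [
    ("A", "ctrl", 1.0), ("A", "ctrl", 3.0),
    ("A", "trt", 2.0), ("A", "trt", 6.0),
    ("B", "ctrl", 4.0), ("B", "ctrl", 4.0),
]

summary = summarize(rows)
offsets = dodge_offsets(["ctrl", "trt"])

# Each line: group, subgroup, lower error bar, mean, upper error bar, x offset.
for (group, subgroup), (m, sd) in sorted(summary.items()):
    print(group, subgroup, m - sd, m, m + sd, offsets[subgroup])
```

The point marks the mean, the error bar runs from mean minus one standard deviation to mean plus one standard deviation, and the offset is added to the group's x position before drawing.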

Kafka Data Exploration with Tableflow

Robin Moffatt does some exploratory data analysis:

One of the challenges that I’d always had when it came to building streaming data pipelines is that once data is in a Kafka topic, it becomes trickier to query. Whether limited by the available tools to do this or the speed of access, querying Kafka is just not a smooth experience.

This blog post will show you a really nice way of exploring and validating data in Apache Kafka®. We’ll use Tableflow to expose the Kafka topics as Apache Iceberg™️ tables and then query them using standard SQL tools.

Click through for the demonstration using a real dataset.

SQL Server Performance Office Hours

Erik Darling is answering questions again:

My company (using SQL Server 2019 Standard) has an algorithm that keys addresses into a varchar(40) has a cross-reference database that assigns an identity property to each new value, allowing us to post the numeric in our datasets. Production has to search their generated string keys in this database’s table to get the integer key in return. We have ensured proper string data typing on lookups and have unique compressed indexes on the string values. What would your next tuning step be if this was not getting the performance you needed?

There’s a good set of questions this time, so click through for Erik’s answers.

From Power BI Premium Capacity to Fabric Capacity

Jon Vöge performs a migration:

So your old Power BI Premium Capacity has run/is running out, and your organization is acquiring a new Fabric Capacity to replace it.

Perhaps the organization even decided to take the chance to move the capacity region to something a little closer to home?

If you find yourself in this situation, how do you best migrate your contents of one Capacity to another?

Read on as Jon explains the migration process within a region (which is very easy) and the migration process if you need to go cross-region (which is rather cumbersome).

Temp Table Bugs in Microsoft Fabric Warehouses

Jared Westover runs into a wall:

I was excited when Microsoft announced the ability to create session-scoped temporary tables in a Fabric warehouse. However, after using Microsoft Fabric temporary tables, I quickly felt disappointed. When will they be ready for prime time, and in the meantime, what other options are available?

Click through for Jared’s experience, although the bugs he describes might already be fixed by the time you read this.

Executing a Fabric Data Pipeline from Azure Data Factory

Koen Verbeeck leaves the confines of Microsoft Fabric:

In the blog post Call a Fabric REST API from Azure Data Factory I explained how you can call a Fabric REST API endpoint from Azure Data Factory (or Synapse if you will). Let’s go a step further and execute a Fabric Data Pipeline from an ADF pipeline, which is a common request. A Fabric capacity cannot auto-resume, so you typically have an ADF pipeline that starts the Fabric capacity. After the capacity is started, you want to kick-off your ETL pipelines in Fabric and now you can do this from ADF as well.

Click through for the process, but do check Koen’s warnings: remaining in synchronous execution mode costs extra money, while asynchronous execution mode always reports a positive result, regardless of whether the underlying Fabric Data Pipeline actually worked.
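
To make the asynchronous caveat concrete, here is a minimal Python sketch of the general pattern: a fire-and-forget trigger only tells you the request was accepted, so finding out the real outcome means polling for a final status. Nothing here calls Azure Data Factory or Fabric. The `wait_for_run` helper, the status strings, and the fake status source are all hypothetical stand-ins, not the actual API or Koen's method.

```python
import time


def wait_for_run(get_status, poll_seconds=0.0, max_polls=10):
    """Poll `get_status()` until a terminal status appears or polls run out.

    `get_status` is a hypothetical callable that returns a status string;
    the status names used here are assumptions for illustration.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_seconds)
    return "TimedOut"


# Stand-in for a real status call: two in-progress polls, then a failure.
fake_statuses = iter(["InProgress", "InProgress", "Failed"])

accepted = True  # all a fire-and-forget trigger can report: the request went through
final = wait_for_run(lambda: next(fake_statuses))

print("trigger accepted:", accepted)
print("final status:", final)
```

In this sketch the trigger reports success while the run itself ends in failure, which is the gap the warning describes. Polling closes that gap at the price of keeping the caller running for longer.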
