Press "Enter" to skip to content

Author: Kevin Feasel

Creating Error Bars in ggplot2

Zhenguo Zhang draws a chart:

Sometimes you may want to create a plot with the following features:

  • a point to indicate the mean of a group
  • error bars to indicate the standard deviation of the group
  • and each group may have subgroups, which are represented by different colors.

In this post, I will show you how to create such a plot using the ggplot2 package in R.

Read on for the demonstration, as well as a fix for the common problem of overlapping data points. H/T R-Bloggers.
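
If you want a feel for the recipe without leaving this page, here is a rough sketch using plotnine, a Python port of the ggplot2 API (the data, column names, and dodge width below are all invented for illustration; Zhenguo's post has the actual ggplot2 code):

```python
import pandas as pd
from plotnine import (
    ggplot, aes, geom_point, geom_errorbar, labs, position_dodge
)

# Pre-computed summary statistics: one row per group/subgroup combination.
stats = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "C", "C"],
    "subgroup": ["x", "y", "x", "y", "x", "y"],
    "mean":     [3.1, 4.2, 2.8, 5.0, 3.9, 4.4],
    "sd":       [0.4, 0.6, 0.3, 0.7, 0.5, 0.4],
})

# position_dodge shifts the subgroups sideways so their points and error
# bars do not sit on top of each other at the same x position.
dodge = position_dodge(width=0.5)

plot = (
    ggplot(stats, aes(x="group", y="mean", color="subgroup"))
    + geom_point(position=dodge, size=3)                  # group means
    + geom_errorbar(aes(ymin="mean - sd", ymax="mean + sd"),
                    position=dodge, width=0.2)            # +/- one SD
    + labs(y="value")
)
plot.save("error_bars.png")
```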

Kafka Data Exploration with Tableflow

Robin Moffatt does some exploratory data analysis:

One of the challenges that I’d always had when it came to building streaming data pipelines is that once data is in a Kafka topic, it becomes trickier to query. Whether limited by the available tools to do this or the speed of access, querying Kafka is just not a smooth experience.

This blog post will show you a really nice way of exploring and validating data in Apache Kafka®. We’ll use Tableflow to expose the Kafka topics as Apache Iceberg™️ tables and then query them using standard SQL tools.

Click through for the demonstration using a real dataset.
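
As a taste of the "standard SQL tools" part, here is a minimal sketch (not necessarily Robin's exact setup) using DuckDB's iceberg extension from Python; the table location and credential setup are placeholders for whatever Tableflow exposes in your environment:

```python
import duckdb

con = duckdb.connect()
# The iceberg extension reads Iceberg metadata; httpfs handles S3 access.
con.sql("INSTALL iceberg; LOAD iceberg;")
con.sql("INSTALL httpfs; LOAD httpfs;")
# Credential configuration (S3 keys or an Iceberg REST catalog) is omitted
# here and depends on how your Tableflow-backed storage is secured.

# Query the Iceberg table that Tableflow materialized from a Kafka topic.
con.sql("""
    SELECT *
    FROM iceberg_scan('s3://my-bucket/path/to/topic-table')
    LIMIT 10
""").show()
```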

SQL Server Performance Office Hours

Erik Darling is answering questions again:

My company (using SQL Server 2019 Standard) has an algorithm that keys addresses into a varchar(40) and a cross-reference database that assigns an identity property to each new value, allowing us to post the numeric key in our datasets. Production has to search their generated string keys in this database’s table to get the integer key in return. We have ensured proper string data typing on lookups and have unique compressed indexes on the string values. What would your next tuning step be if this were not getting the performance you needed?

There’s a good set of questions this time, so click through for Erik’s answers.
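
To make the scenario concrete, here is a hypothetical reconstruction of the setup the questioner describes: a cross-reference table mapping each varchar(40) address string to an identity-based integer, with a unique compressed index on the string column. All object names here are invented:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;"
    "DATABASE=AddressXRef;Trusted_Connection=yes;TrustServerCertificate=yes",
    autocommit=True,
)
conn.execute("""
    CREATE TABLE dbo.AddressKey
    (
        AddressKeyID int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_AddressKey PRIMARY KEY,
        AddressString varchar(40) NOT NULL,
        CONSTRAINT UQ_AddressKey_AddressString
            UNIQUE (AddressString) WITH (DATA_COMPRESSION = PAGE)
    );
""")

# Production's hot path: trade a generated string key for the integer key.
row = conn.execute(
    "SELECT AddressKeyID FROM dbo.AddressKey WHERE AddressString = ?;",
    "123 MAIN ST|SPRINGFIELD|IL",
).fetchone()
```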

From Power BI Premium Capacity to Fabric Capacity

Jon Vöge performs a migration:

So your old Power BI Premium Capacity has run/is running out, and your organization is acquiring a new Fabric Capacity to replace it.

Perhaps the organization even decided to take the chance to move the capacity region to something a little closer to home?

If you find yourself in this situation, how do you best migrate your contents of one Capacity to another?

Read on as Jon explains the migration process within a region (which is very easy) and the migration process if you need to go cross-region (which is rather cumbersome).
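
For the easy, within-region case, the core operation is reassigning each workspace to the new capacity. Here is a hedged sketch of doing that through the Power BI REST API's AssignToCapacity endpoint; the IDs and token acquisition are placeholders, and Jon's post covers the full checklist, including the painful cross-region path:

```python
import requests

token = "<bearer token from Entra ID>"    # placeholder
workspace_id = "<workspace guid>"         # placeholder
new_capacity_id = "<fabric capacity guid>"  # placeholder

# Reassign one workspace to the new Fabric capacity.
resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/AssignToCapacity",
    headers={"Authorization": f"Bearer {token}"},
    json={"capacityId": new_capacity_id},
)
resp.raise_for_status()  # success means the workspace now runs on the new capacity
```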

Executing a Fabric Data Pipeline from Azure Data Factory

Koen Verbeeck leaves the confines of Microsoft Fabric:

In the blog post Call a Fabric REST API from Azure Data Factory I explained how you can call a Fabric REST API endpoint from Azure Data Factory (or Synapse if you will). Let’s go a step further and execute a Fabric Data Pipeline from an ADF pipeline, which is a common request. A Fabric capacity cannot auto-resume, so you typically have an ADF pipeline that starts the Fabric capacity. After the capacity is started, you want to kick off your ETL pipelines in Fabric, and now you can do this from ADF as well.

Click through for the process. Though do check the warnings that Koen offers around either spending extra money by remaining in synchronous execution mode, or always getting a positive result in asynchronous execution mode, regardless of whether the underlying Fabric Data Pipeline worked or not.
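
To see the shape of what the ADF Web activity is doing, here is a minimal Python sketch of the same REST calls: start an on-demand job for the Data Pipeline item, then poll the Location header until the job reaches a terminal state. The IDs, token, and 30-second poll interval are placeholders:

```python
import time
import requests

token = "<bearer token with Fabric API scope>"  # placeholder
workspace_id = "<workspace guid>"               # placeholder
pipeline_id = "<data pipeline item guid>"       # placeholder
headers = {"Authorization": f"Bearer {token}"}

# Fire the run. The API answers 202 Accepted whether or not the pipeline
# will ultimately succeed, which is the crux of Koen's async-mode warning.
run = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline",
    headers=headers,
)
run.raise_for_status()
status_url = run.headers["Location"]

# Poll the job instance until it finishes one way or the other.
while True:
    job = requests.get(status_url, headers=headers).json()
    if job["status"] in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(30)
print(job["status"])
```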

Temp Table Bugs in Microsoft Fabric Warehouses

Jared Westover runs into a wall:

I was excited when Microsoft announced the ability to create session-scoped temporary tables in a Fabric warehouse. However, after using Microsoft Fabric temporary tables, I quickly felt disappointed. When will they be ready for prime time, and in the meantime, what other options are available?

Click through for Jared’s experience, although it might already be fixed.

Restoring Databases using Change Data Capture

Tim Radney restores a database:

I have recently had the privilege of working with multiple clients who have been taking advantage of Change Data Capture (“CDC”). Change Data Capture is a feature that utilizes SQL Server Agent to log inserts, updates, and deletes occurring in a table. It makes these data changes available to be consumed in a relational format. These changes are captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables.

Change Data Capture is very often utilized for ETL projects or for data extracts into a data warehouse, data lake, etc.

Click through for the scenario and how you can keep CDC going after restoring a SQL Server database.
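
The linchpin when restoring to a different server or instance is the KEEP_CDC option, which preserves the cdc schema and change tables through the restore. Here is a minimal sketch from Python; the connection string, paths, and database name are placeholders, and Tim's post covers the full scenario:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=newserver;"
    "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes",
    autocommit=True,  # RESTORE cannot run inside a user transaction
)

# KEEP_CDC retains CDC metadata that would otherwise be stripped when
# restoring onto a different server or under a different database name.
conn.execute("""
    RESTORE DATABASE SalesDB
    FROM DISK = N'C:\\Backups\\SalesDB.bak'
    WITH KEEP_CDC, RECOVERY;
""")

# The capture and cleanup Agent jobs do not travel with the backup;
# recreate them inside the restored database.
conn.execute("USE SalesDB; EXEC sys.sp_cdc_add_job @job_type = N'capture';")
conn.execute("USE SalesDB; EXEC sys.sp_cdc_add_job @job_type = N'cleanup';")
```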

Digging into the Kusto Detective Agency

Tom Zika becomes a gumshoe:

We need to answer this question:

Who is the detective that earned the most money in 2022?

We can see that only one table (DetectiveCases) was added in the ingestion section. Let’s take a look at its data.

The Kusto Detective Agency is a great way to apply KQL skills. I’m not sure it’s a fantastic experience for somebody with zero KQL knowledge, but if you’ve messed around at least a little bit with the language, this is a fun way of applying those skills.
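
If you would rather poke at the data from code than from the Kusto web UI, here is a hedged sketch using the azure-kusto-data Python package; the cluster URI and database name are placeholders for the ones the game assigns you:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-cluster>.kusto.windows.net"  # placeholder
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# Same first step as in the post: eyeball a sample of the table's rows.
response = client.execute("<your-database>", "DetectiveCases | take 10")
for row in response.primary_results[0]:
    print(row)
```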

Incremental Backups in PostgreSQL 17

Dave Stokes takes a backup:

The old adage that a DBA or SRE is only as good as their last backup is true. PostgreSQL 17 added the ability to combine multiple incremental backups with a full backup to provide a complete data directory to recover a failed system. It is very easy to use.

This is a quick example of using incrementals. I recommend watching this video by Robert Haas for more details and some discussion of backup strategies.

Click through to see how it all works. But also please test your backups, because if I were to modify the adage, I’d say that a DBA is only as good as their last tested backup.
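
As a hedged sketch of the workflow Dave describes, driven from Python via subprocess (the paths are invented, and the server needs summarize_wal = on before incremental backups will work):

```python
import subprocess

def run(cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Take a full base backup; it writes a backup_manifest alongside the data.
run(["pg_basebackup", "-D", "/backups/full"])

# 2. Later, take an incremental backup relative to the full one's manifest.
run(["pg_basebackup", "-D", "/backups/incr1",
     "--incremental=/backups/full/backup_manifest"])

# 3. To restore, combine the full and incremental backups into a complete,
#    usable data directory.
run(["pg_combinebackup", "/backups/full", "/backups/incr1",
     "-o", "/restore/data"])
```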
