Press "Enter" to skip to content

Author: Kevin Feasel

Ways to Use Sort Order in Bar Charts

Mike Cisneros demands order:

When you’re visualizing categorical data, sorting the bars in your chart is usually a straightforward task. Or is it?

In most cases, you probably take the category with the largest value and stick that in the prime spot, the leftmost slot on the horizontal axis. Then, you proceed from left to right in descending order of value. Easy peasy.

But it’s not always that simple, as Mike points out.

Comments closed

Database Watcher for Azure SQL

Dimitry Furman has a new announcement:

Reliable, in-depth, and at-scale monitoring of database performance has been a long-standing top priority for SQL customers. Today, we are pleased to announce the public preview of database watcher for Azure SQL, a managed database monitoring solution to help our customers use Azure SQL reliably and efficiently.

Click through to see what it offers and what’s on the roadmap for this product.

Comments closed

Calculating Percentages in T-SQL

Edwin Sanchez shows a variety of methods to calculate percentages of the whole in T-SQL:

Calculating percentages in SQL Server is like slicing a pie. You need to know the total size (the denominator) and the size of the slice you want (the numerator). To get a percentage, you divide the slice size by the total size and multiply by 100. 

Read on for a variety of methods to calculate this. I wouldn’t use all of the methods myself, as I have certain predilections against subqueries in the SELECT clause, but they do get the job done.

Comments closed

Using Key Vault in SQL Server on Linux

Aravind Mahadevan shares information on a new bit of functionality:

We’re excited to announce that Extensible Key Management (EKM) using Azure Key Vault in SQL Server on Linux is now generally available from SQL Server 2022 CU12 onwards, which allows you to manage encryption keys outside of SQL Server using Azure Key Vaults.

In this blog post, we’ll explore how to leverage Azure Key Vault as an EKM provider for SQL Server on Linux.

Read on to see how to set this up.

Comments closed

Learning about GitHub Actions

I have a new video:

In this video, we dig into GitHub’s process for executing code: GitHub Actions workflows. We’ll learn what Actions and workflows are, how we can create them from scratch, and how to incorporate Actions from the GitHub Marketplace into our own workflows.

Along the way, I describe what GitHub Actions workflows are and we build a simple one. I’ll have more videos coming up that expand on GitHub Actions and show you more of what you can do with them.

Comments closed

Duplicating Rows in R

Steven Sanderson repeats the punch line a few times:

Are you working with a dataset where you need to duplicate certain rows multiple times? Perhaps you want to create synthetic data by replicating existing observations, or you need to handle imbalanced data by oversampling minority classes. Whatever the reason, replicating rows in a data frame is a handy skill to have in your R programming toolkit.

In this post, we’ll explore how to replicate rows in a data frame using base R functions. We’ll cover replicating each row the same number of times, as well as replicating rows a different number of times based on a specified pattern.

Click through to replicate data without copy-paste.

Comments closed

Changing the Timeout of a Spark Session in Microsoft Fabric

Koen Verbeeck doesn’t have time to wait:

You might know the feeling: you’re writing code in a Notebook in Microsoft Fabric and suddenly you have to leave your workstation for a while. Someone ran the doorbell (you’re working from home and you get some parcels delivered), or you took a coffee break with some colleagues. When you return to your notebook, the Spark session has timed out and when you run a cell, you have to wait for the damn thing to restart again. The agony, waiting for 2-3 minutes for the session to start, and only after that the actual code can start running.

Read on to see how you can set the timeout to a custom value, assuming you’re okay with paying for the Spark cluster to sit around until it times out.

Comments closed

Exposing Kafka Data in Iceberg using Tableflow

Marc Selwan announces a new product:

We’re excited to talk about our vision for Tableflow, which makes it push-button simple to take Apache Kafka® data and feed it directly into your data lake, warehouse, or analytics engine as Apache Iceberg® tables. Making operational data accessible to the analytical world is traditionally a complex, expensive, and brittle process and we believe we can do better to unify the operational and analytical estates.

Tableflow removes all this erroneous, duplicative work and helps convert Kafka topics and associated schemas to Iceberg tables in one click. This is central to our Confluent’s vision to build the world’s leading data streaming platform that fuels any operational and analytical workload with real-time data products. 

It looks like this is currently in early access, but you can see where Confluent intends to take the product.

Comments closed

The Proper Use of Views and Inline UDFs

Erik Darling plays tic-tac-toe:

The problem is really the stuff that people stick into views. They’re sort of like a junk drawer for data. Someone builds a view that returns a correct set of results, which becomes a source of truth. Then someone else comes along and uses that view in another view, because they know it returns the correct results, and so on and so on. Worse, views tend to do a bunch of data massaging, left joining and coalescing and substringing and replacing and case expressioning and converting things to other things. The bottom line is that views are as bad as you make them.

The end result is a trash monster with a query plan that can only be viewed in full from deep space.

Read on to learn the use cases for views and inline UDFs, as well as a few important notes regarding performance of each. Views are like mogwai: they’re fine as long as you never get them wet and never let them eat after midnight. The problem is, far too many companies are apparently the business equivalent of all-you-can-eat buffets at water parks.

Inline user-defined functions are like patenting a device that lets you shoot yourself in both feet with one pull of the trigger. Which, if I understand things correctly, means you’ll need a Form 4 for each inline UDF.

Comments closed