Press "Enter" to skip to content

Curated SQL Posts

Using the cut() Function in R

Steven Sanderson is about to cut somebody:

In the realm of data analysis, understanding how to effectively segment your data is paramount. Whether you’re dealing with age groups, income brackets, or any other continuous variable, the ability to categorize your data can provide invaluable insights. In R, the cut() function is a powerful tool for precisely this purpose. In this guide, we’ll explore how to harness the full potential of cut() to slice and dice your data with ease.

Read on for examples of how to use the cut() function.

Comments closed

Ways to Use Sort Order in Bar Charts

Mike Cisneros demands order:

When you’re visualizing categorical data, sorting the bars in your chart is usually a straightforward task. Or is it?

In most cases, you probably take the category with the largest value and stick that in the prime spot, the leftmost slot on the horizontal axis. Then, you proceed from left to right in descending order of value. Easy peasy.

But it’s not always that simple, as Mike points out.

Comments closed

Using Runbooks in Azure Automation

Rod Edwards has a process for that:

Nobody likes to do the same monontonous task over and over again. Well, saying that, maybe some out there do in order to look and feel busy…but I don’t, as I nearly always have something else more pressing or fun or interesting to do. By automating those repeatable tasks, it reduces boredom, chance of errors, and stress if you’re already a busy bunny.

This is where Automation comes into play, and in Azure we have a few options. This post focuses in Azure automation.

Read on to see how Azure Automation works and how to build a Powershell runbook in it.

Comments closed

Calculating Percentages in T-SQL

Edwin Sanchez shows a variety of methods to calculate percentages of the whole in T-SQL:

Calculating percentages in SQL Server is like slicing a pie. You need to know the total size (the denominator) and the size of the slice you want (the numerator). To get a percentage, you divide the slice size by the total size and multiply by 100. 

Read on for a variety of methods to calculate this. I wouldn’t use all of the methods myself, as I have certain predilections against subqueries in the SELECT clause, but they do get the job done.

Comments closed

Database Watcher for Azure SQL

Dimitry Furman has a new announcement:

Reliable, in-depth, and at-scale monitoring of database performance has been a long-standing top priority for SQL customers. Today, we are pleased to announce the public preview of database watcher for Azure SQL, a managed database monitoring solution to help our customers use Azure SQL reliably and efficiently.

Click through to see what it offers and what’s on the roadmap for this product.

Comments closed

Using Key Vault in SQL Server on Linux

Aravind Mahadevan shares information on a new bit of functionality:

We’re excited to announce that Extensible Key Management (EKM) using Azure Key Vault in SQL Server on Linux is now generally available from SQL Server 2022 CU12 onwards, which allows you to manage encryption keys outside of SQL Server using Azure Key Vaults.

In this blog post, we’ll explore how to leverage Azure Key Vault as an EKM provider for SQL Server on Linux.

Read on to see how to set this up.

Comments closed

Duplicating Rows in R

Steven Sanderson repeats the punch line a few times:

Are you working with a dataset where you need to duplicate certain rows multiple times? Perhaps you want to create synthetic data by replicating existing observations, or you need to handle imbalanced data by oversampling minority classes. Whatever the reason, replicating rows in a data frame is a handy skill to have in your R programming toolkit.

In this post, we’ll explore how to replicate rows in a data frame using base R functions. We’ll cover replicating each row the same number of times, as well as replicating rows a different number of times based on a specified pattern.

Click through to replicate data without copy-paste.

Comments closed

Learning about GitHub Actions

I have a new video:

In this video, we dig into GitHub’s process for executing code: GitHub Actions workflows. We’ll learn what Actions and workflows are, how we can create them from scratch, and how to incorporate Actions from the GitHub Marketplace into our own workflows.

Along the way, I describe what GitHub Actions workflows are and we build a simple one. I’ll have more videos coming up that expand on GitHub Actions and show you more of what you can do with them.

Comments closed

Exposing Kafka Data in Iceberg using Tableflow

Marc Selwan announces a new product:

We’re excited to talk about our vision for Tableflow, which makes it push-button simple to take Apache Kafka® data and feed it directly into your data lake, warehouse, or analytics engine as Apache Iceberg® tables. Making operational data accessible to the analytical world is traditionally a complex, expensive, and brittle process and we believe we can do better to unify the operational and analytical estates.

Tableflow removes all this erroneous, duplicative work and helps convert Kafka topics and associated schemas to Iceberg tables in one click. This is central to our Confluent’s vision to build the world’s leading data streaming platform that fuels any operational and analytical workload with real-time data products. 

It looks like this is currently in early access, but you can see where Confluent intends to take the product.

Comments closed