Press "Enter" to skip to content

Author: Kevin Feasel

Using the cut() Function in R

Steven Sanderson is about to cut somebody:

In the realm of data analysis, understanding how to effectively segment your data is paramount. Whether you’re dealing with age groups, income brackets, or any other continuous variable, the ability to categorize your data can provide invaluable insights. In R, the cut() function is a powerful tool for precisely this purpose. In this guide, we’ll explore how to harness the full potential of cut() to slice and dice your data with ease.

Read on for examples of how to use the cut() function.

Comments closed

Using Runbooks in Azure Automation

Rod Edwards has a process for that:

Nobody likes to do the same monontonous task over and over again. Well, saying that, maybe some out there do in order to look and feel busy…but I don’t, as I nearly always have something else more pressing or fun or interesting to do. By automating those repeatable tasks, it reduces boredom, chance of errors, and stress if you’re already a busy bunny.

This is where Automation comes into play, and in Azure we have a few options. This post focuses in Azure automation.

Read on to see how Azure Automation works and how to build a Powershell runbook in it.

Comments closed

Ways to Use Sort Order in Bar Charts

Mike Cisneros demands order:

When you’re visualizing categorical data, sorting the bars in your chart is usually a straightforward task. Or is it?

In most cases, you probably take the category with the largest value and stick that in the prime spot, the leftmost slot on the horizontal axis. Then, you proceed from left to right in descending order of value. Easy peasy.

But it’s not always that simple, as Mike points out.

Comments closed

Database Watcher for Azure SQL

Dimitry Furman has a new announcement:

Reliable, in-depth, and at-scale monitoring of database performance has been a long-standing top priority for SQL customers. Today, we are pleased to announce the public preview of database watcher for Azure SQL, a managed database monitoring solution to help our customers use Azure SQL reliably and efficiently.

Click through to see what it offers and what’s on the roadmap for this product.

Comments closed

Calculating Percentages in T-SQL

Edwin Sanchez shows a variety of methods to calculate percentages of the whole in T-SQL:

Calculating percentages in SQL Server is like slicing a pie. You need to know the total size (the denominator) and the size of the slice you want (the numerator). To get a percentage, you divide the slice size by the total size and multiply by 100. 

Read on for a variety of methods to calculate this. I wouldn’t use all of the methods myself, as I have certain predilections against subqueries in the SELECT clause, but they do get the job done.

Comments closed

Using Key Vault in SQL Server on Linux

Aravind Mahadevan shares information on a new bit of functionality:

We’re excited to announce that Extensible Key Management (EKM) using Azure Key Vault in SQL Server on Linux is now generally available from SQL Server 2022 CU12 onwards, which allows you to manage encryption keys outside of SQL Server using Azure Key Vaults.

In this blog post, we’ll explore how to leverage Azure Key Vault as an EKM provider for SQL Server on Linux.

Read on to see how to set this up.

Comments closed

Learning about GitHub Actions

I have a new video:

In this video, we dig into GitHub’s process for executing code: GitHub Actions workflows. We’ll learn what Actions and workflows are, how we can create them from scratch, and how to incorporate Actions from the GitHub Marketplace into our own workflows.

Along the way, I describe what GitHub Actions workflows are and we build a simple one. I’ll have more videos coming up that expand on GitHub Actions and show you more of what you can do with them.

Comments closed

Duplicating Rows in R

Steven Sanderson repeats the punch line a few times:

Are you working with a dataset where you need to duplicate certain rows multiple times? Perhaps you want to create synthetic data by replicating existing observations, or you need to handle imbalanced data by oversampling minority classes. Whatever the reason, replicating rows in a data frame is a handy skill to have in your R programming toolkit.

In this post, we’ll explore how to replicate rows in a data frame using base R functions. We’ll cover replicating each row the same number of times, as well as replicating rows a different number of times based on a specified pattern.

Click through to replicate data without copy-paste.

Comments closed

Changing the Timeout of a Spark Session in Microsoft Fabric

Koen Verbeeck doesn’t have time to wait:

You might know the feeling: you’re writing code in a Notebook in Microsoft Fabric and suddenly you have to leave your workstation for a while. Someone ran the doorbell (you’re working from home and you get some parcels delivered), or you took a coffee break with some colleagues. When you return to your notebook, the Spark session has timed out and when you run a cell, you have to wait for the damn thing to restart again. The agony, waiting for 2-3 minutes for the session to start, and only after that the actual code can start running.

Read on to see how you can set the timeout to a custom value, assuming you’re okay with paying for the Spark cluster to sit around until it times out.

Comments closed