Curated SQL – Page 420 – A Fine Slice Of SQL Server

Creating Pareto Charts in R with qcc

Published 2023-10-23 by Kevin Feasel

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Manufacturing companies love Pareto charts

Comments closed

Several Useful R Functions

Published 2023-10-23 by Kevin Feasel

Maelle Salmon shows off four useful R functions:

Recently I caught myself using which(grepl(...)),
animals <- c("cat", "bird", "dog", "fish")
which(grepl("i", animals))
#> [1] 2 4
when the simpler alternative is
animals <- c("cat", "bird", "dog", "fish")
grep("i", animals)
#> [1] 2 4

Read on for another example of using grep() instead of grepl(), as well as three other functions you might want to keep in mind. H/T R-Bloggers.

Comments closed

Exploring Poker Hands in R

Published 2023-10-23 by Kevin Feasel

Benjamin Smith sorts and deals:

Recently, I have been reading “Mathematical Statistics” by Professor Keith Knight and I noticed a interesting passage he mentions when discussing finite sample spaces:

*In some cases, it may be possible to enumerate all possible outcomes, but in general such enumeration is physically impossible; for example, enumerating all possible 5 card poker hands dealt from a deck of 52 cards would take several months under the most
favourable conditions. * (Knight 2000)

While this quote is taken out of context, with the advent of modern computing this is a task which is definitely possible to do computationally!

Click through to see how you can do this in R, at least for 5-card stud. 5-card draw would have the same number of final combinations, though if you also track intermediary combinations, it would grow rather considerably.

Comments closed

Microsoft Fabric’s Reflex as Watchdog

Published 2023-10-23 by Kevin Feasel

Tom Martens brings home a junkyard dog:

Reflex is many things next to one of the workloads of Microsoft Fabric. Before I delve into these things in more detail in later articles (yes, maybe this is the birth of another series of articles), I want to say this: Reflex is cool. It was never that simple to watch your data in your Power BI datasets (and this is only one of the capabilities of Reflex).

Because I need images whenever I try to understand things, I start with a simple image of Reflex: I consider Reflex a watchdog! Reflex is watching something and alarms me or someone else when something happens – a defined condition is met.

Read on for an example of how this works using a real dataset.

Comments closed

Postgres Performance Tuning via work_mem

Published 2023-10-23 by Kevin Feasel

Salman Ahmed explains what working memory is in Postgres and the effects of changing the work_mem value:

PostgreSQL, by default, is configured to run everywhere with minimum resource utilization. To achieve maximum performance under specific scenarios, PostgreSQL’s parameters can be tuned to enhance performance. One such parameter that can impact performance in PostgreSQL is work_mem.

In this blog we will discuss how work_mem can be used to optimize performance in PostgreSQL.

Click through for that discussion.

Comments closed

Debugging SQLPackage Issues in Powershell

Published 2023-10-23 by Kevin Feasel

Jose Manuel Jurado Diaz simplifies SQLPackage output:

Handling massive SQLPackage diagnostic logs, like those spanning over 4 million rows, can be an overwhelming task when troubleshooting support cases. This article introduces a PowerShell script designed to efficiently parse through SQLPackage diagnostic logs, extract error messages, and save them to a separate file, thus simplifying the review process and enhancing the debugging experience.

Click through for a Powershell script that can help.

Comments closed

Building Diagrams in Mermaid

Published 2023-10-23 by Kevin Feasel

Michael Bourgon tries out Mermaid:

Just found out about this the past month.

I like diagrams for my documentation, and I detest making it. I also would like to build it via script, since that’s more useful.

I used Mermaid to create a series of architectural diagrams a couple years back. It was a reasonably good experience, although you have to keep in mind that you don’t get pixel-perfect designs and certain concepts can be difficult to represent. Even so, it’s quite alright for straightforward diagrams and includes support for icon sets for a variety of cloud and on-premises environments.

Comments closed

Caching: In-Database and External

Published 2023-10-23 by Kevin Feasel

Adron Hall talks caches:

All aboard the Data Express! Let’s imagine our database as this massive train station. The trains are packed with information – from passengers’ details to the schedules. Every time you want to know when the next train to DevOps Land is, you have to ask the station master (the database). If too many folks keep asking the same question, the station master will get tired, slowing down the whole operation. So, what do we do? Enter: Caching!

Read on for different caching mechanisms in several major relational databases, various reasons for external caches (like Redis and memcached) to exist, and four patterns for external caching. I’ve found that database people tend not to care much about external caches, leaving that to application developers. But there can be good reasons to store high-read, low-write data in caches, reducing some of the strain on those expensive database servers.

Comments closed

Making a Time Series Stationary in R

Published 2023-10-20 by Kevin Feasel

Steven Sanderson puts a halt to things:

When working with time series data, one common challenge is dealing with non-stationary data. Non-stationary time series can be a headache for analysts, but fear not, because we have a handy tool to make your life easier. Say hello to the auto_stationarize() function from the {healthyR.ts} package.

Read on to learn why you want stationary data for time series analysis and how the auto_stationarize() function works.

Comments closed

Capturing a TCP Dump in an Azure Databricks Notebook

Published 2023-10-20 by Kevin Feasel

Stithi Panigrahi does some troubleshooting:

Due to the potential impact on performance and storage costs, Azure Databricks clusters don’t capture networking logs by default. Follow the below instructions if you need to capture tcpdump to investigate multiple networking issues related to the cluster. These steps will capture a TCP dump on each cluster node–both driver and workers during the entire lifetime of the cluster.

Click through for an initiation script, which generates the actual script, which itself generates the TCP dumps.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Curated SQL Posts