Press "Enter" to skip to content

Curated SQL Posts

Bubble Charts in ggplot2

Steven Sanderson creates a bubble chart:

Bubble charts are a great way to visualize data with three dimensions. The size of the bubbles represents a third variable, which can be used to show the importance of that variable or to identify relationships between the three variables.

To create a bubble chart in R using ggplot2, you will need to use the geom_point() function. This function will plot points on your chart, and you can use the size aesthetic to control the size of the points.

Click through for two examples, one which is a pretty good outcome for using a bubble chart, and one which exposes the key weakness of bubble charts.

Comments closed

Apache Zookeeper Vulnerability

The Instaclustr team reviews an announcement:

On October 11, 2023, the Apache ZooKeeper™ project announced that a security vulnerability has been identified in Apache ZooKeeper, CVE-2023-44981. The Apache ZooKeeper project has classified the severity of this CVE as critical. The CVSS (Common Vulnerability Scoring System) 3.x severity rating for this vulnerability by the NVD (National Vulnerability Database) is base score 9.1 Critical.  

That’s a rather high base score and is comes about if you have the setting quorum.auth.enableSasl=true. Updating to the Zookeeper 3.7.2 or alter, 3.8.3 or later, or anything in the 3.9 branch will fix this vulnerability.

Comments closed

Diagnosing High CPU in an ASP.NET App with PerfView

Josh Darnell goes hunting for the problem:

A colleague reached out to me while they were doing some routine maintenance on a web server. One of our app pools was sitting at around 25-35% CPU usage, despite the fact that no one was using this particular application. They intended to restart the app pool, but asked if I wanted to check anything first.

Read on to see what Josh did to troubleshoot and then correct the issue.

Comments closed

Finding Power BI Report Visual IDs

Chris Webb goes digging for IDs:

Back in 2021 I wrote a post showing how you can link a DAX query generated by a Power BI report in Log Analytics to a visual in a Power BI report. In that post I showed how you can get the unique IDs for each visual in a report using Performance Analyzer, an approach which works but has drawbacks. Then, in 2022 I wrote another post showing how you can get the same IDs using the Power BI Embedded Analytics Playground, which again works but isn’t perfect. After that, this August, my colleague Rui Romano pinged me on Teams to point out that the new Power BI Desktop Developer Mode and the .pbip file format provides the best and easiest way to get these IDs.

Read on to learn more, but also check out Mike Rudzinski’s comment for a fourth technique.

Comments closed

Minimizing Callback Counts in SUMX()

Marco Russo and Alberto Ferrari speed things up a bit:

Pushing calculations down to the VertiPaq storage engine is always a good practice. Sometimes this is not feasible. However, carefully analyzing the aggregated expression can lead to optimization ideas that produce excellent query plans.

DAX developers should not be scared of iterators. Their performance is great as long as the expression computed during the iteration can be pushed down to the VertiPaq storage engine. 

Read on to understand what they mean by callback and the enormous performance cost you’ll want to avoid.

Comments closed

SQL Server Data Import and Export via File

Ed Pollack opens an import-export business:

For the purposes of this article, we will focus solely on the task of moving a data set from one server to another. Topics such as ETL, ELT, data warehousing, data lakes, etc…are important and relevant to data movement, but out of scope for a focused discussion such as this.

Ed touches on why you might want to use files and then shares his recommendations for generating files from SQL Server data as well as importing data from flat files into SQL Server.

Comments closed

Creating Pareto Charts in R with qcc

Steven Sanderson builds a Pareto chart:

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Manufacturing companies love Pareto charts

Comments closed

Several Useful R Functions

Maelle Salmon shows off four useful R functions:

Recently I caught myself using which(grepl(...)),

animals <- c("cat", "bird", "dog", "fish")
which(grepl("i", animals))
#> [1] 2 4

when the simpler alternative is

animals <- c("cat", "bird", "dog", "fish")
grep("i", animals)
#> [1] 2 4

Read on for another example of using grep() instead of grepl(), as well as three other functions you might want to keep in mind. H/T R-Bloggers.

Comments closed

Exploring Poker Hands in R

Benjamin Smith sorts and deals:

Recently, I have been reading “Mathematical Statistics” by Professor Keith Knight and I noticed a interesting passage he mentions when discussing finite sample spaces:

*In some cases, it may be possible to enumerate all possible outcomes, but in general such enumeration is physically impossible; for example, enumerating all possible 5 card poker hands dealt from a deck of 52 cards would take several months under the most
favourable conditions. * (Knight 2000)

While this quote is taken out of context, with the advent of modern computing this is a task which is definitely possible to do computationally!

Click through to see how you can do this in R, at least for 5-card stud. 5-card draw would have the same number of final combinations, though if you also track intermediary combinations, it would grow rather considerably.

Comments closed

Microsoft Fabric’s Reflex as Watchdog

Tom Martens brings home a junkyard dog:

Reflex is many things next to one of the workloads of Microsoft Fabric. Before I delve into these things in more detail in later articles (yes, maybe this is the birth of another series of articles), I want to say this: Reflex is cool. It was never that simple to watch your data in your Power BI datasets (and this is only one of the capabilities of Reflex).

Because I need images whenever I try to understand things, I start with a simple image of Reflex: I consider Reflex a watchdog! Reflex is watching something and alarms me or someone else when something happens – a defined condition is met.

Read on for an example of how this works using a real dataset.

Comments closed