Press "Enter" to skip to content

Month: October 2023

Building a Multi-Tenant Database

Adron Hall looks at multi-tenancy within Postgres:

Music has always been a significant part of my life. From the melodies that accompany my daily routines to the anthems of my most memorable moments, it’s been a constant. As my collection grew, I realized I needed a better way to organize it. That’s when I stumbled upon the concept of multi-tenancy databases and decided to give it a shot with PostgreSQL. Here’s my experience.

Multi-tenancy is one case in which I’m much more relaxed about including the tenant ID on tables where it is not absolutely necessary in order to prevent a series of joins to get the appropriate tenant ID. We can quibble about whether that’s reasonable denormalization or appropriate use of a superkey—especially because, in SQL Server, tenant ID ends up being part of the clustered index and likely part of the primary key anyhow—but it’s extremely useful nonetheless.

Comments closed

CAST() and CONVERT() for Dates

Chad Callihan converts a date:

CAST and CONVERT can both be used to switch a value to a new data type. They are similar, but certainly not identical. While CAST is considered ANSI SQL and will get you across the finish line, CONVERT can give you more flexibility when it comes to formatting date values. Let’s look at an example comparing the usage of CAST and CONVERT with dates.

Most of the time, I’ll use CAST() over CONVERT(), not so much because the former is ANSI compliant, but rather because I think it’s more intuitive to remember. Date formatting is one of the few occasions in which I usually prefer CONVERT() and that’s precisely because of the format options. Of course, if you want more custom formatting options, you can use FORMAT(), though that function uses .NET in the background and is remarkably slow. It’s fine if you’re formatting a few dates, but if you’re outputting millions of rows, you will certainly see a marked difference.

Comments closed

Bubble Charts in ggplot2

Steven Sanderson creates a bubble chart:

Bubble charts are a great way to visualize data with three dimensions. The size of the bubbles represents a third variable, which can be used to show the importance of that variable or to identify relationships between the three variables.

To create a bubble chart in R using ggplot2, you will need to use the geom_point() function. This function will plot points on your chart, and you can use the size aesthetic to control the size of the points.

Click through for two examples, one which is a pretty good outcome for using a bubble chart, and one which exposes the key weakness of bubble charts.

Comments closed

Apache Zookeeper Vulnerability

The Instaclustr team reviews an announcement:

On October 11, 2023, the Apache ZooKeeper™ project announced that a security vulnerability has been identified in Apache ZooKeeper, CVE-2023-44981. The Apache ZooKeeper project has classified the severity of this CVE as critical. The CVSS (Common Vulnerability Scoring System) 3.x severity rating for this vulnerability by the NVD (National Vulnerability Database) is base score 9.1 Critical.  

That’s a rather high base score and is comes about if you have the setting quorum.auth.enableSasl=true. Updating to the Zookeeper 3.7.2 or alter, 3.8.3 or later, or anything in the 3.9 branch will fix this vulnerability.

Comments closed

Diagnosing High CPU in an ASP.NET App with PerfView

Josh Darnell goes hunting for the problem:

A colleague reached out to me while they were doing some routine maintenance on a web server. One of our app pools was sitting at around 25-35% CPU usage, despite the fact that no one was using this particular application. They intended to restart the app pool, but asked if I wanted to check anything first.

Read on to see what Josh did to troubleshoot and then correct the issue.

Comments closed

Finding Power BI Report Visual IDs

Chris Webb goes digging for IDs:

Back in 2021 I wrote a post showing how you can link a DAX query generated by a Power BI report in Log Analytics to a visual in a Power BI report. In that post I showed how you can get the unique IDs for each visual in a report using Performance Analyzer, an approach which works but has drawbacks. Then, in 2022 I wrote another post showing how you can get the same IDs using the Power BI Embedded Analytics Playground, which again works but isn’t perfect. After that, this August, my colleague Rui Romano pinged me on Teams to point out that the new Power BI Desktop Developer Mode and the .pbip file format provides the best and easiest way to get these IDs.

Read on to learn more, but also check out Mike Rudzinski’s comment for a fourth technique.

Comments closed

Minimizing Callback Counts in SUMX()

Marco Russo and Alberto Ferrari speed things up a bit:

Pushing calculations down to the VertiPaq storage engine is always a good practice. Sometimes this is not feasible. However, carefully analyzing the aggregated expression can lead to optimization ideas that produce excellent query plans.

DAX developers should not be scared of iterators. Their performance is great as long as the expression computed during the iteration can be pushed down to the VertiPaq storage engine. 

Read on to understand what they mean by callback and the enormous performance cost you’ll want to avoid.

Comments closed

SQL Server Data Import and Export via File

Ed Pollack opens an import-export business:

For the purposes of this article, we will focus solely on the task of moving a data set from one server to another. Topics such as ETL, ELT, data warehousing, data lakes, etc…are important and relevant to data movement, but out of scope for a focused discussion such as this.

Ed touches on why you might want to use files and then shares his recommendations for generating files from SQL Server data as well as importing data from flat files into SQL Server.

Comments closed

Creating Pareto Charts in R with qcc

Steven Sanderson builds a Pareto chart:

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Manufacturing companies love Pareto charts

Comments closed

Several Useful R Functions

Maelle Salmon shows off four useful R functions:

Recently I caught myself using which(grepl(...)),

animals <- c("cat", "bird", "dog", "fish")
which(grepl("i", animals))
#> [1] 2 4

when the simpler alternative is

animals <- c("cat", "bird", "dog", "fish")
grep("i", animals)
#> [1] 2 4

Read on for another example of using grep() instead of grepl(), as well as three other functions you might want to keep in mind. H/T R-Bloggers.

Comments closed