Press "Enter" to skip to content

Day: May 19, 2021

The Value of Bubble Charts

Elizabeth Ricks takes us through a surprisingly tricky chart:

An extension of a scatterplot, a bubble chart is commonly used to visualize relationships between three or more numeric variables.  Each bubble in a chart represents a single data point. The values for each bubble are encoded by 1) its horizontal position on the x-axis, 2) its vertical position on the y-axis, and 3) the size of the bubble. Sometimes, the color of the bubble or its movement in animation can represent more dimensions. 

I say surprisingly tricky because it’s easy to overwhelm the user when trying to view bubble charts. I think the best scenarios are cases in which you have relatively few data points and the size element is mandatory.

Hans Rosling (RIP) did an outstanding job of displaying this kind of chart with the Gapminder dataset.

Comments closed

Row-Level Security in Serverless SQL Pools

Jovan Popovic has a method of implementing poor man’s row-level security in Azure Synapse Analytics serverless SQL pools:

Serverless Synapse SQL pools enable you to read Parquet/CSV files or Cosmos DB collections and return their content as a set of rows. In some scenarios, you would need to ensure that a reader cannot access some rows in the underlying data source. This way, you are limiting the result set that will be returned to the users based on some security rules. In this scenario, called Row-level security, you would like to return a subset of data depending on the reader’s identity or role.

Row-level security is supported in dedicated SQL pools, but it is not supported in serverless pools (you can propose this feature in Azure feedback site). In some cases, you can implement your own custom row-level security rules using standard T-SQL code.

Read on to see how.

Comments closed

New Features in R 4.1.0

The Jumping Rivers folks have some good news for us:

The stability of the base packages is a great strength of the R ecosystem: both underpinning, and contrasting with, the rapid pace at which contributed packages (CRAN, BioConductor) evolve.

Imagine introducing a new feature into the R language. Even if problems arise with the usability of that feature, it would need to be maintained until at least the next major release, by which time thousands of developers and analysts may depend upon it. Unsurprisingly, the R maintainers are exceedingly cautious when introducing new syntax.

Similarly, you should employ caution when using new syntax in your own code. If you do use syntax that was introduced in R-4.1, be aware that your code will not run on versions of R that precede this. For example, this may prevent your new analysis scripts from running on your colleague’s computer, or prevent users from installing your new package.

Given how many third-party packages have regular breaking changes, I do wish more people would follow this advice.

Getting into the meat of things, I really like the F#-style pipe in R: |> makes a lot of intuitive sense, though I do wish they had included a placeholder element with the native pipe.

Comments closed

SSMS Templating

Kenneth Fisher shows off templating in SQL Server Management Studio:

Several times over the last few weeks I’ve gotten a request to create a new work database. The individuals from this team each have their own database that they can use as a type of scratch pad and I guess they’ve been hiring. It’s simple enough to create the database and then grant the necessary permissions, but let’s face it, after the first time I was already tired of the GUI and scripted the process out. Running the script was better but I quickly became annoyed at having to make changes to the script. User names etc. So I decided to create a more permanent script. My first thought was to use dynamic SQL. While that would work, and I’m certainly comfortable with dynamic SQL, it just didn’t feel right for this. I decided in the end to use a trick from templates. If you set up parameter(s) in the file you can use Ctrl-Shift-M to scroll through them and make changes. In an odd twist these are called template parameters

Click through for a demo.

Comments closed

Using Database Projects for Declarative Database Development

Haroon Ashraf explains the principles behind database projects and demonstrates their use:

This article is all about declarative database development using Azure Data Studio for both beginners and professionals who are new to it.

Additionally, some professional life tips in the context of the topic are also shared. The importance of declarative database development over its counterparts can also be fairly understood by going through this article.

Conceptually, I love it. Focusing on the end state is easier to understand. The problem I run into is that the tooling for generating change scripts is not great. It works for trivial database sizes, but as soon as you start talking dozens or hundreds of gigabytes of data, database projects have a tendency to do rather drastic changes which require rebuilding the table, when they could (with a bit of human smarts) perform an action which is much less disruptive. So in the end, you still end up needing to create change scripts.

Comments closed

From Excel to SQL Server via Powershell

Kevin Wilkie combines Excel and dbatools like a mad scientist:

Those ways work great if you’re inserting one tab of an Excel spreadsheet. I don’t know about y’all, but I don’t want to have to do all of this work for each tab – especially if I have 30+ tabs to import into SQL Server (and yes, I’ve been asked to do that quite a few times.)

One of the easiest ways I’ve seen to insert a lot of data (once you’ve made sure that the data is how you want it) is to use the power of PowerShell.

Read on to see how.

Comments closed

Preventing Calendar Overrun in Power BI

Matt Allington updates an older article:

Consider the example below where the CalendarYear is filtered for 2019 and the values of the measures Total Sales and Total Sales YTD are displayed by month. As you can see, the total sales are shown up to July 2019. This is because with the sample data, the last sales date is somewhere in July 2019. However, the values of Total Sales YTD are repeated all the way until the end of year (July 2019 to December 2019). This is what I call Calendar Over Run. It is common to want to prevent this overrun.

Read on for two separate methods of preventing this visual issue.

Comments closed