2022-05-02 – Curated SQL

Comparing R Package Versions with Diffify

Published 2022-05-02 by Kevin Feasel

Clarissa Barratt and Parisa Gregg announce an interesting tool:

You know that sinking feeling that you get when you’re months into a big project and you log in one day and nothing works? Turns out something has updated and things have been removed that you needed and now you need to spend hours-days figuring out what’s changed and your masters deadline is getting closer and … ok, apparently this took me back to a very specific event.
But I’m sure most of that sounds familiar to you if you’ve ever programmed something over a longer period of time.
Over the last few months, Jumping Rivers have been working on a tool that will make it easier to see differences between R package versions: Diffify.

Read on to see it in action. It looks quite useful for troubleshooting issues in which a package suddenly changed API functionality, something which tends to happen frequently in the R and Python worlds.
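
As a rough illustration of the kind of comparison such a tool performs, here is a minimal sketch that diffs the exported function names of two package versions. This is not how Diffify itself is implemented; the `diff_exports` helper and both export lists are hypothetical and exist only for this example.

```python
def diff_exports(old, new):
    """Return the names added to and removed from a package's exported API."""
    old_set, new_set = set(old), set(new)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


# Hypothetical export lists for two versions of an imaginary package.
exports_v1 = ["read_data", "clean_data", "plot_summary"]
exports_v2 = ["read_data", "tidy_data", "plot_summary", "export_report"]

changes = diff_exports(exports_v1, exports_v2)
print(changes)
```

Anything listed under "removed" is a candidate for the "nothing works" moment described in the excerpt, since code that calls those functions would break after an upgrade.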


Playing with gganimate

Published 2022-05-02 by Kevin Feasel

Tomaz Kastrun tries out gganimate:

I firmly believe that animation and transition between different data states can give end-users much better insight into and understanding of the data than a single table with data points or correlation metrics.
With the help of ggplot and gganimate, you can quickly create an animation based on your needs. This is a simple IRIS dataset example.

You can find more at the gganimate website. The real downside is that I don’t think it’s being maintained any longer, as the last commit was a year ago.


Slow File Open Times in Power BI

Published 2022-05-02 by Kevin Feasel

Marco Russo explains why opening some Power BI files might take so long:

There could be many reasons for that, but if you have calculated columns and/or calculated tables in your model, you should be aware that they could be the reasons why this happens. It could be, so I want to explain when this happens.
The short explanation is the following: when you open a PBIX file, Power BI Desktop automatically recalculates those calculated columns and calculated tables that depend on a volatile formula.

Read on for the longer explanation, which includes a (possibly incomplete) list of volatile formulas.


High Availability in SQL Managed Instance General Purpose Tier

Published 2022-05-02 by Kevin Feasel

Niko Neugebauer clears up what options you have for high availability in SQL MI’s General Purpose tier:

The two main requirements around high availability are commonly known as RTO and RPO.

RTO – stands for Recovery Time Objective and is the maximum allowable downtime when a failure occurs. In other words, how much time it takes for your databases to be up and running.

RPO – stands for Recovery Point Objective and is the maximum allowable data loss when a failure occurs. Of course, the ideal scenario is not to lose any data, but a more realistic (and also ideal) scenario is to not lose any committed data, also known as Zero Committed Data Loss.

With those definitions out of the way, read on to learn more.
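
To make the two definitions concrete, here is a small sketch that checks a single outage against an RTO and an RPO. It is a simplification under stated assumptions: the `check_objectives` helper, the timestamps, and the five-minute RTO are all hypothetical and do not describe any SQL Managed Instance guarantee.

```python
from datetime import datetime, timedelta


def check_objectives(failure_at, last_recoverable_at, restored_at, rto, rpo):
    """Compare one outage against a recovery time and recovery point objective."""
    downtime = restored_at - failure_at              # how long the database was unavailable
    data_loss_window = failure_at - last_recoverable_at  # committed work that cannot be recovered
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical outage: every committed transaction up to the failure survives.
failure = datetime(2022, 5, 2, 10, 0, 0)
result = check_objectives(
    failure_at=failure,
    last_recoverable_at=failure,                  # zero committed data loss
    restored_at=failure + timedelta(minutes=2),   # back online two minutes later
    rto=timedelta(minutes=5),                     # assumed objective
    rpo=timedelta(0),                             # assumed objective: lose nothing committed
)
print(result)
```

An RPO of `timedelta(0)` corresponds to the Zero Committed Data Loss scenario in the excerpt: the objective is met only if the last recoverable point coincides with the moment of failure.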


Sharing Individual Power BI Dataflows

Published 2022-05-02 by Kevin Feasel

Marc Lelijveld is in a sharing mood:

Recently, I have had a challenge at a customer, where a central team maintains many dataflows in Power BI, to store their single version of the truth. However, this central team maintained many different dataflows in a single workspace, but did not want to share the entire workspace with others. What now? How can they share a single dataflow in Power BI?
In this blog, I will describe different ways to share dataflows in the Power BI service and highlight pros and cons of each solution. Read on to find out what options you have, and what my personal preference would be.

Read on to learn why you might want to share a dataflow, as well as four techniques to do it.


The Performance Cost of Subqueries in the SELECT Clause

Published 2022-05-02 by Kevin Feasel

Andrea Allred notes a potential performance issue:

Why is this query slow? If this query were to return 50 rows, it would run each query in the SELECT clause 50 times, and since there are two of them, that is 100 query runs. What if I returned 100,000 rows? That would be 200,000 query runs. How could I do this differently?

Read on for the answer.
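
The excerpt does not show the rewritten query, so the following is only a sketch of one common alternative, not necessarily the approach in the linked post. It uses Python's built-in `sqlite3` module with hypothetical `customers` and `orders` tables to show two correlated subqueries in the SELECT clause and an equivalent single join with aggregation that returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# Two correlated subqueries in the SELECT clause, each tied to the outer row.
subquery_sql = """
SELECT c.name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count,
       (SELECT COALESCE(SUM(o.amount), 0) FROM orders o WHERE o.customer_id = c.id) AS total
FROM customers c
ORDER BY c.id;
"""

# One pass over orders: join once, then aggregate per customer.
join_sql = """
SELECT c.name,
       COUNT(o.id) AS order_count,
       COALESCE(SUM(o.amount), 0) AS total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id;
"""

sub_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(sub_rows == join_rows)
conn.close()
```

The `LEFT JOIN` keeps customers with no orders, which matches what the subquery version returns for them. Whether the join form is actually faster depends on the database engine, the indexes, and the data volume, so it is worth comparing execution plans rather than assuming.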


