Press "Enter" to skip to content

Curated SQL Posts

Comparing R Package Versions with Diffify

Clarissa Barratt and Parisa Gregg announce an interesting tool:

You know that sinking feeling that you get when you’re months into a big project and you log in one day and nothing works? Turns out something has updated and things have been removed that you needed and now you need to spend hours-days figuring out what’s changed and your masters deadline is getting closer and … ok, apparently this took me back to a very specific event.

But I’m sure most of that sounds familiar to you if you’ve ever programmed something over a longer period of time.

Over the last few months, Jumping Rivers have been working on a tool that will make it easier to see differences between R package versions: Diffify.

Read on to see it in action. It looks quite useful for troubleshooting issues in which a package suddenly changed API functionality, something which tends to happen frequently in the R and Python worlds.

Comments closed

Playing with gganimate

Tomaz Kastrun tries out gganimate:

I firmly believe that animation and transition between different data states can give end-users much better insights and understanding of the data, than a single table with data points or correlation metrics.

With help of ggplot, gganimate, you can quickly create an animation based on your needs. This is a simple IRIS dataset example.

You can find more at the gganimate website. The real downside is that I don’t think it’s being maintained any longer, as the last commit was a year ago.

Comments closed

Slow File Open Times in Power BI

Marco Russo explains why opening some Power BI files might take so long:

There could be many reasons for that, but if you have calculated columns and/or calculated tables in your model, you should be aware that they could be the reasons why this happens. It could be, so I want to explain when this happens.

The short explanation is the following: when you open a PBIX file, Power BI Desktop automatically recalculates those calculated columns and calculated tables that depend on a volatile formula.

Read on for the longer explanation, which includes a (possibly incomplete) list of volatile formulas.

Comments closed

High Availability in SQL Managed Instance General Purpose Tier

Niko Neugebauer clears up what options you have for high availability in SQL MI’s General Purpose tier:

The two main requirements around high availability are commonly known as RTO and RPO.


RTO
 – stands for Recovery Time Objective and is the maximum allowable downtime when a failure occurs. In other words, how much time it takes for your databases to be up and running.


RPO
 – stands for Recovery Point Objective and is the maximum allowable data-loss when a failure occurs. Of course, the ideal scenario is not to lose any data, but a more realistic (and also ideal) scenario is to not lose any committed data, also known as Zero Committed Data Loss.

With those definitions out of the way, read on to learn more.

Comments closed

Sharing Individual Power BI Dataflows

Marc Lelijveld is in a sharing mood:

Recently, I have had a challenge at a customer, where a central teams maintains many dataflows in Power BI, to store their only and single version of the truth. However, this central team maintained many different dataflows in a single workspace, but did not want to share the entire workspace with others. What now? How can they share a single dataflows in Power BI?

In this blog, I will describe different ways to share dataflows in the Power BI service and highlight pros and cons of each solution. Read on to find out what options you have, and what my personal preference would be.

Read on to learn why you might want to share a dataflow, as well as four techniques to do it.

Comments closed

S3 and Redshift Data Movement with Role Chaining

Sudipta Mitra, et al, talk AWS security:

This post presents an approach that you can apply at scale to achieve fine-grained access controls to resources in S3 buckets and Amazon Redshift schemas for tenants, including groups of users belonging to the same business unit down to the individual user level. This solution provides tenant isolation and data security. In this approach, we use the bridge model to store data and control access for each tenant at the individual schema level in the same Amazon Redshift database. We utilize ASSUMEROLE and role chaining to provide fine-grained access control when data is being copied and unloaded between Amazon Redshift and Amazon S3, so the data flows within each tenant’s namespace. Role chaining also streamlines the new tenant onboarding process.

Read on for an overview and tutorial.

Comments closed

Quantifying Model Uncertainty with Tensorflow Probability

Vini Jaiswal reviews the Tensorflow Probability library:

In this blog, we look at the topic of uncertainty quantification for machine learning and deep learning. By no means is this a new subject, but the introduction of tools such as Tensorflow Probability and Pyro have made it easy to perform probabilistic modeling to streamline uncertainty calculations. Consider the scenario in which we predict the value of an asset like a house, based on a number of features, to drive purchasing decisions. Wouldn’t it be beneficial to know how certain we are of these predicted prices? Tensorflow Probability allows you to use the familiar Tensorflow syntax and methodology but adds the ability to work with distributions. In this introductory post, we leave the priors and the Bayesian treatment behind and opt for a simpler probabilistic treatment to illustrate the basic principles. We use the likelihood principle to illustrate how an uncertainty measure can be obtained along with predicted values by applying them to a deep learning regression problem.

Read on for an interesting explanation and tutorial.

Comments closed

Exception Handling in Scala

Pallav Gupta shows several methods for handling errors using Scala:

Error handling is the process of handling the possibility of failure. For example, failing to read a file and then continuing to use that bad input would clearly be problematic. Noticing and explicitly managing these errors saves the rest of the program from various pitfalls.

When an exception occurs, say an Arithmetic Exception then the current operation is aborted. Then the runtime system looks for an exception handler that can accept an Arithmetic Exception. Control resumes with the innermost handler. If no such handler exists, the program terminates.

Pallav starts with the most expensive option and ends with the best option with the Either monad.

Comments closed

Preventing Accidental Runs of Powershell Scripts

Jana Sattainathan has us hold our horses:

You might want to read this to the end! You are into PowerShell and have a ton of scripts. The problem is some of them run things that are not meant to be run at any time by anyone except you when you first set it up  – like dropping the entire contents of a folder and sub-folders to reinitialize.

Once you realize that you accidentally hit F5 on your “DropFolderContents.ps1” script, it is already too late. The damage is done.

Click through for several techniques to prevent this.

Comments closed