Dependencies as Risks

Kevin Feasel

2019-03-18

R

John Mount makes the point that packages dependencies are innately a risk:

If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of the dependencies. This is worse than having non-reproducible research, as your work may have in fact been wrong even the first time.

Low dependencies and low complexity dependencies can also be wrong, but in this case there at least exists the possibility of checking things or running down and fixing issues.

There are some insightful comments on this post as well, so check those out. This is definitely an area where there are trade-offs, so trying to reason through when to move in which direction is important.

Custom ggplot2 Fonts

Daniel Oehm shares two techniques for using custom fonts in your ggplot2 visuals:

ggplot – You can spot one from a mile away, which is great! And when you do it’s a silent fist bump. But sometimes you want more than the standard theme.

Fonts can breathe new life into your plots, helping to match the theme of your presentation, poster or report. This is always a second thought for me and need to work out how to do it again, hence the post.

There are two main packages for managing fonts – extrafont, and showtext.

Read on to see how to use each of these packages. H/T R-bloggers

Sending Highlighted Code From VS Code Via SSH

Anthony Nocentino shows how you can use Visual Studio Code to highlight and then send code via SSH to a remote machine:

You can create a custom keyboard shortcut in VS Code (And Azure Data Studio too) that gives you this functionality. Highlight code, press a button and execute that code in the active terminal, which just so happens to be SSH’d into a remote host.

Head over to Preferences->Keyboard Shortcuts (Picture 1) and in there you’ll find a shortcut called “Terminal: Run Selected Text In Active Terminal” (Picture 2). This is exactly what I want. Now, when I’m presenting…I can highlight the code…and what I highlighted gets copied into the terminal below and executed on whatever system is active in the terminal below. This could be either my local computer or a remote system over SSH.

Anthony’s use case is specifically around presentations but it could also be good for general use.

SSIS on Windows Containers

Andy Leonard is a man who doesn’t like to take “no” for an answer:

Seriously, since I hopped the fence from developer to data I’ve dreamed of the day when I could practice lifecycle management with data-stuff like I used to practice lifecycle management with software development.
I recognize the obstacles. The greatest obstacle (in my humble opinion) is software is mostly stateless these days (these days started with Object-Oriented Programming and include its descendants).

Stateless development solves lots of engineering problems in lifecycle management, and by “solves a lot of engineering problems” I mean some engineering problems simply don’t exist so lifecycle management for stateless stuff can simply ignore a “lot of engineering problems.”

A database, on the other hand, is all about that state. When it comes to managing lifecycle for a stateful platform – like a database – ACID gets dumped on many lifecycle management tools and solutions (see what I did there?).

Is it possible to manage a data-related lifecycle using stateless tools? Yes. But here there be obstacles. Let’s look at on use case:

Click through for more thoughts and setup for a new series.

Lock Promotion

Erik Darling tries to figure out why his locks can’t get ahead in the rat race:

The first thing I found is that there were 16 attempts at promotion, and four successful promotions.
Why did this seem weird? I dunno.
Why would there be only 4 successful attempts with no competing locks from other queries?
Why wouldn’t all 16 get promotions?

Find out the answer to this and much, much more if you click the link.

Retaining a Few Tables From a Large Set

Jana Sattainathan has a Powershell-based solution to eliminate all but a few tables in a database:

Recently, I received a request to backup a dozen tables or so tables out of 12 thousand tables. I had to retain all the indexes, statistics etc. The goal was to hand this over to the vendor for analysis as a database backup.

I could have copied the selected tables over to a new database using the PowerShell function I had published earlier and backed that up but since the tables to backup were quite large, I skipped that route

Read on to see Jana’s solution.

Flattening Dimensional Models

Reza Rad explains why it makes sense to build flat dimensional models, particularly for Power BI:

The article that I wrote earlier this week about the shared dimension had a lot of interest, and I’m glad it helped many of you. So I thought better to write about the basics of modeling even more. In this article, I will be focusing on a scenario that you have all faced, however, took different approaches. Is it good to have too many dimension tables? can you combine some of those tables together to build one flatten dimension table? how much should you flatten it? should you end up with one huge table including everything? In this article, I’m answering all of these questions and explaining the scenarios of combining dimensions, as usual, I explain the model in Power BI. However, the concepts are applicable to any other tools. If you like to learn more about Power BI; read Power BI book from Rookie to Rock Star.

Given how closely the ideal Power BI data model matches the Kimball model, Reza’s advice makes perfect sense.

Categories

March 2019
MTWTFSS
« Feb Apr »
 123
45678910
11121314151617
18192021222324
25262728293031