March 17, 2021

knitr Options and Hooks

The folks at Jumping Rivers conclude a series:

As with many aspects of programming, when you are working by yourself you can be (slightly) more lax with styles and set-up. However, as you start working in a team, different styles can quickly become a hindrance and lead to errors.

Using {knitr} is no different. When you work on documents with different team members, it’s helpful to have a consistent set of settings. If the default for eval changes, this can easily waste time as you try to track down an error. At Jumping Rivers, we use {knitr} a lot: from our training courses, to providing feedback to clients, to constructing monthly reports on clients’ infrastructure. The great thing about {knitr} is that it’s really easy to customise. The bad thing is that, without some care, it’s really easy for every member of the team to have different default options. This proliferation of different default options means that when we pick up someone else’s document, mistakes may creep in.

Read on for the different options they use to keep things consistent.


Understanding Interpolation

Joe Celko offers up some thoughts on interpolation:

Interpolation is a mathematical technique which was popular before we had a lot of cheap computing power. The basic idea is that if you’re given a set of data and are looking for a value in the same range, you can interpolate to get a reasonable estimate of a value that is not actually in the set.

If you can find an old calculus, finance, statistics or algebra book, it had lookup tables in the back. Remember that the only computational tools students had back then were pencil and paper or a slide rule. If you wanted to use pencil and paper, you had to know what formula to use. If you used the slide rule, you could only get three decimal places in your answer (yes, there were a couple of over-sized specialized slide rules which could go as high as four or five decimal places. They were very expensive). But if your slide rule didn’t have a particular function you were trying to compute, it was hard to get even the three decimal places.

When you try to approximate a value outside the range of your set, that’s called extrapolation. It’s a different topic and requires a slight leap of faith.

Interpolation is a key part of regression techniques. Read the whole thing.
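To make the idea concrete, here is a minimal T-SQL sketch of linear interpolation against a lookup table; the temp table, its values (ln(x) at whole numbers), and the probe point are all invented for illustration. Given the nearest known points (x0, y0) and (x1, y1) on either side, the estimate is y0 + (y1 - y0) * (x - x0) / (x1 - x0).

-- Hypothetical lookup table of known values: ln(x) at whole numbers
CREATE TABLE #Lookup (x DECIMAL(10, 4) PRIMARY KEY, y DECIMAL(10, 4));
INSERT INTO #Lookup (x, y)
VALUES (1.0, 0.0000), (2.0, 0.6931), (3.0, 1.0986), (4.0, 1.3863);

DECLARE @x DECIMAL(10, 4) = 2.5;  -- the value we want is not in the table

-- Find the bracketing rows and apply the linear interpolation formula.
-- No bracketing pair means @x is outside the table: that would be
-- extrapolation, not interpolation, and this query returns nothing.
SELECT CASE WHEN lo.x = hi.x THEN lo.y  -- exact hit: nothing to interpolate
            ELSE lo.y + (hi.y - lo.y) * (@x - lo.x) / (hi.x - lo.x)
       END AS y_estimate
FROM (SELECT TOP (1) x, y FROM #Lookup WHERE x <= @x ORDER BY x DESC) AS lo
CROSS JOIN (SELECT TOP (1) x, y FROM #Lookup WHERE x >= @x ORDER BY x) AS hi;

DROP TABLE #Lookup;

The estimate comes out around 0.8959 against a true ln(2.5) of roughly 0.9163: the “reasonable estimation” Celko describes.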


Common Power BI Administration Scripts

Brett Powell continues a series:

Today’s post builds on top of Power BI Admin Scripts Part I by describing five additional PowerShell scripts that Power BI service administrators can utilize to address relatively common scenarios. As with Part I, the five new scripts have been added to my GitHub repository, and I’ll only share context in the blog.

Please be sure to read through the prerequisites section of Part I and confirm you have the necessary permissions (e.g. Power BI Admin role) and software installed such as the latest Power BI Management PowerShell modules.

Check it out for five more scripts. Brett also teases a part 3, but you have to wait until after the credits sequence to see it.


The Value of Foreign Keys

Kevin Wilkie tempts Betteridge’s Law of Headlines:

Foreign keys are simply columns that point to the primary key of another table. Yes, you can create a foreign key pointing to the same table, but that’s… well… special. There are reasons to do it, but they’re far beyond what I want to go into here.

What a foreign key does for you is extremely important: it enforces data consistency.

Click through to learn how.
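As a minimal sketch of that guarantee (table and column names invented for illustration), the engine simply refuses to let the data become inconsistent:

CREATE TABLE dbo.Customer (
    CustomerID   INT           NOT NULL PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.CustomerOrder (
    OrderID    INT NOT NULL PRIMARY KEY,
    CustomerID INT NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer
        REFERENCES dbo.Customer (CustomerID)
);

INSERT INTO dbo.Customer (CustomerID, CustomerName) VALUES (1, N'Acme');

-- Succeeds: customer 1 exists
INSERT INTO dbo.CustomerOrder (OrderID, CustomerID) VALUES (10, 1);

-- Fails with a foreign key violation (error 547): there is no customer 2
INSERT INTO dbo.CustomerOrder (OrderID, CustomerID) VALUES (11, 2);

-- Also fails: customer 1 still has dependent orders
DELETE dbo.Customer WHERE CustomerID = 1;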


Reading Extended Events Files with PowerShell

Emanuele Meazzo shows how to work with Extended Event *.xel files:

However, I’ve found myself in a tricky spot, as I had multiple instances recording events, and those events had to be analyzed as a whole.
I could have simply written a script to get the data from each instance, querying the system function sys.fn_xe_file_target_read_file and then uploading it somewhere else, but this approach adds load on the source instances, and I didn’t want to burden the already overworked instances I was trying to monitor.

I then wondered: can I move the files over from the busy servers and read them from another machine? I surely didn’t want to open each file manually and export it to a table and/or CSV in order to mash them together, either.

Read on to see how.
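For reference, the T-SQL route he decided against, querying each instance directly, looks roughly like this (the session file path is a placeholder):

-- Reads every .xel file matching the pattern; each event arrives as an
-- XML payload that you then shred for the fields you care about
SELECT object_name AS event_name,
       CAST(event_data AS XML).value('(event/@timestamp)[1]', 'datetime2') AS event_time,
       CAST(event_data AS XML) AS event_data,
       file_name,
       file_offset
FROM sys.fn_xe_file_target_read_file(N'C:\XE\MySession*.xel', NULL, NULL, NULL);

Running this on the monitored instance is exactly what puts the extra load on it; Emanuele’s PowerShell approach moves that work onto another machine.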


COUNT(*) vs COUNT(1)

David Alcock tries an experiment:

The question of whether to use COUNT(*) or COUNT(1) has been doing the rounds on the internet again; this time, the question is whether COUNT(*) forces the optimiser to scan all rows.

I’m going to query the Sales.Orders table of the WideWorldImporters sample database and use a WHERE clause that should take advantage of a non-clustered index on CustomerID.

David looks at COUNT(*) versus COUNT(1). The story’s slightly different with COUNT(ColumnName) when there are NULL values afoot.
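To reproduce the comparison yourself, something along these lines works (the customer ID is arbitrary; PickedByPersonID is a NULLable column in Sales.Orders):

USE WideWorldImporters;
GO

-- Equivalent to the optimiser: both count all qualifying rows, and both
-- can be satisfied by the nonclustered index on CustomerID
SELECT COUNT(*) AS order_count
FROM Sales.Orders
WHERE CustomerID = 832;

SELECT COUNT(1) AS order_count
FROM Sales.Orders
WHERE CustomerID = 832;

-- COUNT(ColumnName) is the odd one out: NULLs in that column are skipped
SELECT COUNT(*)                AS all_rows,
       COUNT(PickedByPersonID) AS non_null_rows
FROM Sales.Orders
WHERE CustomerID = 832;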


Avoid the MERGE Operator in T-SQL

Cyndi Johnson has a warning for us:

The bugs alone are reasons enough to avoid using it, and also to refactor any existing MERGE statements you have in your code base. Those bugs include ones that have performance implications as well as ones that could lead to incorrect results. Long story short, Microsoft refuses to fix several of them, and some of the other major issues are “by design”!

If you still feel the desire to use them, please keep reading, and I hope by the end of this blog you will be convinced that

1. Fewer statements do not always lead to better code or better performance.
2. MERGE is like a VAMPIRE and once it is in your codebase, it’s just a matter of time before it drains your servers of their lifeblood.

The worst part about the second point is all the sparkling.

The worst part about the first point is that I was really excited when MERGE came out in 2008, as I wanted UPSERT for SQL Server. It, uh, didn’t work out so well.
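For anyone still wanting upsert behaviour without MERGE, one widely used alternative is an update-then-insert inside a transaction; here is a sketch with invented table and column names:

CREATE TABLE #Target (
    BusinessKey INT            NOT NULL PRIMARY KEY,
    Amount      DECIMAL(10, 2) NOT NULL
);

DECLARE @BusinessKey INT = 42, @Amount DECIMAL(10, 2) = 19.99;

BEGIN TRANSACTION;

-- Try the update first; UPDLOCK + SERIALIZABLE close the race window
-- between two sessions upserting the same key at the same time
UPDATE #Target WITH (UPDLOCK, SERIALIZABLE)
SET    Amount = @Amount
WHERE  BusinessKey = @BusinessKey;

IF @@ROWCOUNT = 0
    INSERT INTO #Target (BusinessKey, Amount)
    VALUES (@BusinessKey, @Amount);

COMMIT TRANSACTION;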
