Author: Kevin Feasel

Checkpointing Code For Reproduction

David Smith tells an interesting story about a reproducibility problem with data analysis:

Timo Grossenbacher, data journalist with Swiss Radio and TV in Zurich, had a bit of a surprise when he attempted to recreate the results of one of the R Markdown scripts published by SRF Data to accompany their data journalism story about vested interests of Swiss members of parliament. Upon re-running the analysis in R last week, Timo was surprised when the results differed from those published in August 2015. There was no change to the R scripts or data in the intervening two-year period, so what caused the results to be different?

The version of R Timo was using had been updated, but that wasn’t the root cause of the problem. What had also changed was the version of the dplyr package used by the script: version 0.5.0 now, versus version 0.4.2 then. For some unknown reason, a change in the dplyr package between those versions caused some data rows to be deleted during the data preparation process, and so the results changed.

Click through for the solution, which is pretty easy in R.

Memory-Optimized Object Isolation Levels

Ned Otter looks at the isolation levels offered when you work with memory-optimized objects:

If you are only querying on-disk tables, you can use any of the isolation levels from List 1. And if you are only querying memory-optimized tables, you can use any of the isolation levels from List 2.

But what if you want to reference both on-disk and memory-optimized tables in the same query? Of course, the answer is “it depends”, with transaction initiation modes and isolation levels being the components of that dependency.

This post is part one of a series and is mostly about level-setting.

What Is DAX?

Matt Allington covers some of the basics of DAX:

Do I need to learn the DAX language?

You certainly do not need to know how to write DAX to get started with Power BI. Power BI is the newest business intelligence tool that leverages the DAX language (via Power Pivot), and it is definitely possible to get started and build some reports without learning any DAX at all. If you are a “consumer of reports” that other people produce for you, then you certainly don’t need to learn any DAX. However, if you want to do your own ad hoc (or structured) analysis of data using Power BI or Power Pivot for Excel, then you will definitely want to learn to write some DAX in order to get value from what these new tools have to offer.

It’s a good intro if you aren’t familiar with DAX.

Thoughts On CLR Strict Security

Solomon Rutzky has started a series on CLR in SQL Server 2017 and throws down the gauntlet:

What all of that means is that, assuming clr strict security is “1” (i.e. enabled), and TRUSTWORTHY is “OFF” for the Database in which an Assembly is being created, then in order to create any Assembly you first need to:

  1. Sign the Assembly with a strong-name key or a certificate
  2. Create an Asymmetric Key or Certificate in master from whatever you signed the Assembly with
  3. Create a Login based on that Asymmetric Key or Certificate
  4. Grant that Login the UNSAFE ASSEMBLY permission

Is that really so bad? Aren’t many of us (hopefully!) already doing that?

Solomon’s not very happy with the way that CLR security works in 2017, but he does have solutions of his own in mind.

CROSS APPLY Replacing REPLACE

Bert Wagner shows off a good use of the APPLY operator:

Here we only have 4 nested REPLACE functions. My shameful record is 29. I’m not proud of it, but sometimes it’s the only way to get things done.

Not only are these nested REPLACE() functions difficult to write, but they are difficult to read too.

Instead of suffering through all of that ugly nesting, you can use CROSS APPLY.

Click through for the example. This is one of several great uses for the APPLY operator.
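
Bert’s example is T-SQL and is not reproduced here. As a rough analogy only (this is not his CROSS APPLY technique), here is the same idea sketched in Python: keep the find/replace pairs in a list and apply them in order, rather than nesting one call inside another. The replacement pairs and the sample phone number are hypothetical.

```python
from functools import reduce

# Hypothetical find/replace pairs. They are applied first to last, which matches
# nested REPLACE() calls read from the innermost call outward.
REPLACEMENTS = [
    ("(", ""),
    (")", ""),
    ("-", ""),
    (" ", ""),
]


def apply_replacements(text, pairs):
    """Apply each (old, new) pair in turn, feeding each result into the next step."""
    return reduce(lambda acc, pair: acc.replace(pair[0], pair[1]), pairs, text)


print(apply_replacements("(555) 123-4567", REPLACEMENTS))  # 5551234567
```

With this shape, adding a replacement means adding one row to the list instead of wrapping the whole expression in yet another call, which is the readability gain the post is after. Order still matters, exactly as it does with nesting.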

PSSDIAG On Linux

Denzil Ribeiro shows how to use PSSDIAG on a SQL Server on Linux installation:

When analyzing SQL Server performance-related issues, customers often have their tools of choice, which can be a feature within the product, a third-party performance monitoring tool, or a home-grown tool that assists in monitoring live performance. For live monitoring, in the SQLCAT lab we use a home-grown tool described in this blog. However, when our customers have a performance issue, we, just like support engineers and consultants, can’t always have them ship their third-party tools or associated data, and hence need a way to collect performance-related data for post-mortem analysis.

PSSDIAG is a popular tool used by Microsoft SQL Server support engineers to collect system data and troubleshoot performance issues. This is a well-known tool for SQL Server on Windows, and we needed equivalent functionality on Linux. PSSDIAG data collection for Linux is now available here. It is a set of bash scripts that collect all the necessary data for troubleshooting performance problems, similar to PSSDIAG on Windows.

I haven’t used PSSDIAG outside of a support scenario, but it’s definitely good to know that this is available on Linux.

Recursion In Python

Mike Driscoll shows how to create recursive functions in Python:

Recursion is a topic in mathematics and computer science. In computer programming languages, the term recursion refers to a function that calls itself. Another way of putting it would be a function definition that includes the function itself in its definition. One of the first warnings I received when my computer science professor talked about recursion was that you can accidentally create an infinite loop that will make your application hang. This can happen because when you use recursion, your function may end up invoking itself infinitely. So, as with any other potential infinite loop, you need to make sure you have a way to break out of the loop. The idea in most recursive functions is to break up the procedure being done into smaller pieces that we can still process with the same function.
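
A minimal sketch of the two ideas in that paragraph: a base case that stops the function from calling itself forever, and a recursive case that hands a smaller piece of the problem back to the same function. The function names and inputs are illustrative and not taken from Mike’s post.

```python
def factorial(n: int) -> int:
    """Return n! computed recursively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Base case: stops the recursion.
    if n <= 1:
        return 1
    # Recursive case: solve a smaller version of the same problem.
    return n * factorial(n - 1)


def nested_sum(items) -> int:
    """Sum every number in an arbitrarily nested list."""
    total = 0
    for item in items:
        if isinstance(item, list):
            total += nested_sum(item)  # recurse into the smaller sub-list
        else:
            total += item
    return total


print(factorial(5))                       # 120
print(nested_sum([1, [2, 3], [[4], 5]]))  # 15
```

If the base case is missing, CPython does not actually hang forever: once the call depth exceeds the interpreter’s recursion limit (see `sys.getrecursionlimit()`), it raises `RecursionError`.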

Read on for a couple of quick recursion scenarios.

Basic PowerShell Regex

Adam Bertram shows how to use regular expressions for pattern matching in PowerShell:

However, regex has traditionally been a topic that most IT pros shy away from when they first see how it works. Admittedly, regex does take a bit of getting used to, and you’re still probably going to have to do some Googling every time you need to use it. But learning how PowerShell integrates regex into its language is a skill that’s much easier to learn and one that will come in handy often.

First of all, when someone says “working with regex,” that can mean a lot of things, so let’s break it down a little bit. Regex is a method of string matching. Its sole purpose is to match and parse strings from within other strings. This can mean a lot of things, and PowerShell allows you to do just about anything here, particularly as it has the full power of the .NET Framework. For our purposes, let’s investigate a few ways PowerShell allows you to match strings and how to parse strings with regex and PowerShell.
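
Adam’s examples are in PowerShell. As a rough cross-language illustration of the same two tasks, matching a pattern inside a larger string and parsing values out of it, here is a sketch using Python’s standard `re` module. The log line and pattern are hypothetical. One difference worth knowing: PowerShell’s `-match` operator is case-insensitive by default, whereas Python’s `re` is case-sensitive unless you pass `re.IGNORECASE`, which is why the sketch passes it.

```python
import re

# Hypothetical input line, for illustration only.
LOG_LINE = "Server: SQL01, Port: 1433"

# Named groups capture the parts we want to parse out of the larger string.
# re.IGNORECASE mirrors PowerShell's case-insensitive -match default.
PATTERN = re.compile(r"server:\s*(?P<server>\w+),\s*port:\s*(?P<port>\d+)", re.IGNORECASE)


def parse_server_port(text):
    """Return (server, port) if the pattern matches anywhere in text, else None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("server"), int(match.group("port"))


print(parse_server_port(LOG_LINE))      # ('SQL01', 1433)
print(re.findall(r"\d+", "a1b22c333"))  # ['1', '22', '333']
```

`search` answers “does this string contain a match, and what did it capture?”, while `findall` pulls out every match, which covers the matching and parsing cases the post walks through.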

Read on for a couple of sample scenarios.

When Database Restoration Leaves The Source In Recovery

David Fowler shows how restoring a backup to a new database can leave the original database stuck in recovery:

If you ever restore a backup to a new database, there’s something that you should probably be aware of; otherwise, you could easily find yourself in this situation.

Let’s have a look at what happens when we try to restore a copy of the SQLUndercover database using SSMS. We’re going to kick this off by right-clicking on ‘SQLUndercover’ and selecting Restore Database.

Read this to avoid a panic attack.

Row-Level Security In Power BI

Paul Turley has a video showing how to use row-level security with Power BI:

The best method to implement row-level security in a published Power BI model or SSAS Tabular model consumed from the Power BI service will depend on your data and requirements. The method I demonstrate here is one of the most flexible approaches and one that I commonly use in my projects.

Click through to watch the video.
