Pipes And More Pipes In R

Gabriel (de Selding?) has a tutorial on how to use the various pipes in R:

In F#, the pipe-forward operator |> is syntactic sugar for chained method calls. Or, stated more simply, it lets you pass an intermediate result onto the next function.

Remember that “chaining” means that you invoke multiple method calls. As each method returns an object, you can actually allow the calls to be chained together in a single statement, without needing variables to store the intermediate results.

In R, the pipe operator is, as you have already seen, %>%. If you’re not familiar with F#, you can think of this operator as being similar to the +in a ggplot2 statement. Its function is very similar to that one that you have seen of the F# operator: it takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a “THEN”.

Auto-recommended for the F# love, and a good tutorial to boot.

John Mount has a few interesting notes on the topic:

Read on for the rest of his notes, too.

Moving From reshape2 To tidyr

Kevin Feasel



Martin Johnsson talks about a couple tricky bits when moving from reshape2 to tidyr:

In practice, I don’t think people always take their data frames all the way to tidy. For example, to make a scatterplot, it is convenient to keep a couple of variables as different columns. The key is that we need to move between different forms rapidly (brain time-rapidly, more than computer time-rapidly, I might add).

And not everything should be organized this way. If you’re a geneticist, genotypes are notoriously inconvenient in normalized form. Better keep that individual by marker matrix.

The first serious piece of R code I wrote for someone else was a function to turn data into long form for plotting. I suspect plotting is often the gateway to tidy data. The function was like what you’d expect from R code written by a beginner who comes from C-style languages: It reinvented the wheel, and I bet it had nested for loops, a bunch of hard bracket indices, and so on. Then I discovered reshape2.

I’d not used reshape2 before, having started with tidyr, so it was interesting to see the contrast.

Data Manipulation In R

Kevin Feasel



Steph Locke has a new book out:

Data Manipulation in R is the second book in my R Fundamentals series that takes folks from no programming knowledge through to an experienced R user. Working with Rfocussed on the very basics and this second book covers data wrangling.

Introducing the core skill of working with tabular data, this book goes from importing data, to analysing it, and then getting it back out for consumption elsewhere. Leaning heavily on the tidyverse, I think it’s an accessible introduction for those new to analysing data.

The book is free if you have Kindle Unlimited; otherwise, it’s not expensive at all.

Python Data Frames In ML Services

Robert Sheldon continues his SQL Server Machine Learning Services series by looking at Python data frames:

This article focuses on using data frames in Python. It is the second article in a series about MLS and Python. The first article introduced you briefly to data frames. This article continues that discussion, describing how to work with data frame objects and the data within those objects.

Data frames and the functions they support are available to MLS and Python through the pandas library. The library is available as a Python module that provides tools for analyzing and manipulating data, including the ability to generate data frame objects and work with data frame data. The pandas library is included by default in MLS, so the functions and data structures available to pandas are ready to use, without having to manually install pandas in the MLS library.

There’s quite a bit to this article, making it an interesting read.

Multi-Object JSON Arrays In SQL Server

Kevin Feasel



Bert Wagner shows how to build JSON arrays in SQL Server:

When using FOR JSON PATH, ALL rows and columns from that result set will get converted to a single JSON string.

This creates a problem if, for example, you want to have a column for your JSON string and a separate column for something like a foreign key (in our case, HomeId). Or if you want to generate multiple JSON strings filtered on a foreign key.

The way I chose to get around this is to use CROSS APPLY with a join back to our Home table — this way we get our JSON string for either Cars or Toys created but then output it along with some additional columns.

Impedance mismatch?  What impedance mismatch?

Parallel CHECKDB

Kevin Feasel



Arun Sirpal shows us when DBCC CHECKDB can go parallel:

DBCC CHECKDB has the ability to perform parallel checking of objects. However, it absolutely depends on the edition of SQL Server, it only happens when using enterprise edition.

Let’s see this in action. I propose the following tests for this blog post:

  • Test on a SQL Server Enterprise Edition.
  • Test on a non-enterprise edition of SQL Server.

I don’t have 2017 Enterprise at hand but I do have 2014 Enterprise and Express handy so it makes sense to use these versions for my “experiment”.

Read on to see the difference.

DataRow To JSON With Powershell

Rob Sewell shows how to convert a .NET DataRow into its JSON form using Powershell:

I wanted to be able to Mock $variable. I wrapped the code above in a function, let’s call it Run-Query

Which meant that I could easily separate it for mocking in my test. I ran the code and investigated the $variable variable to ensure it had what I wanted for my test and then decided to convert it into JSON using ConvertTo-Json

Read on to see the fun mess that ConvertTo-Json made and then Rob’s simplification.

Perceiving Forms: Gestalt Principles

Meagan Longoria describes the basics of the Gestalt school of thought:

Gestalt principles can help us highlight patterns and reduce noise in data visualization. They can also help us create a visual hierarchy and employ symmetry in our designs for a more pleasing user experience.

Some Gestalt principles are very similar to our understanding of preattentive attributes. This is demonstrated in the set of 3 charts below.

Meagan gives a good overview of the concepts and uses a Power BI example to apply them.  There’s some really good advice in here.

Alerting On SQL Server Blocking

Andy Mallon has built a process to e-mail him when there’s excessive blocking:

  1. Only alert if there’s actually a problem that is actionable. Please don’t throw email alerts at me if there’s nothing for me to do. Don’t be the boy who cried wolf. You probably already have an email rule that ignores alerts you don’t care about. Don’t make more spam.

  2. Only alert if the problem is critical. In addition to my #1 requirement that it be actionable, I should have to act now. Email alerts are a cry for help, and aren’t appropriate for every problem. For lower-priority problems, I prefer digest reports, where I can set aside time in my day to work on many lower-priority issues and take care of them all at once.

I cannot agree enough with the cited section—this is so important for alerts, but something we tend not to think about.  That’s why I have tens of thousands of unread alerts in my inbox tagged for auto-deletion (most of which alerts I didn’t create and don’t relate to me at all, but I can’t go breaking somebody else’s workflow).


December 2017
« Nov Jan »