Literate Programming And Notebooks

David Smith sums up a debate on notebooks versus literate programming:

There’s no video yet available of Joel’s talk, but you can guess the theme of that opening slide, and walking through the slides conveys the message well, I think. Yuhui Xie, author and creator of the rmarkdown package, provides a detailed summary and response to Joel’s talk, where he lists Joel’s main critiques of Notebooks:

  1. Hidden state and out-of-order execution

  2. Notebooks are difficult for beginners

  3. Notebooks encourage bad habits

  4. Notebooks discourage modularity and testing

  5. Jupyter’s autocomplete, linting, and way of looking up the help are awkward

  6. Notebooks encourage bad processes

  7. Notebooks hinder reproducible + extensible science

  8. Notebooks make it hard to copy and paste into Slack/Github issues

  9. Errors will always halt execution

  10. Notebooks make it easy to teach poorly

  11. Notebooks make it hard to teach well

Read the whole thing.  I agree with some of these points, but disagree with a few on the list.

Mutating Data Frames Without dplyr

John Mount points out that there is a built-in function to mutate data frames in R:

The notation we used above is the “explicit argument” variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform().

To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.

detach("package:dplyr", unload = TRUE)

Now let’s write the equivalent pipeline using exclusively base-R.

Click through for the way to do this as a pipeline operation.

When Secondary Nodes Affect Primaries

Dmitri Korotkevitch shows that a readable secondary in an Availability Group can negatively impact the primary:

A long time ago in a galaxy far, far away, I had to troubleshoot interesting performance issue in SQL Server. Suddenly, the CPU load on the server started to climb up. Nothing changed in terms of workload. The system still processed the same amount of requests. The execution plans of the critical queries stayed the same. Nevertheless, the CPU usage grew up slowly and steadily by a few percent per hour.

Eventually, we nailed it down. The problem occured in very busy OLTP system with very volatile data. We noticed that system performed much more I/O (logical and physical) than before. It was very strange, because nothing should have changed that day. Finally, we found that we have large number of deleted rows in the database that had not been cleaned up by ghost cleanup task.

Read on to learn what caused this mess.

Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be more than that.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.

Click through for an example.  This seems like a bug to me—I interpret fixed-width as fixed number of characters, not fixed number of bytes.  At the very least, it’s liable to cause confusion.

Trigger Spirals

David Fowler tells a story of woe, one which is totally not his fault:

To do this, a trigger was created which would send all the details via a Service Broker message to another SQL Server, this SQL Server was used to hold details of the AD accounts and from there, changes were automatically propagated out to AD.

This was working well until one day when it was realised that any changes to account permissions in AD weren’t reflected in the personnel database.  To solve this, another trigger was created to send a Service Broker message back to the personnel database with details of the change.

This was where I came in, it was noticed that the system had started to run slower and slower, not only that but permissions seemed to be constantly changing for no obvious reason.  Were the machines finally waking up and taking over?

There’s a reasonable explanation here, for some definition of reasonable.

A Tool For Analyzing AG Replica Latency

Simon Su has an interesting tool available:

I wrote an article to discuss about data movement latency between AG groups:

Now I develop a tool to analyze AG log block movement latency between replicas and create report accordingly.

Click through for more info and check it out on Github.


September 2018
« Aug