Tibbles In R

Tristan Mahr explains what tibbles and tribbles are and how they compare to built-in data frames:

The name “tribble” is short for “transposed tibble” (the transposed part referring to change from column-wise creation in tibble() to row-wise creation in tribble()).

I like to use light-weight tribbles for two particular tasks:

  • Recoding: Create a tribble of, say, labels for a plot and join it onto a dataset.

  • Exclusion: Identify observations to exclude, and remove them with an anti-join.

I’ve been more used to data frames than tibbles, but this post shows some interesting things you can do with tibbles a lot more easily than with data frames.  It’s enough to make me want to use tibbles more frequently.  H/T R-bloggers

Measuring Model Accuracy

Fabio Veronesi shows several methods of testing model accuracy:

Mean Squared Deviation or Mean Squared Error

This is simply the numerator of the previous equation, but it is not used often. The issue with both the RMSE and the MSE is that since they square the residuals they tend to be more affected by large residuals. This means that even if our model explains the large majority of the variation in the data very well, with few exceptions; these exceptions will inflate the value of RMSE.

Click through for several calculations.  H/T R-bloggers

Watch Those Indexes

Kennie Nybo Pontoppidan explains why that one university registration system was always throwing errors:

Remember that myITU was implemented on an Oracle database. Unlike on SQL Server, all tables in an Oracle database are physically represented as heaps, unless explicitly specified otherwise. That means no indexes. And I didn’t know anything about database performance back then, so I didn’t add any. Effectively any query against the course table would give a full table scans in the EnumerationType and EnumerationValue tables as a side effect. Fast forward to course enrolments…

At ITU, we had implemented a selection algorithm, which considered both the study program you were enrolled into as well as how early you registered for a course. Early birds got the course, so students would be ready when course enrolment was opened at 12pm. And at 12.05 myITU would start failing with 500 Internal Server Errors.

This is a pretty common occurrence, followed up by the “let’s add all the indexes” phase.

Partitioned Views With Memory-Optimized Tables

Ned Otter ran into an issue building partitioned views from a combination of disk-based and memory-optimized tables:

Let’s assume that we have two tables that belong to a partitioned view. Both tables can be memory-optimized, or one table can be memory-optimized, and the other on-disk.

Success condition

an UPDATE occurs to a row in a table, and the UPDATE does not change where the row would reside, i.e. does not cause it to “move” to another table, based on defined CONSTRAINTS

Failure conditions:

   a. UPDATE occurs to a row in the memory-optimized table that causes it to move to either another memory-optimized table, or a on-disk table

   b. UPDATE occurs to a row in the on-disk table that causes it to move to the memory-optimized table

Read the whole thing.  The good news is that if you’re splitting in a way that doesn’t require updating from one type of table to the other, partitioned views work fine.

Bashing Windows

Steve Jones shows how to install Bash on Windows 10:

With SQL Server coming on Linux, some people will want to learn a bit of Linux. Or perhaps they need to get re-acquainted with the OS, which is my situation. I grew up on DOS, but moved to Unix in university. I’ve dabbled in Linux over the years, but with no real use for it over Windows, I abandoned it a decade ago.

Now, I’m trying to re-learn some things as I play with SQL Server on Linux.

Recently I saw a quick video from Scott Hanselman on the Bash subsystem in Windows. I actually first saw this live at the Build 2016 announcement, but when it was added in Beta to Windows 10, I didn’t add it. I’ve been meaning to, but hadn’t.

Until today.

Read on to see how to set this up on your Windows 10 machine.

Life Lessons Learned

Andy Levy shares a few lessons learned from painful experience:

A couple of times, I’ve had under-performing VMs running SQL Server. As I hadn’t been involved in the configuration, I thought everything had been provisioned properly. Turns out…not so much. Memory reservations, storage configuration, power profiles, all set up for suboptimal performance.
Lesson learned: Ask your VMWare admin if they’ve perused the best practices guide and review things yourself before going down the rabbit hole of SQL Server configuration & query tuning. If the underlying systems aren’t configured well, you’ll spin your wheels for a long time.

Read on for more lessons.

Bad Ideas Compendium

Denis Gobo explains some of the learning experiences he has had over the years:

Use SQL to bring down other things in the company
The request was that support wanted to know immediately when an order was placed. No problem, we can add a trigger that populates a table and then we have a job that runs once a minute to email the support group. Sounds reasonable right? Someone decided to do a little testing…. this person created 100,000 entries…. I guess you can imagine what happens next. Yep, the email server was overloaded. We had to quickly delete these rows and truncate the table that was used to send the emails.

Good ol’ eDOS (e-mail Denial of Service) attacks…

Pointers Can Be Sharp

Rob Farley describes a bad day he had:

There was a guy who needed to get his timesheets in. It wasn’t me – I just thought I could help …by making a copy of his timesheets in a separate table, so that he could prepare them there instead of having to use the clunky Access form. I’d gone into the shared Access file that people were using, made a copy of it, and then proceeded to clear out all the data that wasn’t about him, so that he could get his data ready. I figured once he was done, I’d just drop his data in amongst everyone else’s – and that would be okay.

Except that right after I’d cleared out everyone else’s data, everyone else started to complain that their data wasn’t there.

Heart-rate increased. I checked that I was using the copy, not the original… I closed it, opened the original, and saw that sure enough, only his data was there. Everyone else’s (including my own) data was gone.

As far as “oops” moments go, yeah, this is definitely on the list.  Read on for Rob’s explanation of what happened.

Don’t Shrink Databases

Arun Sirpal lets out a dirty secret:

We are here to talk about mistakes we used to make. There is one mistake that I am going to discuss and is something that I used to do 10 years ago, obviously I do not do this anymore.


There, I said it! Why did I do this? Well I never knew any better. Why shouldn’t you do this?

There are situations where a database shrink operation makes sense.  Those cases are quite rare.


Duncan Greaves explains Privacy Impact Assessments as part of the GDPR:

The processes and practices implemented by organisations should reflect the methodology of using a Privacy by Design approach to business systems. Undertaking a PIA/DPIA is not a mandatory part of the GDPR, but in doing so, organisations can show that they are compliant with the Act.

Conducting a PIA is designed to accomplish three main goals:

  • Ensuring conformance with applicable legal, regulatory, and policy requirements for privacy.

  • Determining the risks and effects.

  • Evaluating protections and alternative processes to mitigate potential privacy risks.

Worth reading if you’re based in Europe or do business with European customers.


July 2017
« Jun