Using drop = FALSE On Data Frames

Kevin Feasel



John Mount explains why you might want to add drop = FALSE to your data.frame operations:

We were merely trying to re-order the rows and the result was converted to a vector. This happened because the rules for [ , ] change if there is only one result column. This happens even if the there had been only one input column. Another example is: d[,] is also vector in this case.

The issue is: if we are writing re-usable code we are often programming before we know complete contents of a variable or argument. For a data.frame named “g” supplied as an argument: g[vec, ] can be a data.frame or a vector (or even possibly a list). However we do know if g is a data.frame then g[vec, , drop = FALSE] is also a data.frame(assuming vec is a vector of valid row indices or a logical vector, note: NA induces some special cases).

We care as vectors and data.frames have different semantics, so are not fully substitutable in later code.

Definitely read the comments on this one as well, as John extends his explanation and others chime in with very useful notes.

Related Posts

Python versus R (Again)

Alex Woodie looks at whether Python is dominating R in the data science space: There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged […]

Read More

Local Randomness and R

Evgeni Chasnovski has a problem around generating random data: Let’s say we have a deterministic (non-random) problem for which one of the solutions involves randomness. One very common example of such problem is a function minimization on certain interval: it can be solved non-randomly (like in most methods of optim()), or randomly (the simplest approach being […]

Read More


March 2018
« Feb Apr »