Working With Data Frames In R

Kevin Feasel



Dave Mason has a couple of blog posts on data frames.  First, the basics:

Conceptually, a dataset is a grid or table of data elements. It consists of rows, which we specifically call “observations”, and of columns , which are called “variables”. (Observations may also be referred to as “instances”. Variables may also be referred to as “properties”.) The data frame in R is designed for data sets. As the R documentation tells us, data frames are “used as the fundamental data structure by most of R’s modeling software”.

The function we’ll be working with primarily in this post is the data.frame() function. I have read that in R programming, creating data frames with this function is rather uncommon. Most of the time, data frames are created by invoking other functions that read data from an external data source (like a file or a database table) with a data frame as the return type. But for simplicity, data.frame() will serve our purposes.

Then, subsetting data frames:

Adding columns to a data frame is easy–easy compared to adding rows. We’ll get to that. To add a column, first create a vector. The class doesn’t matter. But the number of elements does–it has to match the number of observations in the data frame. Now that we have our vector, here are some options to add it as a new column to a data frame: use the $ shortcut, use double brackets with the new column name, bind the vector to the dataframe with cbind().

The data frame (or tibble, if using the tidyverse version) is probably the single most important data type in R for getting work done.

Related Posts

Using cdata To Created Faceted Plots

Nina Zumel shows how to use the cdata package to create faceted ggplot2 plots: First, load the packages and data: library("ggplot2") library("cdata") iris <- data.frame(iris) Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot […]

Read More

Using wrapr For A Consistent Pipe With ggplot2

John Mount shows how you can use the wrapr pipe to perform data processing and building a ggplot2 visual: Now we can run a single pipeline that combines data processing steps and ggplot plot construction. data.frame(x = 1:20) %.>% mutate(., y = cos(3*x)) %.>% ggplot(., aes(x = x, y = y)) %.>% geom_point() %.>% geom_line() %.>% ggtitle("piped ggplot2") Check […]

Read More


September 2018
« Aug Oct »