Data Cleaning Tips

Kevin Feasel

2017-07-12

R

Michael Grogan has a few tips for data cleaning with R:

6. Delete observations using head and tail functions

The head and tail functions can be used if we wish to delete certain observations from a variable, e.g. Sales. With a negative argument, head(Sales, -30) returns everything except the last 30 rows, while tail(Sales, -30) returns everything except the first 30 rows.

When it comes to using a variable edited in this way for calculation purposes, e.g. a regression, the as.matrix function is also used to convert the variable into matrix format:

# Keep all but the last 30 observations
Salesminus30days <- head(Sales, -30)
X1 <- as.matrix(Salesminus30days)
X1

# Keep all but the first 30 observations
Salesplus30days <- tail(Sales, -30)
X2 <- as.matrix(Salesplus30days)
X2
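
The direction of the trimming is easy to mix up, so here is a minimal sketch that makes it visible. The Sales vector below is a made-up stand-in (the integers 1 through 100), not the data from the original post, and the names without_last_30 and without_first_30 are chosen only for this illustration.

```r
# Hypothetical stand-in for the Sales variable: 100 sequential values.
Sales <- 1:100

# head() with a negative n keeps everything except the last 30 elements.
without_last_30 <- head(Sales, -30)

# tail() with a negative n keeps everything except the first 30 elements.
without_first_30 <- tail(Sales, -30)

range(without_last_30)    # 1 70
range(without_first_30)   # 31 100

# as.matrix() turns the trimmed vector into a one-column matrix.
X1 <- as.matrix(without_last_30)
dim(X1)                   # 70 1
```

Using sequential integers means the surviving values show directly which end of the vector was dropped.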

Some of these tips are for people familiar with Excel but fairly new to R. These also use base R rather than the tidyverse packages (e.g., using merge instead of dplyr’s join functions or as.Date instead of lubridate). You may consider that a small negative, but if it is, it’s a very small one.
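
For readers who have not seen the base R versions, here is a rough sketch of merge and as.Date in action. The two data frames and their column names are invented for illustration and do not come from the original post; the tidyverse counterparts would be something like dplyr's inner_join and left_join, and lubridate's ymd.

```r
# Hypothetical data frames, invented purely for illustration.
sales <- data.frame(
  store_id = c(1, 2, 3),
  sale_date = c("2017-07-10", "2017-07-11", "2017-07-12"),
  amount = c(250, 310, 175),
  stringsAsFactors = FALSE
)
stores <- data.frame(
  store_id = c(1, 2, 4),
  city = c("Raleigh", "Durham", "Cary"),
  stringsAsFactors = FALSE
)

# Inner join on store_id: only stores 1 and 2 appear in both frames.
joined <- merge(sales, stores, by = "store_id")

# Left join: keep every row of sales; city is NA where no store matches.
left_joined <- merge(sales, stores, by = "store_id", all.x = TRUE)

# Convert ISO-formatted text to the Date class (note the capital D).
joined$sale_date <- as.Date(joined$sale_date, format = "%Y-%m-%d")
class(joined$sale_date)
```

The all.x = TRUE argument is what turns the default inner join into a left join; all.y and all do the same for right and full joins.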

