Data Cleaning Tips

Kevin Feasel

2017-07-12

R

Michael Grogan has a few tips for data cleaning with R:

6. Delete observations using head and tail functions

The head and tail functions can be used if we wish to delete certain observations from a variable, e.g. Sales. The head function allows us to delete the first 30 rows, while the tail function allows us to delete the last 30 rows.

When it comes to using a variable edited in this way for calculation purposes, e.g. a regression, the as.matrix function is also used to convert the variable into matrix format:

Salesminus30days←head(Sales,-30)
X1=as.matrix(Salesminus30days)
X1

Salesplus30days<-tail(Sales,-30)
X2=as.matrix(Salesplus30days)
X2

Some of these tips are for people familiar with Excel but fairly new to R.  These also use the base library rather than the tidyverse packages (e.g., using merge instead of dplyr’s join or as.date instead of lubridate).  You may consider that a small negative, but if it is, it’s a very small one.

Related Posts

Using Service Broker To Queue Up External Script Calls

Arvind Shyamsundar shows how to use Service Broker to run external R or Python scripts based on new data coming into a transactional system: Here, we will show you how you can use the asynchronous execution mechanism offered by SQL Server Service Broker to ‘queue’ up data inside SQL Server which can then be asynchronously passed to […]

Read More

Reasons For Using Docker With R

Jeroen Ooms gives us a few reasons why we might want to containerize our R-based products: The flagship of the OpenCPU system is the OpenCPU server: a mature and powerful Linux stack for embedding R in systems and applications. Because OpenCPU is completely open source we can build and ship on DockerHub. A ready-to-go linux server […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31