Data Cleaning Tips

Michael Grogan has a few tips for data cleaning with R:

6. Delete observations using head and tail functions

The head and tail functions can be used if we wish to delete certain observations from a variable, e.g. Sales. The head function allows us to delete the first 30 rows, while the tail function allows us to delete the last 30 rows.

When it comes to using a variable edited in this way for calculation purposes, e.g. a regression, the as.matrix function is also used to convert the variable into matrix format:

Salesminus30days←head(Sales,-30)
X1=as.matrix(Salesminus30days)
X1

Salesplus30days<-tail(Sales,-30)
X2=as.matrix(Salesplus30days)
X2

Some of these tips are for people familiar with Excel but fairly new to R.  These also use the base library rather than the tidyverse packages (e.g., using merge instead of dplyr’s join or as.date instead of lubridate).  You may consider that a small negative, but if it is, it’s a very small one.

Related Posts

R Services Internals

Niels Berglund has an excellent series on R Services internals.  Here’s the latest post: This post is the ninth post about Microsoft SQL Server R Services, and the eight post that drills down into the internal of how it works. So far in this series we have been looking at what happens in SQL Server […]

Read More

Multiple Data Sets In External Scripts

Tomaz Kastrun shows a workaround to the “one data set” limit in sp_execute_external_script: Some of the  arguments of the procedure sp_execute_external_script are enumerated. This is valid for the inputting dataset and as the name of argument @input_data_1 suggests, one can easily (and this is valid doubt) think, there can also be @input_data_2 argument, and so on. Unfortunately, this is […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

July 2017
MTWTFSS
« Jun  
 12
3456789
10111213141516
17181920212223
24252627282930
31