Shredding Excel With R

Kevin Feasel

2017-01-18

ETL, R

John MacKintosh shows how to use R for wrangling + ETL:

I had over 140 files to process. That’s not usually a big deal – I normally use SQL Server Integration Services to loop through network folders, connect to hundreds of spreadsheets and extract the source data.

But this relies on the data being in a tabular format (like a dataframe or database table).

A quick glance at the first few sheets confirmed I could not use this approach – the data was not in tabular format. Instead it was laid out in a format suited to viewing the data on screen – with the required data scattered in different ranges throughout each sheet ( over 100 rows and many columns). It wasn’t going to be feasible to point SSIS at different locations within each sheet. (It can be done, but it’s pretty complex and I didn’t have time to experiment).

The other challenge was that over time, changes to design meant that data moved location e.g. dates that were originally in cell C2 moved to D7, then moved again as requirements evolved. There were 14 different templates in all, each with subtle changes. Each template was going to need a custom solution to extract the data.

This is a good look at how R can be about more than “just” statistical analysis.

Related Posts

Combining Plots In R With cowplot

Abdul Majed Raja shows how to use the cowplot library in R to merge together independent plots into a single image: The way it works in cowplot is that, we have assign our individual ggplot-plots as an R object (which is by default of type ggplot). These objects are finally used by cowplot to produce […]

Read More

AzureR Packages In Cran

David Smith points out that the Azure packages for R are now in CRAN: The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031