Shredding Excel With R

Kevin Feasel

2017-01-18

ETL, R

John MacKintosh shows how to use R for data wrangling and ETL:

I had over 140 files to process. That’s not usually a big deal – I normally use SQL Server Integration Services to loop through network folders, connect to hundreds of spreadsheets and extract the source data.

But this relies on the data being in a tabular format (like a dataframe or database table).

A quick glance at the first few sheets confirmed I could not use this approach – the data was not in tabular format. Instead it was laid out in a format suited to viewing the data on screen – with the required data scattered in different ranges throughout each sheet (over 100 rows and many columns). It wasn’t going to be feasible to point SSIS at different locations within each sheet. (It can be done, but it’s pretty complex and I didn’t have time to experiment.)

The other challenge was that over time, changes to design meant that data moved location e.g. dates that were originally in cell C2 moved to D7, then moved again as requirements evolved. There were 14 different templates in all, each with subtle changes. Each template was going to need a custom solution to extract the data.

This is a good look at how R can be about more than “just” statistical analysis.
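To make the template problem from the excerpt concrete, here is a minimal base R sketch of one way to handle cells that move between template versions. It is not the code from John's post: the template names (`v1`, `v2`), the field names, the `F10` and `G12` addresses, and the demo values are all made up for illustration, and a plain character matrix stands in for a sheet that would in practice be read from Excel with a package such as readxl.

```r
# Hypothetical map of where each field lives in each template version.
# Only the C2 -> D7 move for dates comes from the excerpt; the rest is invented.
template_cells <- list(
  v1 = c(report_date = "C2", total = "F10"),
  v2 = c(report_date = "D7", total = "G12")
)

# Convert an A1-style address such as "D7" into row and column indices.
cell_to_index <- function(address) {
  col_letters <- toupper(gsub("[0-9]", "", address))
  row <- as.integer(gsub("[^0-9]", "", address))
  col <- 0L
  for (ch in strsplit(col_letters, "")[[1]]) {
    col <- col * 26L + match(ch, LETTERS)  # base-26 column letters
  }
  c(row = row, col = col)
}

# Pull the named fields out of one sheet (a character matrix)
# using the cell map for the given template version.
extract_fields <- function(sheet, template) {
  cells <- template_cells[[template]]
  vapply(cells, function(address) {
    idx <- cell_to_index(address)
    sheet[idx[["row"]], idx[["col"]]]
  }, character(1))
}

# Demo sheet with made-up values placed where template "v2" expects them.
sheet <- matrix("", nrow = 12, ncol = 8)
sheet[7, 4] <- "2016-11-30"   # D7
sheet[12, 7] <- "412"         # G12

result <- extract_fields(sheet, "v2")
print(result)
```

Keeping the cell addresses in a lookup list means that a new template version needs a new entry in `template_cells` rather than a new extraction function, which is one plausible way to cope with 14 slightly different layouts.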

