Tabulizer

Kevin Feasel

2016-12-02

R

Troy Walters uses the Tabulizer package to extract tables from a PDF and turn them into an R matrices or data frames:

Next we will use the extract_tables() function from tabulizer. First, I specify the url of the pdf file from which I want to extract a table. This pdf link includes the most recent data, covering the period from July 1, 2016 to November 25, 2016. I am using the default parameters for extract_tables. These are guess and method. I’ll leave guess set to TRUE, which tells tabulizer that we want it to figure out the locations of the tables on its own. We could set this to FALSE if we want to have more granular control, but for this application we don’t need to. We leave the method argument set to “matrix”, which will return a list of matrices (one for each pdf page). This could also be set to return data frames instead.

This is nice.  I have to imagine it only works for text-based PDFs and not ones which are generated from a series of images.

Related Posts

Combining Plots In R With cowplot

Abdul Majed Raja shows how to use the cowplot library in R to merge together independent plots into a single image: The way it works in cowplot is that, we have assign our individual ggplot-plots as an R object (which is by default of type ggplot). These objects are finally used by cowplot to produce […]

Read More

AzureR Packages In Cran

David Smith points out that the Azure packages for R are now in CRAN: The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031