Using stringr To Remove HTML

Kevin Feasel

2018-02-22

R

I have a quick post on removing HTML markup with stringr:

This is a quick post today on removing HTML tags using the stringr package in R.

My purpose here is in taking some raw data, which can include HTML markup, and preparing it for a vectorizer.  I don’t need the resulting output to look pretty; I just want to get rid of the HTML characters.

Click through for the script.  If you need to do something nice with the text afterward, my technique is probably too much sledgehammer for niceties, but it does the trick for pre-processing before vectorization.

Related Posts

WVPlots

Nina Zumel announces a new version of WVPlots on CRAN: WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of this is that the older visualizations had our preferred color schemes hard-coded in. More recent additions to the package sometimes had palette […]

Read More

Icon Maps in R

Laura Ellis shows how you can build maps full of little icons: That was ok, but we should try to make the images more aesthetically pleasing using the magick package. We make each image transparent with the image_transparent() function. We can also make the resulting image a specific color with image_colorize(). I then saved the […]

Read More

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728