Scraping SQL Saturday Statistics

Kevin Feasel

2017-11-14

Python, R

Tomaz Kastrun shows how to use rvest to read the SQL Saturday website and parse schedule details:

I wanted to check a simple query: How many times has a particular topic been presented and from how many different presenters.

Sounds interesting, tackling the problem should not be a problem, just that the end numbers may vary, since there will be some text analysis included.

Read on for the code and some analysis.
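To make the counting question concrete, here is a minimal sketch in Python rather than R. It is not Tomaz's rvest code. It uses the standard library's `html.parser` on a made-up schedule table, and the markup, class names, session titles, and speaker names are all hypothetical stand-ins, not the real SQL Saturday page structure. The idea is simply: pull (title, speaker) pairs out of the HTML, then count matching sessions and distinct presenters for a keyword.

```python
import re
from html.parser import HTMLParser

# Hypothetical schedule markup for illustration only; the real
# SQL Saturday pages are laid out differently.
SAMPLE_HTML = """
<table>
  <tr><td class="title">Intro to R for DBAs</td><td class="speaker">Ann Example</td></tr>
  <tr><td class="title">Query Store Deep Dive</td><td class="speaker">Bob Sample</td></tr>
  <tr><td class="title">R in SQL Server 2017</td><td class="speaker">Ann Example</td></tr>
  <tr><td class="title">Advanced R Services</td><td class="speaker">Cy Placeholder</td></tr>
</table>
"""


class ScheduleParser(HTMLParser):
    """Collects (title, speaker) pairs from rows of the sample table."""

    def __init__(self):
        super().__init__()
        self.sessions = []
        self._field = None  # class of the <td> currently being read
        self._row = {}

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a title or speaker cell.
        if self._field in ("title", "speaker"):
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and "title" in self._row and "speaker" in self._row:
            self.sessions.append(
                (self._row["title"].strip(), self._row["speaker"].strip())
            )


def topic_stats(html, keyword):
    """Return (number of sessions, number of distinct presenters) whose
    title contains `keyword` as a whole word, ignoring case."""
    parser = ScheduleParser()
    parser.feed(html)
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    matches = [(t, s) for t, s in parser.sessions if pattern.search(t)]
    return len(matches), len({speaker for _, speaker in matches})


sessions, presenters = topic_stats(SAMPLE_HTML, "R")
print(f"{sessions} sessions from {presenters} presenters")
```

Whole-word matching on titles is a crude stand-in for the text analysis the quote mentions, which is one reason the end numbers can vary with how a topic is defined.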

