Dave Mason continues scraping web pages for fun and profit:
In the last post, we looked at a way to scrape HTML table data from web pages, and save the data to a table in SQL Server. One of the drawbacks is the need to know the schema of the data that gets scraped–you need a SQL Server table to store the data, after all. Another shortcoming is if there are multiple HTML tables, you need to identify which one(s) you want to save.
For this post, we’ll revisit web scraping with Machine Learning Services and R. This time, we’ll take a schema-less approach that returns JSON data. As before, this web page will be scraped: Boston Celtics 2016-2017. It shows two HTML tables (grids) of data for the Boston Celtics, a professional basketball team. The first grid lists the roster of players, the second is a listing of games played during the regular season.
Click through to see how Dave manages this feat.