Amy Herold has scraped PASS Summit 2017 submissions using PowerShell:
Never having done a web scrape before, this was the perfect subject for my first time – grabbing all the sessions submitted to PASS Summit 2017…and doing it with PowerShell! Here is the script I used for this. I have accounted for the following:
- Apostrophes (aka single quote). They will break your insert unless you have two of them, and for some reason, people seem to use them all over the place.
- Formatting the string data for insert. No, your data will not magically come out right in your insert with single quotes so you need to add them.
- Additional ID and deleted fields.
- Speaker URL and ID. Will be using this to scrape speaker details later.
- Accurate lower and upper bounds. These were arrived at by trial and error (you’re welcome), as well as the clean up of the data I scraped. More on this later.
PowerShell probably wouldn’t be my first language for web scrapes (that’d be Python), but Amy shows how to get a scrape going.
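
The first two points in the list, doubling apostrophes and wrapping string values in single quotes, can be sketched in a few lines. This is a minimal Python illustration of the idea, not Amy's PowerShell script: the function names, the `Sessions` table, and the sample row are all hypothetical and exist only for the example.

```python
def sql_quote(value):
    """Format a value as a SQL literal (hypothetical helper).

    None becomes NULL, numbers are emitted as-is, and anything else is
    wrapped in single quotes with embedded apostrophes doubled.
    """
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table, row):
    """Build a single INSERT statement from a dict of column -> value."""
    columns = ", ".join(row.keys())
    values = ", ".join(sql_quote(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Made-up sample row, not real scraped data.
sample = {"SessionID": 1, "Title": "What's New", "Deleted": 0}
print(build_insert("Sessions", sample))
```

Building SQL text by hand like this suits a one-off load of scraped data; where the database driver supports parameterized queries, those are generally the safer choice because the driver handles quoting itself.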