Scraping Session Data

Amy Herold has scraped PASS Summit 2017 submissions using Powershell:

Never having done a web scrape before, this was the perfect subject for my first time – grabbing all the sessions submitted to PASS Summit 2017…and doing it with PowerShell! Here is the script I used for this. I have accounted for the following:

  • Apostrophes (aka single quote). They will break your insert unless you have two of them, and for some reason, people seem to use them all over the place.

  • Formatting the string data for insert. No, your data will not magically come out right in your insert with single quotes so you need to add them.

  • Additional ID and deleted fields.

  • Speaker URL and ID. Will be using this to scrape speaker details later.

  • Accurate lower and upper bounds. These were arrived at by trial and error (you’re welcome), as well as the clean up of the data I scraped. More on this later.

Powershell probably wouldn’t be my first language for web scrapes—that’d be Python—but Amy shows how to get a scrape going.

Related Posts

Collecting PRINT Outputs From Powershell

Jana Sattainathan shows how to query a number of SQL Server instances in parallel using Powershell and collecting the PRINT outputs from each: As an example, you may have a block of SQL that PRINTs out the current privileges in the databasethat can then be saved off and used as an independent script. In my case […]

Read More

Quantifier {x,y} Following Nothing

Shane O’Neill reminds us that reading is fundamental: Glancing at the error message, the first things that stick out are the bits “{x,y}” so I change my regex to be anywhere from 1 to  digits "*\d{1,6}$" Why are you glancing, read the error message! That doesn’t work, so I again quickly scan the error message and see the bit […]

Read More


June 2017
« May Jul »