Scraping Session Data

Amy Herold has scraped PASS Summit 2017 submissions using Powershell:

Never having done a web scrape before, this was the perfect subject for my first time – grabbing all the sessions submitted to PASS Summit 2017…and doing it with PowerShell! Here is the script I used for this. I have accounted for the following:

  • Apostrophes (aka single quote). They will break your insert unless you have two of them, and for some reason, people seem to use them all over the place.

  • Formatting the string data for insert. No, your data will not magically come out right in your insert with single quotes so you need to add them.

  • Additional ID and deleted fields.

  • Speaker URL and ID. Will be using this to scrape speaker details later.

  • Accurate lower and upper bounds. These were arrived at by trial and error (you’re welcome), as well as the clean up of the data I scraped. More on this later.

Powershell probably wouldn’t be my first language for web scrapes—that’d be Python—but Amy shows how to get a scrape going.

Related Posts

Making SpeedPASSes Better

Wayne Sheffield shares some useful tips for making the SpeedPASS experience better: As things were wrapping up for our event last year, there was an important change made at the SQLSaturday site for dealing with SpeedPASSes. Previously, the admission ticket was sized differently from everything else, which created hassles in cutting and in using perforated […]

Read More

dbatools 1.0 Forthcoming

Chrissy LeMaire announces that dbatools will be out on June 19th by my count: We’ve got about 30 issues left to resolve which you can see and follow on our GitHub Projects page. If you’ve ever been interested in helping, now is the perfect time as we only have 30 more days left to reach our […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930