Scraping Session Data

Amy Herold has scraped PASS Summit 2017 submissions using PowerShell:

Never having done a web scrape before, this was the perfect subject for my first time – grabbing all the sessions submitted to PASS Summit 2017…and doing it with PowerShell! Here is the script I used for this. I have accounted for the following:

  • Apostrophes (aka single quote). They will break your insert unless you have two of them, and for some reason, people seem to use them all over the place.

  • Formatting the string data for insert. No, your data will not magically come out right in your insert with single quotes so you need to add them.

  • Additional ID and deleted fields.

  • Speaker URL and ID. Will be using this to scrape speaker details later.

  • Accurate lower and upper bounds. These were arrived at by trial and error (you’re welcome), as well as the clean up of the data I scraped. More on this later.
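The apostrophe and quoting points both come down to turning a scraped value into a T-SQL string literal. Below is a minimal PowerShell sketch of that step. It is not Amy's code. The function name ConvertTo-SqlLiteral, the dbo.Sessions table, and the sample values are all hypothetical.

```powershell
# Hypothetical helper (not from Amy's script): turn a scraped value into a T-SQL string literal.
function ConvertTo-SqlLiteral {
    param([string]$Value)

    # Double every apostrophe so it survives inside a quoted literal (O'Brien -> O''Brien),
    # then wrap the whole value in single quotes.
    "'" + $Value.Replace("'", "''") + "'"
}

# Build an INSERT for a hypothetical dbo.Sessions table from made-up sample values.
$title   = "Why Your Index Isn't Used"
$speaker = "Pat O'Brien"
$sql = "INSERT INTO dbo.Sessions (Title, Speaker) VALUES ({0}, {1});" -f (ConvertTo-SqlLiteral $title), (ConvertTo-SqlLiteral $speaker)
$sql
# INSERT INTO dbo.Sessions (Title, Speaker) VALUES ('Why Your Index Isn''t Used', 'Pat O''Brien');
```

If the inserts go through ADO.NET rather than a concatenated string, a System.Data.SqlClient.SqlCommand with parameters (for example $cmd.Parameters.AddWithValue('@Title', $title)) sidesteps the escaping entirely.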

PowerShell probably wouldn’t be my first language for web scraping (that would be Python), but Amy shows how to get a scrape going.

Related Posts

Powershell: Text Search In A Table Value

Shane O’Neill clarifies a misunderstanding in PowerShell:

and you are running the following PowerShell command to check if the results contain a value…

    $String = "abc"
    $Array = @(Invoke-Sqlcmd -ServerInstance "SQLServer" -Database "Database" -Query "SELECT code FROM dbo.users")
    $Array.Contains($string)

It will return FALSE. Now we know that the FALSE is false because we know that the string […]
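The excerpt stops before the explanation. One likely cause, assuming Invoke-Sqlcmd returns its default System.Data.DataRow objects, is that $Array holds whole rows rather than strings, so no element ever equals "abc". The sketch below stands in for the query with an in-memory DataTable so it runs without a SQL Server instance. The table contents are made up.

```powershell
# Stand-in for the Invoke-Sqlcmd result: a DataTable with a single 'code' column (sample data).
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('code', [string])
foreach ($value in 'abc', 'def') {
    $row = $table.NewRow()
    $row['code'] = $value
    $table.Rows.Add($row)
}

$String = 'abc'
$Array  = @($table.Rows)   # an array of DataRow objects, as Invoke-Sqlcmd returns by default

# Comparing the string against whole DataRow objects finds no match.
$Array -contains $String          # False

# Member enumeration pulls the 'code' value out of every row before comparing.
$Array.code -contains $String     # True
```

The .Contains() call in the excerpt fails for the same reason: each element it checks is a DataRow, not a string.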

Basic Powershell Regex

Adam Bertram shows how to use regular expressions for pattern matching in PowerShell: However, regex has traditionally been a topic that most IT pros shy away from when they first see how it works. Admittedly, regex does take a bit of getting used to, and you’re still probably going to have to do some Googling […]
