Not Catching Them All

Hanjo Odendaal explains clustering techniques using Pokemon:

To collect the data on all the first generation pokemon, I employ Hadley Wickam’s rvest package. I find it very intuitive and can handle all of my needs in collecting and extracting the data from a pokemon wiki. I will grab all the Pokemon up until to Gen II, which constitutes 251 individuals. I did find the website structure a bit of a pain as each pokemon had very different looking web pages. But, with some manual hacking, I eventually got the data in a nice format.

This probably means a lot more to you if you grew up in front of a Game Boy, but there’s some good technique in here regardless.

Related Posts

Probabilities And Poker

Steve Miller has a notebook on 5-card draw probabilities: The population of 5 card draw hands, consisting of 52 choose 5 or 2598960 elements, is pretty straightforward both mathematically and statistically. So of course ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the […]

Read More

Visualization Over Kafka And KSQL

Shant Hovsepian shows off a data visualization tool which can read Kafka Streams data: KSQL is a game-changer not only for application developers but also for non-technical business users. How? The SQL interface opens up access to Kafka data to analytics platforms based on SQL. Business analysts who are accustomed to non-coding, drag-and-drop interfaces can […]

Read More

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031