
Category: Python

Word Clouds In Python

Allison Tharp shows how to generate a word cloud using Python:

Every week, someone on Reddit posts a “word cloud” on all of the NFL teams’ subreddits. These word clouds show the most used words on that subreddit for the week (the larger the word, the more it was used). These word plots are always really fascinating to me, so I wanted to try to make some for myself. In this tutorial, we’ll be making the following word cloud from my board game stats twitter feed, @BGGStats.

Looks like the implementation is fairly straightforward, so check it out.
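
As a point of reference, here is a minimal sketch of the general approach using the third-party wordcloud and matplotlib packages; the input file and styling parameters are placeholders, not necessarily what Allison used.

# A rough word cloud sketch; tweets.txt is a hypothetical file holding the
# text pulled from the Twitter feed.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

with open("tweets.txt") as f:
    text = f.read()

wc = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()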


More NFL Alerts

Allison Tharp has an update to her NFL Alerts Python script:

Next, I wanted to make the alerts be a little more meaningful.  The alert for a scoring play was already pretty good – it sends something like: BUF – Q4 – TD – J.Boykin 4 yd. pass from C.Jones (pass failed) Drive: 8 plays, 83 yards in 1:08 IND (19) at BUF (18).  This is good, and in fact it is what I want the rest of the alerts to look like.  However, I’d like the subject of the email to have the name of the team that scored (before it was just ‘Scoring Play’).

To do that, I needed to find out how to get the name of the scoring team.  This was a little tricky because the documentation for the nflgame library, though pretty good, doesn’t give a good indication on how to find this.

Read on for more details, including specifics on turnovers and penalties.
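
One plausible way to get that team name is to take the abbreviation off the front of the game's scoring-play summary strings, which follow the "BUF - Q4 - TD - ..." format shown above. The matchup, week, and attribute names below are assumptions based on nflgame's documented interface, not necessarily Allison's exact code.

import nflgame

# Placeholder matchup: the IND at BUF preseason game referenced in the alert text
g = nflgame.one(2016, 1, "BUF", "IND", kind="PRE")

if g.scores:
    latest = g.scores[-1]                   # e.g. "BUF - Q4 - TD - J.Boykin 4 yd. pass ..."
    scoring_team = latest.split(" - ")[0]   # team abbreviation for the email subject
    subject = "Scoring Play: {0}".format(scoring_team)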


Azure ML To Python

Koos van Strien “graduates” from Azure ML into Python:

Python is often used in conjunction with the scikit-learn collection of libraries. The most important libraries used for ML in Python are grouped inside a distribution called Anaconda. This is the distribution that’s also used inside Azure ML. Besides Python and scikit-learn, Anaconda contains all kinds of Data Science-oriented packages. It’s a good idea to install Anaconda as a distribution and use Jupyter (formerly IPython) as the development environment: Anaconda gives you almost the same environment on your local machine as your code will run in once in Azure ML. Jupyter gives you a nice way to keep code (in Python) and documentation (in Markdown) together.

Anaconda can be downloaded from https://www.continuum.io/downloads.

If you’re going down this path, Anaconda is absolutely a great choice.


Big Play Alerts

Allison Tharp has a Python script to track extremely important events:

First, we get the game data for the game we want.  In this instance, I am getting game data for the Indianapolis vs Cincinnati game in the 4th week of the 2016 preseason and setting it to the variable g.  Next, we will get the current number of scoring plays (scores0), number of home/away team turnovers (home/awayto0), number of home/away penalties (home/awaypenalty0), and finally, the number of yards that resulted from home/away penalties (home/awaypenyds0).

The rest of the script runs while the game is still in progress. To check if the game is in progress, we use g.game_over(). If this returns False, the game is ongoing.

I did not know about the nflgame module and I think my life has just become better as a result of learning about this.
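
Here is a minimal sketch of the setup described above, assuming nflgame's documented game and team-stats attributes; the home/away designation for the matchup is a guess, and the loop body is only outlined.

import time
import nflgame

# Indianapolis vs Cincinnati, week 4 of the 2016 preseason (home team assumed)
g = nflgame.one(2016, 4, "CIN", "IND", kind="PRE")

scores0 = len(g.scores)                                            # scoring plays so far
hometo0, awayto0 = g.stats_home.turnovers, g.stats_away.turnovers  # turnovers
homepenalty0, awaypenalty0 = g.stats_home.penalty_cnt, g.stats_away.penalty_cnt
homepenyds0, awaypenyds0 = g.stats_home.penalty_yds, g.stats_away.penalty_yds

while not g.game_over():
    time.sleep(60)                                    # wait, then re-pull the live data
    g = nflgame.one(2016, 4, "CIN", "IND", kind="PRE")
    # compare the new counts against the baselines and send alerts on any change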


PySpark With MapR

Justin Brandenburg has a tutorial on combining Python and Spark on the MapR platform:

Looking at the first 5 records of the RDD:

kddcup_data.take(5)

This output is difficult to read. This is because we are asking PySpark to show us data that is in the RDD format. PySpark has DataFrame functionality. If the Python version is 2.7 or higher, you can utilize the pandas package. However, pandas doesn’t work on Python 2.6, so we use the Spark SQL functionality to create DataFrames for exploration.

The full example is a fairly simple k-means clustering process, which is a great introduction to PySpark.
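
As a rough sketch of that Spark SQL approach (Spark 1.x-style SQLContext; the file path and column choices are placeholders rather than the tutorial's exact code):

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="kddcup-explore")
sqlContext = SQLContext(sc)

# Hypothetical path; each line of the KDD Cup data is a comma-separated record
kddcup_data = sc.textFile("kddcup.data.gz")
parsed = kddcup_data.map(lambda line: line.split(","))

# Turn a few fields into Rows and build a DataFrame, which displays far more readably
rows = parsed.map(lambda p: Row(duration=int(p[0]), protocol=p[1], service=p[2]))
df = sqlContext.createDataFrame(rows)
df.show(5)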


Analytic Tool Usage

Alex Woodie notes the increased popularity of Python for data analysis:

According to the results of the 2016 survey, R is the preferred tool for 42% of analytics professionals, followed by SAS at 39% and Python at 20%. While Python’s placing may at first appear to relegate the language to Bronze Medal status, it’s the delta here that really matters.

It’s interesting to see the breakdowns of who uses which language, comparing across industry, education, work experience, and geographic lines.


Azure ML Updates

David Smith walks us through new language engines supported in Azure ML:

ML Studio now gives you even more flexibility, with new language engines supported in the language modules. Within the Execute Python Script module, you can now choose to use Python 2.7.11 or Python 3.5, both of which run within the Anaconda 4.0 distribution. And within the Execute R Script module, you can now choose Microsoft R Open 3.2.2 as your R engine, in addition to the existing CRAN R 3.1.0 engine. Microsoft R Open 3.2.2 not only gives you a newer R language engine, it also gives you access to a wealth of new R packages for use within ML Studio. Over 400 packages are pre-installed for use with the R Script module, and you can install and use any other R package (including CRAN packages and your own R packages) via the Script Bundle input port.
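
For context, the Execute Python Script module expects an azureml_main entry point that takes up to two pandas DataFrames and returns a tuple; the transformation below is just a placeholder sketch.

import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Inputs arrive from the module's ports as pandas DataFrames
    summary = pd.DataFrame({"rows": [len(dataframe1)], "cols": [dataframe1.shape[1]]})
    return summary,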

I’m interested in the Microsoft R Open language support, as Azure ML is still using a relatively old version of R (3.1.0).


Running Compiled Code In Azure ML

Max Kaznady shows how to use R or Python scripts to call compiled code within Azure ML:

In this post, we focus on sourcing R and Python’s external dependencies, such as R libraries and Python modules, which are not already installed on Azure ML and require code compilation. Commonly the compiled code comes from a variety of other languages such as C, C++ and Fortran. One could also use this approach to wrap their compiled code with R or Python wrappers and run it on Azure ML.

To illustrate the process, we will build two MurmurHash modules from C++ for R and Python using the following two implementations on GitHub, and link them to Azure ML from a zipped folder.

Link via David Smith.  I knew it was possible to call compiled C code from Python and R, but didn’t expect to be able to do it within Azure ML, so that’s good to know.
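
As a general illustration of the Python side (not the post's exact files), a compiled C++ source can be turned into an importable extension module with a small setuptools script; the file and module names here are placeholders.

from setuptools import setup, Extension

# Hypothetical C++ source that wraps the MurmurHash implementation for Python
setup(
    name="murmur",
    ext_modules=[Extension("murmur", sources=["murmurhash_wrapper.cpp"], language="c++")],
)

Running python setup.py build_ext --inplace produces a binary module you can import like any other, which is the sort of artifact that then gets zipped up and attached to the Azure ML experiment.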


K-Means Clustering With Python

David Crook discusses k-means clustering and how to implement it using Python:

K-Means takes in an unlabeled data set and a whole number, k. K is the number of centroids, or clusters, you wish to find. If you do not know how many clusters there should be, it is possible to do some pre-processing to find that more automatically, though that is out of the scope of this article. Once you have a data set and defined the size of k, K-Means begins its iterative process. It starts by selecting initial centroids, then reshuffles all of the data into groups based on proximity to each centroid, and then moves each centroid to the average of the data associated with it, repeating until the clusters stabilize.

This is a big and detailed post, and worth reading in its totality.
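
Whether or not it matches David's implementation, a minimal scikit-learn sketch of the process he describes looks like this (toy data, arbitrary choice of k):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)                 # unlabeled toy data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)                 # centroids after the iterative process
print(km.labels_[:10])                     # cluster assignments for the first ten points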
