Fraud Detection With Python

Kevin Jacobs has a walkthrough of how to use Pandas and scikit-learn to perform fraud detection against a sample set of credit card transactions:

Apparently, the data consists of 28 variables (V1, …, V28), an “Amount” field a “Class” field and the “Time” field. We do not know the exact meanings of the variables (due to privacy concerns). The Class field takes values 0 (when the transaction is not fraudulent) and value 1 (when a transaction is fraudulent). The data is unbalanced: the number of non-fraudulent transactions (where Class equals 0) is way more than the number of fraudulent transactions (where Class equals 1). Furthermore, there is a Time field. Further inspection shows that these are integers, starting from 0.

There is a small trick for getting more information than only the raw records. We can use the following code:


This code will give a statistically summary of all the columns. It shows for example that the Amount field ranges between 0.00 and 25691.16. Thus, there are no negative transactions in the data.

The Kaggle competition data set is available, so you can follow along.

Related Posts

Reviewing Word Associations With R

Julia Silge does some exploratory analysis on the Small World of Words project: The Small World of Words project focuses on word associations. You can try it out for yourself to see how it works, but the general idea is that the participant is presented with a word (from “telephone” to “journalist” to “yoga”) and is then […]

Read More

Executing Python Code In Power BI

Brad Llewellyn shows how to build a visual based on a Python script using Power BI: Now that we’ve seen our data, it’s a relatively simple task to convert the R script to a Python script. There are a few major differences. First, Python is a general purpose programming language, whereas R is a statistical […]

Read More


June 2017
« May Jul »