Imbalanced Data In R

Rathnadevi Manivannan explains how to deal with imbalanced data using R:

Imbalanced data refers to classification problems where one class outnumbers other class by a substantial proportion. Imbalanced classification occurs more frequently in binary classification than in multi-level classification. For example, extreme imbalanced data can be seen in banking or financial data where majority credit card uses are acceptable and very few credit card uses are fraudulent.

With an imbalanced dataset, the information required to make an accurate prediction about the minority class cannot be obtained using an algorithm. So, it is recommended to use balanced classification dataset.

Rathnadevi uses fraudulent transactions for his sample, but medical diagnoses is also a good example:  suppose 1 person in 10,000 has a particular disease.  You’re 99.99% right if you just say nobody has the disease, but that’s a rather unhelpful model.

Related Posts

Connecting To Elasticsearch With R

Jerod Johnson has a sample of connecting to Elasticsearch with R: You will need the following information to connect to Elasticsearch as a JDBC data source: Driver Class: Set this to cdata.jdbc.elasticsearch.ElasticsearchDriver. Classpath: Set this to the location of the driver JAR. By default, this is the lib subfolder of the installation folder. The DBI functions, […]

Read More

Voice Control For Shiny Apps

Over at Jumping Rivers, an example of using a Javascript library to control a page using voice commands: I have found that performance across all devices and browsers is definitely not equal. By far the best browser I have found for viewing the apps is Google Chrome. I have also tended to find that my […]

Read More


October 2017
« Sep Nov »