Press "Enter" to skip to content

Category: R

Understanding Decision Trees

Ram Tavva walks us through the algorithm to design decision trees:

A decision tree is made up of several nodes:

1.Root Node: A Root Node represents the entire data and the starting point of the tree. From the above example the
First Node where we are checking the first condition, whether the movie belongs to Hollywood or not that is the
Rood node from which the entire tree grows
2.Leaf Node: A Leaf Node is the end node of the tree, which can’t split into further nodes.
From the above example ‘watch movie’ and ‘Don’t watch ‘are leaf nodes.
3.Parent/Child Nodes: A Node that splits into a further node will be the parent node for the successor nodes. The
nodes which are obtained from the previous node will be child nodes for the above node.

Read on for an example of implementation in R.

Comments closed

The Effects of Undersampling and Oversampling on Predicted Probability

Bryan Shalloway has an interesting article for us:

In classification problems, under and over sampling techniques shift the distribution of predicted probabilities towards the minority class. If your problem requires accurate probabilities you will need to adjust your predictions in some way during post-processing (or at another step) to account for this.

Bryan has a clear example showing this problem in action.

Comments closed

Image Modification with R in Machine Learning Services

Rajendra Gupta messes with The Mouse:

There is a famous adage in English: “A picture is worth a thousand words”. You can represent your information using the image in various formats such as JPEG, PNG, GIF. Usually, we use various client tools such as MS Paint, Photo, photoshop or other client applications for working with the images. You can convert image format, modify the size, applying various effects, multiple animated images.

SQL Machine Learning language – R makes us capable of working with the images directly with the SQL Server. In this article, we will use SQL Machine Learning using R scripts for image processing.

Click through for examples.

Comments closed

AzureTableStor: Table Storage in R

Hong Ooi announces a new package on CRAN:

I’m pleased to announce that the AzureTableStor package, providing a simple yet powerful interface to the Azure table storage service, is now on CRAN. This is something that many people have requested since the initial release of the AzureR packages nearly two years ago.

Azure table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design. Because table storage is schemaless, it’s easy to adapt your data as the needs of your application evolve. Access to table storage data is fast and cost-effective for many types of applications, and is typically lower in cost than traditional SQL for similar volumes of data.

If that sounds like a fit for you, check out the package.

Comments closed

Finding Function Names with Overlap in R

Tomaz Kastrun has a script for us:

So the following function does just that; if goes through the list of functions for all the installed packages in particular environment and finds duplicates. Giving the R programmer the insights where to be cautious and refer to scope and namespace.

This is a cautionary tale to use full object naming, like coin::show() rather than just show().

Comments closed

Error Handling in R

Adi Sarid compares a few methods for error handling in R:

Error catching can be hard to catch at times (no pun intended). If you’re not used to error handling, this short post might help you do it elegantly.

There are many posts about error handling in R (and in fact the examples in the purrr package documentation are not bad either). In this sense, this post is not original.

However, I do demonstrate two approaches: both the base-R approach (tryCatch) and the purrr approach (safely and possibly). The post contains a concise summary of the two methods, with a very simple example.

Read the whole thing. H/T R-Bloggers

Comments closed

Understanding Skewness and Kurtosis

George Pipis explains two key concepts of a distribution:

Most commonly a distribution is described by its mean and variance which are the first and second moments respectively. Another less common measures are the skewness (third moment) and the kurtosis (fourth moment). Today, we will try to give a brief explanation of these measures and we will show how we can calculate them in R.

Click through for an explanation. H/T R-Bloggers

Comments closed

GIS Capabilities in R

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic, st_operation(A, B) checks for each combinations of the geometries in A and B whether A operation B is true or false. For instance st_within(A, B) checks whether the geometries in A are within B, this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each elements contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE the functions returns matrices, like st_within(A, B, sparse=FALSE) returns a n x m matrix, st_within(B, A, sparse=FALSE) returns a m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, so returning a n x n matrix.

Click through for part 1 of the series.

Comments closed

Understanding How GPS Works

Holger von Jouanne-Diedrich walks us through the basics of global positioning:

Last week, I showed you a method of how to find the fastest path from A to B: Finding the Shortest Path with Dijkstra’s Algorithm. To make use of that, we need a method to determine our position at any point in time.

For that matter, many devices use the so-called Global Positioning System (GPS). If you want to understand how it works and do some simple calculations in R, read on!

Do read the whole thing; the explanation is laid out really well.

Comments closed