Press "Enter" to skip to content

Category: R

The Effects of Undersampling and Oversampling on Predicted Probability

Bryan Shalloway has an interesting article for us:

In classification problems, under and over sampling techniques shift the distribution of predicted probabilities towards the minority class. If your problem requires accurate probabilities you will need to adjust your predictions in some way during post-processing (or at another step) to account for this.

Bryan has a clear example showing this problem in action.

Comments closed

Image Modification with R in Machine Learning Services

Rajendra Gupta messes with The Mouse:

There is a famous adage in English: “A picture is worth a thousand words”. You can represent your information using the image in various formats such as JPEG, PNG, GIF. Usually, we use various client tools such as MS Paint, Photo, photoshop or other client applications for working with the images. You can convert image format, modify the size, applying various effects, multiple animated images.

SQL Machine Learning language – R makes us capable of working with the images directly with the SQL Server. In this article, we will use SQL Machine Learning using R scripts for image processing.

Click through for examples.

Comments closed

AzureTableStor: Table Storage in R

Hong Ooi announces a new package on CRAN:

I’m pleased to announce that the AzureTableStor package, providing a simple yet powerful interface to the Azure table storage service, is now on CRAN. This is something that many people have requested since the initial release of the AzureR packages nearly two years ago.

Azure table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design. Because table storage is schemaless, it’s easy to adapt your data as the needs of your application evolve. Access to table storage data is fast and cost-effective for many types of applications, and is typically lower in cost than traditional SQL for similar volumes of data.

If that sounds like a fit for you, check out the package.

Comments closed

Error Handling in R

Adi Sarid compares a few methods for error handling in R:

Error catching can be hard to catch at times (no pun intended). If you’re not used to error handling, this short post might help you do it elegantly.

There are many posts about error handling in R (and in fact the examples in the purrr package documentation are not bad either). In this sense, this post is not original.

However, I do demonstrate two approaches: both the base-R approach (tryCatch) and the purrr approach (safely and possibly). The post contains a concise summary of the two methods, with a very simple example.

Read the whole thing. H/T R-Bloggers

Comments closed

Understanding Skewness and Kurtosis

George Pipis explains two key concepts of a distribution:

Most commonly a distribution is described by its mean and variance which are the first and second moments respectively. Another less common measures are the skewness (third moment) and the kurtosis (fourth moment). Today, we will try to give a brief explanation of these measures and we will show how we can calculate them in R.

Click through for an explanation. H/T R-Bloggers

Comments closed

GIS Capabilities in R

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic, st_operation(A, B) checks for each combinations of the geometries in A and B whether A operation B is true or false. For instance st_within(A, B) checks whether the geometries in A are within B, this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each elements contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE the functions returns matrices, like st_within(A, B, sparse=FALSE) returns a n x m matrix, st_within(B, A, sparse=FALSE) returns a m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, so returning a n x n matrix.

Click through for part 1 of the series.

Comments closed

Understanding How GPS Works

Holger von Jouanne-Diedrich walks us through the basics of global positioning:

Last week, I showed you a method of how to find the fastest path from A to B: Finding the Shortest Path with Dijkstra’s Algorithm. To make use of that, we need a method to determine our position at any point in time.

For that matter, many devices use the so-called Global Positioning System (GPS). If you want to understand how it works and do some simple calculations in R, read on!

Do read the whole thing; the explanation is laid out really well.

Comments closed

Full Moon Finder in R

Tomaz Kastrun has a not-so-useless function:

The full moon function, or should we call it fool moon – due to it’s simplistic and approximate nature, calculates the the difference between the date (only date, no time, no long/lat coordinates) and Julian constant. Should you be using a different calendar, don’t run the function, just look out the window.

The function is written based on generalized equation for julian day numbers and months. Another one could be to calculate RMSE of the predicted values and realization of lunar behavior (lunatic start time). In this case – reversed engineering – you would use the the approximate date/time for the first new moon after that date if the synod period was constant. This number than obtained is only empirically proven by recursively solving for the new “possible date/time” of lunar behavior and calculate the prediction error. In order to minimize the RMSE value of the difference between the full moon dates/times predicted formula and the dates/times for the full moon over the next 10 years you get something like this.

Click through for the function as well as sound advice if it’s not a full moon.

Comments closed