Category: R

Time Series Forecasting in R

Selcuk Disci contrasts a couple of methods for time series forecasting:

It is always hard to find a proper model to forecast time series data. One of the reasons is that models that use time-series data are often exposed to serial correlation. In this article, we will compare k-nearest neighbor (KNN) regression, which is a supervised machine learning method, with a more classical stochastic process model, autoregressive integrated moving average (ARIMA).

We will use the monthly prices of refined gold futures (XAUTRY) for one gram in Turkish Lira traded on BIST (Istanbul Stock Exchange) for forecasting. We created the data frame starting from 2013. You can download the relevant Excel file from here.

Click through for the demonstration. H/T R-Bloggers.
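
To give a flavor of the comparison, here is a minimal base R sketch. It is not the author's code: the built-in AirPassengers series stands in for the gold price data, the ARIMA order (1, 1, 1) is an arbitrary choice, and knn_forecast() is a hypothetical helper I wrote for illustration rather than a function from any package.

```r
# Hypothetical helper: recursive k-nearest-neighbor forecast on lagged values.
knn_forecast <- function(x, h, lags = 12, k = 3) {
  x <- as.numeric(x)
  emb <- embed(x, lags + 1)            # column 1 = target, columns 2.. = its lags
  targets <- emb[, 1]
  features <- emb[, -1, drop = FALSE]
  history <- x
  preds <- numeric(h)
  for (step in seq_len(h)) {
    query <- rev(tail(history, lags))  # most recent value first, matching embed()
    dists <- sqrt(rowSums(sweep(features, 2, query)^2))
    nearest <- order(dists)[seq_len(k)]
    preds[step] <- mean(targets[nearest])
    history <- c(history, preds[step]) # feed the forecast back in for the next step
  }
  preds
}

# Stand-in data: hold out the final year of AirPassengers as a test set.
train <- window(AirPassengers, end = c(1959, 12))
test <- as.numeric(window(AirPassengers, start = c(1960, 1)))
h <- length(test)

# Classical model: ARIMA(1, 1, 1) from the stats package (order chosen arbitrarily).
arima_fit <- arima(train, order = c(1, 1, 1))
arima_pred <- as.numeric(predict(arima_fit, n.ahead = h)$pred)

# Machine learning model: the hand-rolled KNN regression above.
knn_pred <- knn_forecast(train, h = h)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
results <- c(ARIMA = rmse(test, arima_pred), KNN = rmse(test, knn_pred))
print(results)
```

Which method comes out ahead will depend on the series, the number of lags, and k, so treat the numbers as a starting point rather than a verdict.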

SQL Server R and Python Language Extensions Now Open Source

The SQL Server team has an announcement:

Previously, we announced a Java extension. Today, we are sharing that we are open sourcing the R and Python language extensions for SQL Server for both Windows and Linux on GitHub.

These extensions are the latest examples using an evolved programming language extensibility architecture which allows integration with a new type of language extension. This new architecture gives customers the freedom to bring their own runtime and execute programs using that runtime in SQL Server, while leveraging the existing security and governance that the SQL Server programming language extensibility architecture provides.

Very interesting.

Random Numbers in R: Parallel Processing Edition

Henrik Bengtsson takes us through an interesting issue:

R does a superb job of taking care of us when it comes to random number generation – as long as we run our analysis sequentially in a single R process. Formally R uses the Mersenne Twister RNG algorithm [1] by default, which we can set explicitly using RNGkind("Mersenne-Twister"). However, like many other RNG algorithms, the authors designed this one for generating random numbers sequentially but not in parallel. If we use it in parallel code, there is a risk that there will be a correlation between the random numbers generated in parallel, and, when taken together, they may no longer be “random enough” for our needs.

The post covers how the future package has your back when it comes to random numbers. H/T R-Bloggers.
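
As a rough illustration of the underlying idea, here is a sketch using only the parallel package that ships with R. It switches to the L'Ecuyer-CMRG generator, which was designed to provide multiple streams, and hands each (simulated) worker its own stream. The draw_from_stream() helper and the worker count are my own assumptions, and the "workers" run sequentially here to keep the example self-contained.

```r
library(parallel)

# L'Ecuyer-CMRG supports multiple streams, unlike the default Mersenne Twister.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# Derive one RNG stream per worker from the seeded starting state.
n_workers <- 3
streams <- vector("list", n_workers)
streams[[1]] <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_workers)[-1]) {
  streams[[i]] <- nextRNGStream(streams[[i - 1]])
}

# Hypothetical helper: what each worker would do with the stream it was given.
draw_from_stream <- function(seed, n = 5) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

# Run sequentially here; in real code each stream would go to a separate worker.
draws <- lapply(streams, draw_from_stream)
```

Because each stream is derived from the seed, rerunning a worker with the same stream reproduces its draws exactly. As I understand it, higher-level tools automate this bookkeeping, for example clusterSetRNGStream() in parallel or the future.seed = TRUE argument in future.apply.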

ggwebthemes as a Package

Petr Baranovskiy has released an R package:

I was happy to see that some people liked those and suggested that I should make a package. I tended to view packages as large collections of code and functions, but as Sébastien Rochette wisely put it, “If you have one function, create a package! If this simplifies your life, why not?” And since I will be frequently using these themes in subsequent posts, I’d like to make it as convenient as possible for the reader to install and use them.

So here is the ggwebthemes package! It has the same three themes, which I have tweaked and improved some more.

Click through for instructions on how to install ggwebthemes via GitLab.

Image Classification with Keras and TensorFlow 2 in R

Shirin Glander takes us through the task of image classification using TensorFlow version 2.2.0:

Recently, I have been getting a few comments on my old article on image classification with Keras, saying that they are getting errors with the code. And I have also gotten a few questions about how to use a Keras model to predict on new images (of different size). Instead of replying to them all individually, I decided to write this updated version using recent Keras and TensorFlow versions (all package versions and system information can be found at the bottom of this article, as usual).

Click through for the R code.

R: Avoid apply() with Large Datasets

George Pipis walks us through the performance cost of using the apply() function against a large dataset in R:

When we are dealing with large datasets and there is a need to calculate some values like the row/column min/max/rank/mean etc we should avoid the apply function because it takes a lot of time. Instead, we can use the matrixStats package and its corresponding functions. Let’s provide some comparisons.

Click through for a demonstration of how much slower it can be in certain cases. H/T R-Bloggers.
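
For a quick feel of the difference, here is a small sketch of my own (not the post's benchmark). It compares apply() against base R's vectorized rowMeans() on an arbitrarily sized random matrix and, only if matrixStats happens to be installed, checks that its rowMaxs() agrees with the apply() equivalent.

```r
set.seed(1)
m <- matrix(rnorm(2000 * 500), nrow = 2000)  # arbitrary size for illustration

# Row means two ways: a function call per row versus one vectorized call.
t_apply <- system.time(means_apply <- apply(m, 1, mean))
t_vec <- system.time(means_vec <- rowMeans(m))

# Both approaches should agree up to floating-point tolerance.
same_result <- isTRUE(all.equal(means_apply, means_vec))

elapsed <- c(apply = t_apply[["elapsed"]], rowMeans = t_vec[["elapsed"]])
print(elapsed)

# Optional: matrixStats covers statistics base R lacks, such as row maxima.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  same_max <- isTRUE(all.equal(apply(m, 1, max), matrixStats::rowMaxs(m)))
  print(same_max)
}
```

Timings will vary by machine, so I have not quoted any numbers here.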

Filtering with dplyr Using Strings as Expressions

Kenneth Tay shows how to build arbitrary expressions to execute in dplyr’s filter function:

This took me a while to figure out and so I thought I would post this as future reference. Let’s say I have the mtcars data and I want to filter for just the rows with cyl == 6. I would do something like this:

library(tidyverse)
data(mtcars)
mtcars %>% filter(cyl == 6)

What if I had the filter condition as a string instead?

Read on to see how you can do this. Given the answer, I wonder if there’s any chance that could be turned into an injection vulnerability. H/T R-Bloggers.
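
One common way to do this, which may or may not match the post's exact approach, is to parse the string with rlang::parse_expr() and unquote it with !! inside filter(). On the injection question, the sketch below also includes is_safe_condition(), a hypothetical guard of my own that only accepts known column names and a short list of comparison operators. It is illustrative, not a complete defense.

```r
library(dplyr)

condition <- "cyl == 6"                   # filter condition held as a string
parsed <- rlang::parse_expr(condition)    # parse into an expression, not yet evaluated
six_cyl <- mtcars %>% filter(!!parsed)    # unquote the expression inside filter()

# Hypothetical guard: reject expressions that reference anything other than
# the data frame's columns and a small set of allowed operators.
is_safe_condition <- function(expr, data,
                              allowed_ops = c("==", "!=", "<", "<=", ">", ">=",
                                              "&", "|", "(")) {
  vars <- all.vars(expr)
  ops <- setdiff(all.names(expr), vars)
  all(vars %in% names(data)) && all(ops %in% allowed_ops)
}

is_safe_condition(parsed, mtcars)                            # simple comparison passes
is_safe_condition(rlang::parse_expr("system('ls')"), mtcars) # function call is rejected
```

Parsing alone does not run anything, but evaluating an arbitrary parsed string does, so a string from an untrusted source deserves a check like this before it reaches filter().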

The State of R Packages Locally

Maelle Salmon and Gabor Csardi walk us through some details about installed packages on a system:

Now how do you know where any of your installed packages was installed? You can use find.package() and path.package()!

To check whether a package is installed, it is better to use find.package() than installed.packages() because the latter, as its docs state, can be slow on some systems. In both cases, it does not mean the package is usable; for that you’d need to use library() or require().

Read on for several tips around where packages are located, what their contents look like, and learning a bit more about the actual code in packages.
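
Here is a short sketch of those checks using base R only. The is_installed() helper and the deliberately fake package name are my own inventions for illustration, and I use requireNamespace() as the usability check because it loads a package without attaching it.

```r
# Where is an installed package located? (stats ships with every R install.)
pkg_path <- find.package("stats")
pkg_files <- list.files(pkg_path)     # top-level contents of the installed package

# Hypothetical helper: with quiet = TRUE, find.package() returns character(0)
# for a missing package instead of raising an error.
is_installed <- function(pkg) length(find.package(pkg, quiet = TRUE)) > 0

stats_installed <- is_installed("stats")
fake_installed <- is_installed("notARealPackage123")

# path.package() with no arguments lists the paths of all attached packages.
attached_paths <- path.package()

# Installed does not mean usable: actually loading the namespace is the real test.
stats_usable <- requireNamespace("stats", quietly = TRUE)
```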