Category: R

Modifying A ggplot2 Theme

Published 2018-10-11 by Kevin Feasel

Sebastian Sauer gives us an example of modifying a standard ggplot2 theme:

ggplot2 is customizeable. Frankly, one can change a heap of details – not everything probably, but a lot. Of course, one can add a theme to the ggplot call, in order to change the theme. However, a more catch-it-all approach would be to change the standard theme of ggplot itself. In this post, we’ll investigate this option.

To date, I’ve only used themes others have created, but if you need to customize a theme, there’s a lot you can do here.

Comments closed

Multi-Threaded R With Microsoft R Client

Published 2018-10-10 by Kevin Feasel

David Parr shows us how to get started with Microsoft R Client and performs some quick benchmarking:

This message will pop up, and it’s worth noting as it’s got some information in it that you might need to think about:

It’s worth noting that right now Microsoft r Client is lagging behind the current R version, and is based on version 3.4 of R, not 3.5. This will mean your default package libraries will not be shared between the installations if you are running R 3.5.
It’s using a snapshot of CRAN called MRAN to source packages by default. 90% of the time it will operate just as you expect, but because it takes a ‘snapshot’ of packages, newer features and changes that have hit CRAN may not be in the version of the package you are grabbing.
- RevoScaleR and probably the ggplot2 and dplyr packages will likely be installed for you already as default in Microsoft R Client. The other two you will probably have to install yourself.
Intel MKL will have scanned your system on install and attempted to work out how many cores your processor has. Here it’s identified 2 on my old Lenovo Yoga. This is where the speed boost will come from.

I had an old two-core Lenovo Yoga too, so this article really spoke to me.

Comments closed

Image Clustering With Keras And R

Published 2018-10-09 by Kevin Feasel

Shirin Glander shows us how to use R to extract learned features from Keras and cluster those features:

For each of these images, I am running the predict() function of Keras with the VGG16 model. Because I excluded the last layers of the model, this function will not actually return any class predictions as it would normally do; instead we will get the output of the last layer: block5_pool (MaxPooling2D).

These, we can use as learned features (or abstractions) of the images. Running this part of the code takes several minutes, so I save the output to a RData file (because I samples randomly, the classes you see below might not be the same as in the sample_fruits list above).

Read the whole thing.

Comments closed

Using plm To Analyze Panel Data

Published 2018-10-09 by Kevin Feasel

Michael Grogan shows us how to use the plm package to perform linear regression against panel data:

Types of data

Cross-Sectional: Data collected at one particular point in time

Time Series: Data collected across several time periods

Panel Data: A mixture of both cross-sectional and time series data, i.e. collected at a particular point in time and across several time periods

Fixed Effects: Effects that are independent of random disturbances, e.g. observations independent of time.

Random Effects: Effects that include random disturbances.

Let us see how we can use the plm library in R to account for fixed and random effects. There is a video tutorial link at the end of the post.

Read on for an example.

Comments closed

Analyzing Update Dates For R Packages

Published 2018-10-08 by Kevin Feasel

Tomaz Kastrun takes a look at CRAN package update dates:

So more updates are coming in autumn times. But the results of correlation:
cor(dd_ym2010)[2,3]
is still just 0.155, making it hard to draw any concrete conclusions. Adding year 2018 will skew the picture and add several outliers, as the fact that year 2018 is still a running year (as of writing this blog post).

Read on for a descriptive analysis of this data set.

Comments closed

Multi-Class Classification With vtreat

Published 2018-10-04 by Kevin Feasel

John Mount has an example of using the vtreat package for multi-class classification in R:

vtreat is a powerful R package for preparing messy real-world data for machine learning. We have further extended the package with a number of features including rquery/rqdatatable integration (allowing vtreat application at scale on Apache Spark or data.table!).

In addition vtreat and can now effectively prepare data for multi-class classification or multinomial modeling.

The two functions needed (mkCrossFrameMExperiment() and the S3 method prepare.multinomial_plan()) are now part of vtreat.

Click through for an example of this in action.

Comments closed

Packages And Functions In R

Published 2018-10-02 by Kevin Feasel

Ellen Talbot walks us through some of the basics of using packages and functions in R:

A function does some computation on an object. The use of a function consists of:

A function’s name

Parentheses

0 or more inputs

Each input is provided to an argument or parameter within a function.

These arguments have names, although you don’t often need to provide the names.

You can find out what arguments a function takes by using the code completion and it’s help snippet, or by searching for the function in the Rstudio Help tab.

When you’re inside the brackets of a function you can get the list of available arguments and auto-complete them.

Ellen also includes some useful R libraries for working with and visualizing data.

Comments closed

Visualizing Stock Data With Lares

Published 2018-09-28 by Kevin Feasel

Bernando Lares shows off some stuff his lares can do around visualizing time series data:

The overall idea of these functions is to visualize your stocks and portfolio’s performance with a just a few lines of simple code. I’ve created individual functions for each of the calculations and plots, and some other functions that gather all of them into a single list of objects for further use.

On the other hand, the lares package is “my personal library used to automate and speed my everyday work on Analysis and Machine Learning tasks”. I am more than happy to share it with you for your personal use. Feel free to install, use, and comment on any of its code and functionalities and I’ll happy to help you with it. I have previously shared other uses of the library in other posts which might also interest you: Visualizing ML Results (binary), Visualizing ML Results (continuous)and AutoML to understand datasets.

NOTE 1: The following post was written by a non-economist or professional investor. I am open to your comments and technical corrections if needed. Glad to learn as always!
NOTE 2: I will be using the less customizable functions in this post so we can focus more on the outputs than in the coding part; but once again, feel free to use the functions and dive into the library to understand or change them!
NOTE 3: All currency units are USD ($).

It does seem to be easy to use for this scenario.

Comments closed

What’s New With Machine Learning Services

Published 2018-09-28 by Kevin Feasel

Niels Berglund looks at SQL Server 2019’s Machine Learning Services offering for updates:

So, when I read What’s new in SQL Server 2019, I came across a lot of interesting “stuff”, but one thing that stood out was Java language programmability extensions. In essence, it allows us to execute Java code in SQL Server by using a pre-built Java language extension! The way it works is as with R and Python; the code executes outside of the SQL Server engine, and you use sp_execute_external_script as the entry-point.

I haven’t had time to execute any Java code as of yet, but in the coming days, I definitely will drill into this. Something I noticed is that the architecture for SQL Server Machine Learning Services has changed (or had additions to it).

That Java support is for Spark, I’d imagine. And I hope they allow for Scala.

Comments closed

Rayshader: 3D Surface Plotting In R

Published 2018-09-27 by Kevin Feasel

David Smith looks at an interesting package in R:

Tyler describes the rayshader package in a gorgeous blog post: his goal was to generate 3-D representations of landscape data that “looked like a paper weight”. (Incidentally, you can use this package to produce actual paper weights with 3-D printing.) To this end, he went beyond simply visualizing a 3-D surface in rgl and added a rectangular “base” to the surface as well as shadows cast by the geographic features. He also added support for detecting (or specifying) a water level: useful for representing lakes or oceans (like the map of the Monterey submarine canyon shown below) and for visualizing the effect of changing water levels like this animation of draining Lake Mead.

It looks great.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31