R – Page 53 – Curated SQL

When not to use it:
– You are always using GLM models (they are very flexible!). It makes no sense to me to add the extra {parsnip} layer if you are always using the same models. You could still consider using {recipes} for feature engineering.
– You are familiar with the kind of data and know which models will work on it. Basically, you are an expert in this field and have worked in it for many years, so there is no need to experiment.
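In the always-GLM case described above, base R's glm() is enough on its own. The following sketch illustrates that point and is not code from the original post. It uses the built-in mtcars data, an arbitrary choice. The centering and scaling are done by hand with scale(), which is the step {recipes} would otherwise handle. The logistic regression is then fitted directly.

```r
# Illustration only: fit a logistic GLM directly with base R, no {parsnip} layer
data(mtcars)

# Hand-rolled feature engineering (the job {recipes} would otherwise do):
# center and scale the two numeric predictors
predictors <- scale(mtcars[, c("wt", "hp")])
model_data <- data.frame(predictors, am = mtcars$am)

# Logistic regression: transmission type (0 = automatic, 1 = manual)
fit <- glm(am ~ wt + hp, data = model_data, family = binomial)
summary(fit)$coefficients

# Predicted probabilities of a manual transmission
probs <- predict(fit, type = "response")
head(round(probs, 3))
```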

Read on for concrete examples of when it does make sense. H/T R-Bloggers.


Parallelizing R Code

Published 2021-07-02 by Kevin Feasel

Mira Celine Klein walks us through some of the basics of parallel code execution in R:

In many cases, your code performs multiple independent tasks, for example, a simulation with five different parameter sets. The five processes don’t need to communicate with each other, and they don’t need any result from any other process. They could even be run simultaneously on five different computers… or processor cores. This is called parallelization. Modern desktop computers usually have several processor cores. To find out how many cores you have on your PC, use the function detectCores() from the parallel package. By default, R uses only one core, but this article tells you how to use multiple cores. If your simulation needs 20 hours to complete with one core, you may get your results within four hours thanks to parallelization!

Read on to see how you can accomplish this, but note that it is operating system-dependent.
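Here is a minimal sketch of the idea using the parallel package that ships with R. It is not necessarily the approach in Mira's article. run_simulation() and the five parameter values are hypothetical stand-ins for a real simulation.

- makeCluster() starts a socket (PSOCK) cluster, which works on Windows, macOS, and Linux.
- The fork-based mclapply() is shown commented out because it only runs in parallel on Unix-like systems. This is one reason the approach is operating system-dependent.

```r
library(parallel)

# Hypothetical stand-in for one simulation run with a given parameter value
run_simulation <- function(mu) {
  draws <- rnorm(1e5, mean = mu, sd = 1)
  mean(draws)
}

param_sets <- c(1, 2, 3, 4, 5)

# Leave one core free; detectCores() can return NA, hence na.rm = TRUE
n_cores <- max(1L, min(length(param_sets), detectCores() - 1L, na.rm = TRUE))

# A PSOCK cluster works on every operating system
cl <- makeCluster(n_cores)
clusterSetRNGStream(cl, iseed = 2021)  # reproducible random streams per worker
results <- parLapply(cl, param_sets, run_simulation)
stopCluster(cl)

# On macOS/Linux only, mclapply() forks the current R session instead:
# results <- mclapply(param_sets, run_simulation, mc.cores = n_cores)

unlist(results)
```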


Random Forest Feature Importance

Published 2021-07-02 by Kevin Feasel

Selcuk Disci takes us through an important concept with random forest models:

Random forest algorithms average these results; that is, they reduce the variance by training on different parts of the training set. This improves the performance of the final model, although it comes at the cost of a small increase in bias.
A random forest uses the bootstrap aggregating (bagging) algorithm. Take a training sample X = x₁, …, x_n with outputs Y = y₁, …, y_n. The bagging process is repeated B times: each time, it draws a random sample with replacement from the training set and fits a tree to that sample. The tree fitted in round b is denoted f_b.
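The following small sketch makes the bagging steps concrete using the rpart package, a recommended package distributed with R. It draws B bootstrap samples and fits one regression tree f_b to each. It then averages the trees' predictions, so the bagged estimate is f̂(x) = (1/B) Σ f_b(x) for b = 1, …, B.

This is plain bagging, not a full random forest. A random forest also considers only a random subset of predictors at each split, which this sketch does not do. The mtcars data and B = 50 are arbitrary choices for illustration.

```r
library(rpart)

set.seed(2021)
data(mtcars)
B <- 50              # number of bootstrap rounds
n <- nrow(mtcars)

# Fit one regression tree per bootstrap sample (the f_b in the text)
trees <- lapply(seq_len(B), function(b) {
  idx <- sample.int(n, n, replace = TRUE)  # draw n rows with replacement
  rpart(mpg ~ ., data = mtcars[idx, ])
})

# Bagged prediction: average the B trees' predictions for each row
bagged_predict <- function(newdata) {
  preds <- sapply(trees, predict, newdata = newdata)  # n x B matrix
  rowMeans(preds)
}

pred <- bagged_predict(mtcars)
sqrt(mean((mtcars$mpg - pred)^2))  # in-sample RMSE of the bagged ensemble
```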

As far as the article goes, inflation is always and everywhere a monetary phenomenon. H/T R-Bloggers.


The Future of R with SQL Server

Published 2021-07-01 by Kevin Feasel

James Rowland-Jones has an update for us:

The importance of R was first recognized by the SQL Server team back in 2016 with the launch of SQL ML Services and R Server. Over the years we have added Python to SQL ML Services in 2017 and Java support through our language extensions in 2019. Earlier this year we also announced the general availability of SQL ML Services into Azure SQL Managed Instance. SparkR, sparklyr, and PySpark are also available as part of SQL Server Big Data Clusters. We remain committed to R.
With that said, much has changed in the world of data science and analytics since 2016. Microsoft’s approach to open-source software has undergone a similar transformation in the same period. It is therefore time for us to share how we, in Azure SQL and SQL Server, are changing to meet the needs of our users and the R community moving forward.

I never used ML Server (but have used SQL Server ML Services a lot), so that part of the announcement doesn’t affect me, and I’m not sure how many organizations it does affect. Switching to CRAN R is a good idea, and I appreciate that they’re open-sourcing the RevoScaleR and revoscalepy code bases. The one thing I’d really like to see in vNext’s Machine Learning Services is an easy way to update the version of R.


Using ggplot2 to Create a Faceted Histogram plus Curve

Published 2021-06-28 by Kevin Feasel

Sebastian Sauer builds a combo chart:

Overlaying a histogram (possibly faceted) with a normal curve is not far-fetched when analyzing data. Surprisingly, it appears (to the best of my knowledge) that there’s no comfortable out-of-the-box solution in ggplot2, although it can of course be achieved with a few lines of code. Here’s my take.

Click through for Sebastian’s version, as well as information on the ggh4x library.
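Here is one way to get this effect in plain ggplot2, without ggh4x. It is a sketch of my own, not Sebastian's exact code. For each facet, it computes a normal curve from that group's mean and standard deviation. It scales the curve by group size times bin width so that it lines up with the histogram's count axis, then draws it with geom_line(). The mtcars variables are an arbitrary example.

```r
library(ggplot2)

# Example data: miles per gallon, faceted by number of cylinders
df <- mtcars
df$cyl <- factor(df$cyl)
binwidth <- 2

# Per facet: a normal density with that group's mean and SD, scaled by
# n * binwidth so it matches the histogram's count scale
curves <- do.call(rbind, lapply(split(df, df$cyl), function(g) {
  x <- seq(min(df$mpg), max(df$mpg), length.out = 100)
  data.frame(
    cyl   = g$cyl[1],
    mpg   = x,
    count = dnorm(x, mean = mean(g$mpg), sd = sd(g$mpg)) * nrow(g) * binwidth
  )
}))

p <- ggplot(df, aes(x = mpg)) +
  geom_histogram(binwidth = binwidth, fill = "grey70", colour = "white") +
  geom_line(data = curves, aes(x = mpg, y = count), colour = "steelblue") +
  facet_wrap(~ cyl)

print(p)
```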


8 Ways to Solve a Problem in R

Published 2021-06-24 by Kevin Feasel

Holger von Jouanne-Diedrich shows how many ways there are to solve a problem of squares:

This time we want to solve the following simple task with R: Take the numbers 1 to 100, square them, and add all the even numbers while subtracting the odd ones!
If you want to see how to do that in at least seven different ways in R, read on!
There are many different solutions possible, making use of several aspects of the R language. So this blog post can be seen as a fun exercise to recap some of the concepts explained in our introduction to R: Learning R: The Ultimate Introduction (incl. Machine Learning!).

Give it a try and then check out the variety of solutions.
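Spoiler warning: here are four possible approaches. I wrote them for this summary, and they are not necessarily among the ones in the post. The square of an even number is even and the square of an odd number is odd. So the task amounts to summing (-1)^n · n² for n = 1 to 100.

```r
squares <- (1:100)^2

# 1) Logical indexing: sum of even squares minus sum of odd squares
sol1 <- sum(squares[squares %% 2 == 0]) - sum(squares[squares %% 2 == 1])

# 2) Alternating sign vector: -1 for odd n, +1 for even n
sol2 <- sum((-1)^(1:100) * squares)

# 3) Explicit for loop
total <- 0
for (k in 1:100) {
  sq <- k^2
  if (sq %% 2 == 0) total <- total + sq else total <- total - sq
}
sol3 <- total

# 4) Vectorized ifelse()
sol4 <- sum(ifelse(squares %% 2 == 0, squares, -squares))

c(sol1, sol2, sol3, sol4)  # all four equal 5050
```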


From SQL Server to Excel via R

Published 2021-06-23 by Kevin Feasel

Kevin Wilkie wraps up a series on data movement between Excel and SQL Server:

In today’s post, we’ll go over how to export the data you have in SQL Server to Excel via one of my favorite computer languages – R. (Since we did have a post on how to Import data, it would seem rather rude not to have one on how to Export data.)
As always, you’ll need to open your R tool of choice. I tend to use RStudio but there are several out there that will accomplish this same goal.

Click through to see how.


Reinvestment Risk and Yield to Maturity

Published 2021-06-18 by Kevin Feasel

Sang-Heon Lee looks at reinvestment risk:

From this post, we can learn about the reinvestment risk of a coupon bond. It is worth noting two points:

1) The YTM is attained when the roll (reinvestment) rate is the same as the YTM.
2) The argument that the coupon rate equals the YTM at issuance (par yield) applies only to a standard coupon bond with an in-arrears interest payment schedule. Unlike a standard coupon bond, a coupon bond with in-advance interest payments has a higher YTM than its coupon rate at issuance.

Click through for the explanation as well as the R code used. H/T R-Bloggers.
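The two points in the quote can be checked numerically. The following sketch is a hypothetical example, not Sang-Heon's code. It models a 5-year bond with a 5% annual coupon, priced at par, under these assumptions:

- Payments are annual.
- The in-advance bond pays each coupon at the start of its period (t = 0, …, 4) instead of at the end.
- The yield to maturity is found with uniroot().
- The realized return assumes that coupons are reinvested at a constant roll rate until maturity.

```r
# Hypothetical bond: 5 years, 5% annual coupon, face value 100, priced at par
face   <- 100
coupon <- 0.05 * face
T_yrs  <- 5
price  <- 100

# Present value of cash flows `cfs` paid at times `times`, at yield y
pv <- function(y, cfs, times) sum(cfs / (1 + y)^times)

# Yield to maturity: the y that makes the present value equal the price
ytm <- function(price, cfs, times) {
  uniroot(function(y) pv(y, cfs, times) - price, c(-0.5, 1), tol = 1e-12)$root
}

# Standard (in-arrears) bond: coupons at t = 1..T, principal at T
arrears_times <- 1:T_yrs
arrears_cfs   <- c(rep(coupon, T_yrs - 1), coupon + face)
ytm_arrears   <- ytm(price, arrears_cfs, arrears_times)

# In-advance bond (assumption): coupons at t = 0..T-1, principal at T
advance_times <- 0:T_yrs
advance_cfs   <- c(rep(coupon, T_yrs), face)
ytm_advance   <- ytm(price, advance_cfs, advance_times)

# Realized annual return of the in-arrears bond when coupons roll at `roll`
realized <- function(roll) {
  fv <- sum(arrears_cfs * (1 + roll)^(T_yrs - arrears_times))
  (fv / price)^(1 / T_yrs) - 1
}

# About 0.05 for the standard bond; higher for the in-advance bond
round(c(ytm_arrears = ytm_arrears, ytm_advance = ytm_advance), 5)

# Below 5% when rolling at 3%, 5% when rolling at 5%, above 5% when rolling at 7%
round(sapply(c(0.03, 0.05, 0.07), realized), 5)
```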


Building QQ plots in R

Published 2021-06-16 by Kevin Feasel

The folks at finnstats explain the notion of a Quantile-Quantile plot and show how to create one in R:

To build QQ plots in R, we first need to understand the Q-Q plot. The Q-Q plot is a graphical tool that helps us examine whether a set of data plausibly came from some theoretical distribution, such as the Normal distribution.
Suppose we are running a statistical analysis with a parametric test that assumes the variable is normally distributed. We can use a Q-Q plot to check that assumption.
It’s just a visual check, not a formal proof, so we may also want to use a statistical test. But a Q-Q plot lets us see at a glance whether our assumption is plausible.

Click through to learn more. H/T R-Bloggers.
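Here is a minimal base-R sketch of the check described above. The simulated samples are illustrative assumptions, not finnstats' example.

- qqnorm() plots sample quantiles against theoretical normal quantiles.
- qqline() adds a reference line through the first and third quartiles.
- shapiro.test() supplies the formal test that the quote recommends as a complement to the visual check.

```r
set.seed(123)
normal_data <- rnorm(200, mean = 50, sd = 10)  # should hug the line
skewed_data <- rexp(200, rate = 1)             # right-skewed, should bend away

# Side-by-side Q-Q plots against the normal distribution
op <- par(mfrow = c(1, 2))
qqnorm(normal_data, main = "Normal sample")
qqline(normal_data, col = "red")
qqnorm(skewed_data, main = "Exponential sample")
qqline(skewed_data, col = "red")
par(op)

# A formal normality test to complement the visual check
shapiro.test(normal_data)$p.value
shapiro.test(skewed_data)$p.value
```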


Building a Payoff Diagram in R

Published 2021-06-15 by Kevin Feasel

Holger von Jouanne-Diedrich builds out payoff diagrams:

Not many people understand the financial alchemy of modern financial investment vehicles, like hedge funds, that often use sophisticated trading strategies. But everybody understands the meaning of rising and falling markets. Why not simply translate one into the other?
If you want to get your hands on a simple R script that creates an easy-to-understand plot (a profit & loss profile or payoff diagram) out of any price series, read on!

Click through for several examples of code and financial instruments.
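For readers new to payoff diagrams, here is the textbook case drawn in base R: a long call option held to expiry. This is not Holger's script, which builds the profile from a price series. The strike of 100 and premium of 5 are hypothetical. Profit or loss at expiry is max(S - K, 0) minus the premium paid. Losses are therefore capped at the premium, and the position breaks even at K plus the premium.

```r
# Hypothetical long call: strike 100, premium 5
strike  <- 100
premium <- 5
spot    <- seq(60, 140, by = 1)  # possible underlying prices at expiry

# Profit & loss at expiry: intrinsic value minus the premium paid
pnl <- pmax(spot - strike, 0) - premium

plot(spot, pnl, type = "l", lwd = 2, col = "darkgreen",
     xlab = "Underlying price at expiry", ylab = "Profit / loss",
     main = "Payoff diagram: long call (K = 100, premium = 5)")
abline(h = 0, lty = 2)       # break-even line
abline(v = strike, lty = 3)  # strike price

# First price at which the position no longer loses money (strike + premium)
spot[which(pnl >= 0)[1]]
```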


Category: R

Reasons to Use Tidymodels

Parallelizing R Code

Random Forest Feature Importance

The Future of R with SQL Server

Using ggplot2 to Create a Faceted Histogram plus Curve

8 Ways to Solve a Problem in R

From SQL Server to Excel via R

Reinvestment Risk and Yield to Maturity

Building QQ plots in R

Building a Payoff Diagram in R