R – Page 60 – Curated SQL

Counting Open Lockers in R

Published 2021-02-05 by Kevin Feasel

Holger von Jouanne-Diedrich solves a riddle:

We are standing in front of 100 lockers arranged side by side, all of which are closed. One man has a bunch of keys with all 100 keys and will pass the lockers exactly a hundred times, opening or closing some of them.
On the first pass, he opens all the lockers. On the second pass, the man will go to every other locker and change its state. That means: If it is closed, it will be opened. If it is already open, it will be closed. In this case, he closes lockers 2, 4, 6… 98 and 100, because all doors were open before.
On the third pass, he changes the state of every third locker – that is, 3, 6, 9, … 96, 99. Closed doors are opened, open doors closed. In the fourth pass, every fourth locker is changed, at the fifth every fifth – and so on. At the last, the 100th, the man finally only changes the state of door number 100.
The question is: How many of the 100 compartments are open after the 100th pass?

Click through for one solution in R.
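
To get a feel for the mechanics before clicking through, here is a minimal brute-force sketch. It is written in Python rather than R and is not Holger's solution; the function name `open_lockers` is made up for illustration. It simply walks the lockers once per pass and flips every matching door.

```python
def open_lockers(n=100):
    """Simulate n passes over n lockers and return the numbers left open."""
    # False = closed, True = open. Index 0 is unused so locker k sits at index k.
    is_open = [False] * (n + 1)
    for step in range(1, n + 1):
        # On pass `step`, toggle every `step`-th locker.
        for locker in range(step, n + 1, step):
            is_open[locker] = not is_open[locker]
    return [k for k in range(1, n + 1) if is_open[k]]


result = open_lockers(100)
print(len(result), result)
```

Locker k gets toggled once for each divisor of k, so only numbers with an odd count of divisors (the perfect squares) end up open. That leaves 10 open lockers: 1, 4, 9, and so on up to 100.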

Sunflower Plots in R

Published 2021-02-05 by Kevin Feasel

Kenneth Tay takes a look at a sunflower plot:

A sunflower plot is a type of scatterplot which tries to reduce overplotting. When there are multiple points that have the same (x, y) values, sunflower plots plot just one point there, but have little edges (or “petals”) coming out from the point to indicate how many points are really there.

My first thought on it is that it’s too busy and doesn’t do its job of portraying a mass of data points very well. When you have just a few observations, then yeah, it’s not too bad. But once you have any reasonable amount of density on the plot, it’s better to use jitter and transparency (as Kenneth points out). H/T R-bloggers
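
The bookkeeping behind the plot is just counting duplicates. As a rough sketch (Python, with made-up sample data and a hypothetical helper name `petal_counts`, not Kenneth's code), you collapse identical (x, y) pairs and keep the multiplicity that the petals would encode:

```python
from collections import Counter


def petal_counts(points):
    """Collapse duplicate (x, y) pairs into one entry each, with a count."""
    return Counter(points)


# Illustrative data only: three copies of (1, 1), two of (2, 3), one of (4, 0).
points = [(1, 1), (1, 1), (1, 1), (2, 3), (2, 3), (4, 0)]
counts = petal_counts(points)

for (x, y), n in sorted(counts.items()):
    print(x, y, n)
```

Six observations become three plotted locations, which is exactly why the chart gets busy once the counts climb.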

Reporting on Correlation Analysis in R

Published 2021-02-02 by Kevin Feasel

Petr Baranovskiy continues a series on correlation analysis using R:

This is the second part of the Correlation Analysis in R series. In this post, I will provide an overview of some of the packages and functions used to perform correlation analysis in R, and will then address reporting and visualizing correlations as text, tables, and correlation matrices in online and print publications.

Read the whole thing.

Model Post-Processing with insight

Published 2021-02-01 by Kevin Feasel

The easystats team talks about the insight package in R:

We are talking about the insight package. It is what allows other packages, like easystats (parameters, effectsize, performance, report, …) or ggstatsplot, sjstats or modelsummary to be as powerful as they are, supporting tons of different R models. So why make your life hard when you can be like them, and rely on insight?
It is made for developers (and users) that do some postprocessing of different models (e.g., extracting stuff like parameters, values, data, names, specifications, predictions, priors, etc.), whether it is to nicely display their results or to do further computation.

Click through for an example of what it does and how it works. H/T R-bloggers

Determining a Good Test Set Size

Published 2021-02-01 by Kevin Feasel

John Mount thinks about test set size:

In this note we will answer “what is a good test set size?” three ways.
– The usual practical answer.
– A decision theory answer.
– A novel variational answer.
Each of these answers is a bit different, as they are solved in slightly different assumed contexts and optimizing different objectives. Knowing all 3 solutions gives us some perspective on the problem.

My rule of thumb is that I want the test set to be as small as possible while still being likely to hit all real-world scenarios enough times to provide a valid comparison. That, in turn, maximizes the size of the training data set, giving us the best chance of seeing the widest variety of scenarios we can during the formative phase.

And as usual, John goes way deeper than my rules of thumb. I like this post a lot.

Using OAuth 2 in R Packages

Published 2021-01-25 by Kevin Feasel

Maelle Salmon explains how OAuth 2 works and also how you can use it in R packages:

When writing an R package wrapping an API using OAuth 2.0 you’ll need the user to grant access to an “app”, which will allow the creation of an access token and a refresh token. The access token will then often be passed to the API in a header when making requests, whilst the refresh token would be posted in a query string when the access token needs to be renewed.
Your problem is: how do I imitate a third-party app? Thankfully for you, in most cases the complexity can be handled by the httr package. For other cases, or if you want to e.g. only use curl, you will have to get creative.

Read on for more detail.
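
For the two token hand-offs described in the quote, here is a small sketch of what goes over the wire. It is Python rather than R, it only builds the header and the form-encoded string without sending anything, and the helper names, token values and client ID are all placeholders I made up. The `grant_type=refresh_token` and `refresh_token` parameters and the `Bearer` header scheme come from the OAuth 2.0 specifications; sending `client_id` in the body is an assumption that fits public clients, and your API may want client authentication done differently.

```python
from urllib.parse import urlencode


def bearer_header(access_token):
    """Header that carries the access token on an ordinary API request."""
    return {"Authorization": "Bearer " + access_token}


def refresh_body(refresh_token, client_id):
    """Form-encoded string to POST to the token endpoint for a new access token."""
    params = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,  # assumption: public client identified in the body
    }
    return urlencode(params)


# Placeholder values for illustration only.
headers = bearer_header("example-access-token")
body = refresh_body("example-refresh-token", "example-client-id")
print(headers)
print(body)
```

In R, httr wraps this exchange for you, which is the point of Maelle's post.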

AzureCosmosR

Published 2021-01-22 by Kevin Feasel

Hong Ooi takes us through an R library for working with Cosmos DB:

Among other features, Azure Cosmos DB is notable in that it supports multiple data models and APIs. When you create a new Cosmos DB account, you specify which API you want to use: SQL/core API, which lets you use a dialect of T-SQL to query and manage tables and documents; MongoDB; Azure table storage; Cassandra; or Gremlin (graph). AzureCosmosR provides a comprehensive interface to the SQL API, as well as bridges to the MongoDB and table storage APIs. On the Resource Manager side, AzureCosmosR extends the AzureRMR class framework to allow creating and managing Cosmos DB accounts.
AzureCosmosR is now available on CRAN. You can also install the development version from GitHub, with devtools::install_github("Azure/AzureCosmosR").

Hong provides examples for us using three of the Cosmos DB APIs, so check it out.

Gradient Descent in R

Published 2021-01-20 by Kevin Feasel

Holger von Jouanne-Diedrich lays out the basics of gradient descent:

Gradient Descent is a mathematical algorithm to optimize functions, i.e. finding their minima or maxima. In Machine Learning it is used to minimize the cost function of many learning algorithms, e.g. artificial neural networks a.k.a. deep learning. The cost function simply is the function that measures how good a set of predictions is compared to the actual values (e.g. in regression problems).
The gradient (technically the negative gradient) is the direction of steepest descent. Just imagine a skier standing on top of a hill: the direction which points into the direction of steepest descent is the gradient!

Click through for an example in R.
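
The core loop is short enough to show here. This is a minimal one-dimensional sketch in Python, not Holger's R example; the function name, the learning rate of 0.1, the 100 steps and the test function f(x) = (x - 3)^2 are all choices I made for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        # Move in the direction of the negative gradient (steepest descent).
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))
```

With these settings each step shrinks the distance to 3 by a factor of 0.8, so the result lands on 3.0 after rounding. Pick a learning rate that is too large and the same loop overshoots and diverges instead.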

Updates to AzureR

Published 2021-01-14 by Kevin Feasel

Hong Ooi has some updates for us:

This is an update on what’s been happening with the AzureR suite of packages. First, you may have noticed that just before the holiday season, the packages were updated on CRAN to change their maintainer email to a non-Microsoft address. This is because I’ve left Microsoft for a role at Westpac bank here in Australia; while I’m sad to be leaving, I do intend to continue maintaining and updating the packages.
To that end, here are the changes that have recently been submitted to CRAN, or will be shortly:

Read on for the changes. This includes a new package to work with Cosmos DB from R.

Countdown Number Puzzle

Published 2021-01-13 by Kevin Feasel

Tomaz Kastrun has a fun puzzle for us:

So the game is (was) known as a TV show where the host would give a random 3-digit number and the contestants would draw 6 random numbers from a stack of numbers. Given the time limit, the winner was the one who would create a formula matching the result or coming closest.
Many approaches, tips, tricks and optimisations have already been considered; maybe the most famous is Reverse Polish notation, where operators follow their operands, which is a great fit for the game.
With useless functionality, I have decided to use the permuteGeneral function from RcppAlgos; the same functionality could be achieved with the combn function.

Click through to see it in action.
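
If Reverse Polish notation is new to you, a tiny evaluator shows why it suits the game: there are no parentheses to manage, just a stack. This is a sketch in Python, not Tomaz's R code, and `eval_rpn` is a name I made up. Numbers are passed as ints and operators as strings.

```python
def eval_rpn(tokens):
    """Evaluate a Reverse Polish expression given as a list of ints and operators."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            # The operator follows its operands: pop the right one first.
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# (100 + 6) * 3 written in RPN
print(eval_rpn([100, 6, "+", 3, "*"]))  # prints 318
```

A brute-force solver would generate candidate token lists from the six drawn numbers and keep the one whose value is closest to the target.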
