R – Page 65 – Curated SQL

When we are dealing with large datasets and there is a need to calculate some values like the row/column min/max/rank/mean etc we should avoid the apply function because it takes a lot of time. Instead, we can use the matrixStats package and its corresponding functions. Let’s provide some comparisons.

Click through for a demonstration of how much slower it can be in certain cases. H/T R-Bloggers.

Comments closed

Filtering with dplyr Using Strings as Expressions

Published 2020-09-04 by Kevin Feasel

Kenneth Tay shows how to build arbitrary expressions to execute in dplyr’s filter function:

This took me a while to figure out and so I thought I would post this as future reference. Let’s say I have the mtcars data and I want to filter for just the rows with cyl == 6. I would do something like this:
library(tidyverse)
data(mtcars)
mtcars %>% filter(cyl == 6)
What if I had the filter condition as a string instead?

Read on to see how you can do this. Given the answer, I wonder if there’s any chance that could be turned into an injection vulnerability. H/T R-Bloggers

Comments closed

The State of R Packages Locally

Published 2020-09-04 by Kevin Feasel

Maelle Salmon and Gabor Csardi walk us through some details about installed packages on a system:

Now how do you know where any of your installed packages was installed? You can use find.package() and path.package()!
To check whether a package is installed, it is better to use find.package() than installed.packages() because the latter, as its docs state, can be slow on some systems. In both cases, it does not mean the package is usable, for that you’d need to use library() or require().

Read on for several tips around where packages are located, what their contents look like, and learning a bit more about the actual code in packages.

Comments closed

Error Handling in R with purrr

Published 2020-09-02 by Kevin Feasel

Ariel Muldoon shows off a couple of helpful functions in R’s purrr library:

The possibly() function is a wrapper function. It wraps around an existing function. Other than defining the function to wrap, the main argument of interest is otherwise. In otherwise we define what value to return if we get an error from the function we are wrapping.

Read the whole thing. H/T R-Bloggers.

Comments closed

Using oysteR to Track Security Vulnerabilities in R Packages

Published 2020-09-01 by Kevin Feasel

Colin Gillespie walks us through using the oysteR package:

The {oysteR} package is an R interface to the OSS Index that allows users to scan their installed R packages. A few months ago, I stumbled across a fledgeling version of this package and decided to make a few contributions to help move the package from GitHub to CRAN. A few PRs later, I’m now a co-author and the package is on CRAN.

Click through for a demo.

Comments closed

MLOPS in R with GitHub Actions

Published 2020-08-26 by Kevin Feasel

David Smith explains MLOPS and GitHub actions in a talk:

In the talk, I demonstrate the process in action (the demo starts at the 14:30 mark in the video below). I used Visual Studio Code to edit the app.R file in repository, and then pushed the changes to GitHub. That immediately triggered the action to deploy the updated file via SSH to the Shiny Server, running in a remote VM. Similarly, changes to the data file or to the R script files implementing the logistic regression model would trigger the model to be retrained in the cluster, and re-deploy the endpoint to deliver new predictions from the updated model.

Click through for a quick summary, link to the repo, and embedded video of the talk.

Comments closed

Creating Hand-Drawn Networks in R

Published 2020-08-25 by Kevin Feasel

David Schoch announces a new R package:

This post introduces the R package rougnet, which wraps the java script library rough.js to draw sketchy, hand-drawn-like networks. From all my network packages, this is probably the most useless but most fun one I developed.

Click through to check it out. H/T R-bloggers.

Comments closed

Explaining the ROC Plot

Published 2020-08-25 by Kevin Feasel

Nina Zumel takes us through what each element of a ROC curve means:

In our data science teaching, we present the ROC plot (and the area under the curve of the plot, or AUC) as a useful tool for evaluating score-based classifier models, as well as for comparing multiple such models. The ROC is informative and useful, but it’s also perhaps overly concise for a beginner. This leads to a lot of questions from the students: what does the ROC tell us about a model? Why is a bigger AUC better? What does it all mean?

Read on for the answer.

Comments closed

Fun with Benford’s Law

Published 2020-08-19 by Kevin Feasel

Nagdev Amruthnath covers a topic which brings me joy:

Benford’s Law is one of the most underrated and widely used techniques that are commonly used in various applications. United States IRS neither confirms nor denies their use of Benford’s law to detect any number of manipulations in income tax filing. Across the Atlantic, the EU is very open and proudly claims its use of Benford’s law. Today, this is widely used in accounting to detect any fraud. Nigrini, a professor at the University of Cape Town, also used this law to identify financial discrepancies in Enron’s financial statement. In another case, Jennifer Golbeck, a professor at the University of Maryland, was able to identify bot accounts on twitter using Benford’s law. Xiaoyu Wang from the University of Winnipeg even published a report on how to use Benford’s law on images. In the rest of this article, we will take about Benford’s law and how it can be applied using R.

The applications to images and music were new to me. Very cool. H/T R-Bloggers

Comments closed

Covariance and Multicollinearity

Published 2020-08-14 by Kevin Feasel

Mattan Ben-Shachar gives us an intuitive understanding of multicollinearity and how it can affect an analysis:

The common and almost default approach is to fix age to a constant. This is really what our model does in the first place: the coefficient of height represents the expected change in weight while age is fixed and not allowed to vary. What constant? A natural candidate (and indeed emmeans’ default) is the mean. In our case, the mean age is 14.9 years. So the expected values produced above are for three 14.9 year olds with different heights. But is this data plausible? If I told you I saw a person who was 120cm tall, would you also assume they were 14.9 years old?
No, you would not. And that is exactly what covariance and multicollinearity mean – that some combinations of predictors are more likely than others.

I liked the explanation Mattan provides us. Also be sure to read the warnings near the end of the post around other things to try. H/T R-bloggers

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Category: R

R: Avoid apply() with Large Datasets

Filtering with dplyr Using Strings as Expressions

The State of R Packages Locally

Error Handling in R with purrr

Using oysteR to Track Security Vulnerabilities in R Packages

MLOPS in R with GitHub Actions

Creating Hand-Drawn Networks in R

Explaining the ROC Plot

Fun with Benford’s Law

Covariance and Multicollinearity