Press "Enter" to skip to content

Category: R

Image Classification with Keras and TensorFlow 2 in R

Shirin Glander takes us through the task of image classification using TensorFlow version 2.2.0:

Recently, I have been getting a few comments on my old article on image classification with Keras, saying that they are getting errors with the code. And I have also gotten a few questions about how to use a Keras model to predict on new images (of different size). Instead of replying to them all individually, I decided to write this updated version using recent Keras and TensorFlow versions (all package versions and system information can be found at the bottom of this article, as usual).

Click through for the R code.

Comments closed

R: Avoid apply() with Large Datasets

George Pipis walks us through the performance cost of using the apply() function against a large dataset in R:

When we are dealing with large datasets and there is a need to calculate some values like the row/column min/max/rank/mean etc we should avoid the apply function because it takes a lot of time. Instead, we can use the matrixStats package and its corresponding functions. Let’s provide some comparisons.

Click through for a demonstration of how much slower it can be in certain cases. H/T R-Bloggers.

Comments closed

Filtering with dplyr Using Strings as Expressions

Kenneth Tay shows how to build arbitrary expressions to execute in dplyr’s filter function:

This took me a while to figure out and so I thought I would post this as future reference. Let’s say I have the mtcars data and I want to filter for just the rows with cyl == 6. I would do something like this:

library(tidyverse)
data(mtcars)
mtcars %>% filter(cyl == 6)

What if I had the filter condition as a string instead?

Read on to see how you can do this. Given the answer, I wonder if there’s any chance that could be turned into an injection vulnerability. H/T R-Bloggers

Comments closed

The State of R Packages Locally

Maelle Salmon and Gabor Csardi walk us through some details about installed packages on a system:

Now how do you know where any of your installed packages was installed? You can use find.package() and path.package()!

To check whether a package is installed, it is better to use find.package() than installed.packages() because the latter, as its docs state, can be slow on some systems. In both cases, it does not mean the package is usable, for that you’d need to use library() or require().

Read on for several tips around where packages are located, what their contents look like, and learning a bit more about the actual code in packages.

Comments closed

Using oysteR to Track Security Vulnerabilities in R Packages

Colin Gillespie walks us through using the oysteR package:

The {oysteR} package is an R interface to the OSS Index that allows users to scan their installed R packages. A few months ago, I stumbled across a fledgeling version of this package and decided to make a few contributions to help move the package from GitHub to CRAN. A few PRs later, I’m now a co-author and the package is on CRAN.

Click through for a demo.

Comments closed

MLOPS in R with GitHub Actions

David Smith explains MLOPS and GitHub actions in a talk:

In the talk, I demonstrate the process in action (the demo starts at the 14:30 mark in the video below). I used Visual Studio Code to edit the app.R file in repository, and then pushed the changes to GitHub. That immediately triggered the action to deploy the updated file via SSH to the Shiny Server, running in a remote VM. Similarly, changes to the data file or to the R script files implementing the logistic regression model would trigger the model to be retrained in the cluster, and re-deploy the endpoint to deliver new predictions from the updated model.

Click through for a quick summary, link to the repo, and embedded video of the talk.

Comments closed

Explaining the ROC Plot

Nina Zumel takes us through what each element of a ROC curve means:

In our data science teaching, we present the ROC plot (and the area under the curve of the plot, or AUC) as a useful tool for evaluating score-based classifier models, as well as for comparing multiple such models. The ROC is informative and useful, but it’s also perhaps overly concise for a beginner. This leads to a lot of questions from the students: what does the ROC tell us about a model? Why is a bigger AUC better? What does it all mean?

Read on for the answer.

Comments closed

Fun with Benford’s Law

Nagdev Amruthnath covers a topic which brings me joy:

Benford’s Law is one of the most underrated and widely used techniques that are commonly used in various applications. United States IRS neither confirms nor denies their use of Benford’s law to detect any number of manipulations in income tax filing. Across the Atlantic, the EU is very open and proudly claims its use of Benford’s law. Today, this is widely used in accounting to detect any fraud. Nigrini, a professor at the University of Cape Town, also used this law to identify financial discrepancies in Enron’s financial statement. In another case, Jennifer Golbeck, a professor at the University of Maryland, was able to identify bot accounts on twitter using Benford’s law. Xiaoyu Wang from the University of Winnipeg even published a report on how to use Benford’s law on images. In the rest of this article, we will take about Benford’s law and how it can be applied using R.

The applications to images and music were new to me. Very cool. H/T R-Bloggers

Comments closed