R – Page 70 – Curated SQL

Explaining Black Box Models with LIME

Published 2020-01-27 by Kevin Feasel

Holger von Jouanne-Diedrich takes us through the intuition of LIME:

There is a new hot area of research to make black-box models interpretable, called Explainable Artificial Intelligence (XAI), if you want to gain some intuition on one such approach (called LIME), read on!
Before we dive right into it it is important to point out when and why you would need interpretability of an AI. While it might be a desirable goal in itself it is not necessary in many fields, at least not for users of an AI, e.g. with text translation, character and speech recognition it is not that important why they do what they do but simply that they work.
In other areas, like medical applications (determining whether tissue is malignant), financial applications (granting a loan to a customer) or applications in the criminal-justice system (gauging the risk of recidivism) it is of the utmost importance (and sometimes even required by law) to know why the machine arrived at its conclusions.
One approach to make AI models explainable is called LIME for Local Interpretable Model-Agnostic Explanations. There is already a lot in this name!

LIME is not trivial to use and it can be very slow, but it is a great way to visualize models.

Comments closed

Generating Synthetic Data with R

Published 2020-01-24 by Kevin Feasel

Sidharth Macherla uses the conjurer package in R to generate synthetic data:

If you are building data science applications and need some data to demonstrate the prototype to a potential client, you will most likely need synthetic data. In this article, we discuss the steps to generating synthetic data using the R package ‘conjurer’.

One of the toughest problems of generating data is making it look realistic enough. It’s one level of difficulty to build “steady-state” data, but if you want data to follow a combination of trend and random walk…that’s when things get dicey. H/T R-Bloggers

Comments closed

Concepts in Support Vector Machines

Published 2020-01-23 by Kevin Feasel

Abhijit Telang takes us through the calculations involved in Support Vector Machines and then gives us an example in R:

So, let’s take that out and we are back to old, classical vector algebra. It’s like a person with a bunch of sticks to figure out which one to lay where in a 2-D plane to separate one class of objects from another, provided class definitions are already known.
The problem is which particular shape and length must be chosen to show maximum contrast between classes.
We need to arrive at a function definition, in such a way that the value a given function takes changes drastically (e.g. from a large positive value to a large negative value).

SVM is often great for two-class classification problems, and different variants also work well for multi-class problems.

Comments closed

Customizing Your Rprofile

Published 2020-01-22 by Kevin Feasel

Colin Gillespie shows how you can customize R via the .Rprofile file:

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!
Full details of how the .Rprofile works can be found in my book with Robin on Efficient R programming. However, roughly R will look for a file called .Rprofile first in your current working directory, then in your home area. Crucially, it will only load the first file found. This means you can have per project Rprofile.

Click through for a sample R profile which has a lot going on.

Comments closed

Simulating Feller’s Coin-Tossing Puzzle in R

Published 2020-01-22 by Kevin Feasel

David Robinson has another fun puzzle:

Mathematician William Feller posed the following problem:
If you flip a coin times, what is the probability there are no streaks of heads in a row?
Note that while the number of heads in a sequence is governed by the binomial distribution, the presence of consecutive heads is a bit more complicated, because the presence of a streak at various points in the sequence isn’t independent

Click through for a solution in R.

Comments closed

Cluster-Based Image Analysis and Reduction

Published 2020-01-16 by Kevin Feasel

Sebastian Sauer takes an image and reduces it to a group of colors:

This post is a remake of this casestudy: https://fallstudien.netlify.com/fallstudie_bildanalyse/bildanalyse
brought to you by Karsten Lübke.
The main purpose is to replace the base R command that Karsten used with a more tidyverse-friendly style. I think that’s easier (for me).
We will compute a cluster analysis to find the typical RGB color per cluster.

Click through for quite a bit of R code and a couple interesting turns.

Comments closed

Solving the Spelling Bee Honeycomb Puzzle

Published 2020-01-16 by Kevin Feasel

David Robinson has fun with puzzle-solving:

Solving this puzzle in R is interesting enough, but it’s particularly challenging to do so in a computationally efficient way. As much as I love the tidyverse, this, like the “lost boarding pass” puzzle and Emily Robinson’s evaluation of the best Pokémon team, serves as a great example of using R’s matrix operations to work efficiently with data.
I’ve done a lot of puzzles recently, and I realized that showing the end result isn’t a representation of my thought process. I don’t show all the dead ends and bugs, or explain why I ended up choosing a particular path. So in the same spirit as my Tidy Tuesday screencasts, I recorded myself solving this puzzle (though not the process of turning it into a blog post).

Most of the post is analysis around the problem, but you do get a viable solution as well.

Comments closed

Solving Sudoku with R

Published 2020-01-14 by Kevin Feasel

Tomaz Kastrun builds a validation function for Sudoku:

Function validater will validate for the sudoku board a particular solution at a particular position:
validater(sudoku, 1, c(1,4))
In matrix, at position x=1, y=4, where there is 0, it will test if number 1 is valid or not. If the number is valid, it returns TRUE (number) to outer function for finding complete solution.
This function iterates through all the possible 0-positions and iterates through solutions that are still available based on the rules:

Click through for that validation function.

Comments closed

Working with R and the Windows Command Line

Published 2020-01-10 by Kevin Feasel

Tomaz Kastrun takes us through calling CMD commands from R:

From time to time, when developing in R, working and wrangling data , preparing for machine learning projects, it comes the time, one would still need to access the operating system commands from/in R.
In this blog post, let’s take a look at some most useful cmd commands when using R. Please note, that the cmd commands apply only to windows environment, for Linux/MacOS, the system commands should be slightly changed, but the wrapper R code should remains the same.

The need does come up, so it’s good to have that knowledge at hand.

Comments closed

Mapping in R with mapply

Published 2019-12-30 by Kevin Feasel

Andrew Treadway shows how to use the Map() and mapply() functions in R:

An older post on this blog talked about several alternative base apply functions. This post will talk about how to apply a function across multiple vectors or lists with Map and mapply in R. These functions are generalizations of sapply and lapply, which allow you to more easily loop over multiple vectors or lists simultaneously.

The idea of Map in functional programming takes a bit of time to really wrap your head around, but once you do, it becomes extremely powerful. H/T R-bloggers

Comments closed

Category: R