R – Page 69 – Curated SQL

Computing a Z Score with R

Published 2020-02-18 by Kevin Feasel

Anisa Dhana shows us a quick example of how to calculate a z-score with R:

In short, the z-score is a measure that shows how far (below or above) a specific value (individual) lies from the mean of a given dataset, expressed in standard deviations. In the example below, I am going to measure the z value of body mass index (BMI) in a dataset from NHANES.

Because R is a set-oriented, functional programming language, the answer is quite simple.
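
As a minimal sketch of the calculation described above: the BMI values here are made up for illustration and are not the NHANES data from the linked post. The z-score subtracts the mean and divides by the standard deviation, and base R's scale() does the same thing in one call.

```r
# Hypothetical BMI values (not taken from NHANES).
bmi <- c(18.5, 22.0, 24.3, 27.8, 31.2, 35.6)

# Manual z-score: distance from the mean in units of standard deviation.
z_manual <- (bmi - mean(bmi)) / sd(bmi)

# scale() centers and scales by default; as.numeric() drops the matrix shape.
z_scale <- as.numeric(scale(bmi))

print(round(z_manual, 2))
```

Because the arithmetic is vectorized, the same expression works unchanged on a full column of a data frame.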

Matrix Inputs for Shiny Apps

Published 2020-02-18 by Kevin Feasel

Andreas Neudecker shows off a new package:

We have been developing shiny apps for quite some years now. A problem we stumbled upon multiple times in this process was that there is no easy way to define matrices in shiny. So we had to help ourselves with workarounds.

Not anymore. Now you can plop a matrix right onto your Shiny app. And the package is on CRAN.

Publishable Adverse Event Tables in R

Published 2020-02-13 by Kevin Feasel

Inge Christoffer Olsen shows how to clean up tables in R for publication:

The summary of Adverse Events is a nice table just summing up the adverse events in the trial. Note the “[N] n (%)” format, which gives the number of events, the number of patients with events, and the percentage of patients with events.

This particular example is about adverse events, but the key concepts in the code apply to many kinds of tables that you want to make look a bit nicer. H/T R-Bloggers.
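
To make the “[N] n (%)” format concrete, here is a rough base R sketch. The data frame, the arm sizes, and the helper name format_ae are all hypothetical, and the linked post may build its tables quite differently.

```r
# Hypothetical adverse-event records: one row per event.
ae <- data.frame(
  patient_id = c(1, 1, 2, 4, 4, 4),
  arm        = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical number of patients per arm (the percentage denominator).
n_patients <- c(A = 10, B = 8)

# Build one "[N] n (%)" cell: events, patients with events, percent of patients.
format_ae <- function(events, arm_size) {
  n_events  <- nrow(events)
  n_with_ae <- length(unique(events$patient_id))
  pct       <- 100 * n_with_ae / arm_size
  sprintf("[%d] %d (%.1f%%)", n_events, n_with_ae, pct)
}

summary_cells <- vapply(
  names(n_patients),
  function(a) format_ae(ae[ae$arm == a, ], n_patients[[a]]),
  character(1)
)

print(summary_cells)
```

Note that the percentage uses all patients in the arm as the denominator, not just those with events.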

Monitoring for Distribution Changes

Published 2020-02-13 by Kevin Feasel

Nina Zumel explains how we can track if something has changed by monitoring its distribution:

A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model has changed? This client, like many others who have faced the same problem, simply checked whether the mean and standard deviation of the data had changed more than some amount, where the threshold value they checked against was selected in a more or less ad-hoc manner. But they were curious whether there was some other, perhaps more principled, way to check for a change in distribution.

The answer is, of course, that there is. Click through to see a few of the techniques.
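
One commonly used option, offered here only as an illustration and not necessarily one of the techniques in the linked post, is a two-sample Kolmogorov-Smirnov test, which compares the whole empirical distributions rather than just the mean and standard deviation. The simulated data and the 0.01 cutoff below are assumptions.

```r
set.seed(42)

# Simulated "previous" data and "new" data with a shifted mean.
baseline <- rnorm(1000, mean = 0, sd = 1)
current  <- rnorm(1000, mean = 1, sd = 1)

# Two-sample Kolmogorov-Smirnov test from the base stats package.
result <- ks.test(baseline, current)

# Flag a change when the p-value falls below an assumed cutoff.
drifted <- result$p.value < 0.01

print(result$statistic)
print(drifted)
```

The cutoff is still a choice, but it now has a stated interpretation (a false-alarm rate) rather than being an ad-hoc threshold on the mean.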

Changes in the R foreach Package

Published 2020-02-12 by Kevin Feasel

Hong Ooi announces some changes to the foreach package in R:

This post is to announce some new and upcoming changes in the foreach package.
First, foreach can now be found on GitHub! The repository is at https://github.com/RevolutionAnalytics/foreach, replacing its old home on R-Forge. Right now the repo hosts both the foreach and iterators packages, but that may change later.

There are also some changes to the package itself, so read on for those.

Pulling R Packages from Fedora

Published 2020-02-11 by Kevin Feasel

Iñaki Úcar has an interesting project:

Bringing R packages to Fedora (in fact, to any distro) is a Herculean task, especially considering the rate at which CRAN grows nowadays. So I am happy to announce the cran2copr project, which is an attempt to maintain binary RPM repos for most of CRAN (~15k packages as of Feb. 2020) in an automated way using Fedora Copr.

Click through for installation instructions if you’re using an RPM-based Linux distribution like Fedora or CentOS. H/T R-Bloggers.

Calculating Distances in R

Published 2020-02-10 by Kevin Feasel

Chris Brown gives us three ways to calculate distance in R:

Calculating a distance on a map sounds straightforward, but it can be confusing how many different ways there are to do this in R.
This complexity arises because there are different ways of defining ‘distance’ on the Earth’s surface.
The Earth is spherical. So do you want to calculate distances around the sphere (‘great circle distances’) or distances on a map (‘Euclidean distances’)?
Then there are barriers. For example, for distances in the ocean, we often want to know the nearest distance around islands.
Then there is the added complexity of the different spatial data types. Here we will just look at points, but these same concepts apply to other data types, like shapes.

Read on to learn these three separate techniques. H/T R-Bloggers.
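
The difference between the first two notions of distance can be sketched in base R. This is an illustration under stated assumptions, not the code from the linked post: the function names are hypothetical, the haversine formula treats the Earth as a perfect sphere with an assumed mean radius of 6371 km, and the Euclidean version only makes sense for coordinates that are already projected onto a flat map.

```r
# Great-circle distance via the haversine formula (inputs in degrees).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Straight-line distance between projected (planar) coordinates.
euclidean_m <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# Two hypothetical points given as longitude/latitude in degrees.
print(haversine_km(153.0, -27.5, 151.2, -33.9))

# Two hypothetical points in a projected coordinate system, in metres.
print(euclidean_m(500000, 6950000, 503000, 6954000))
```

Distances around barriers, the third case in the quote, need a cost-surface or network approach and are not covered by this sketch.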

Check Those R Repos

Published 2020-02-06 by Kevin Feasel

John Mount has a public service announcement:

In a lot of our R writing we casually say “install from CRAN using install.packages('PKGNAME')” or “update your packages by using update.packages(ask = FALSE, checkBuilt = TRUE) (and answering ‘no’ to all questions about compiling).”
We recently became aware that for some users this isn’t complete advice.
The above depends on your R install pointing to a repository that is in fact up to date. To check what repositories you are using please use the command options('repos').

The specific example here is around the Microsoft R Archive Network (MRAN), which pins packages to fixed snapshot dates. There is a good reason for this: it helps companies standardize on a known set of versions of R packages by default. That way you don’t have version 1.8 of a package in dev and then get 1.9 in production and find out that something broke between the two versions.
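
A short sketch of the repository check described above. The mirror URL is an assumption chosen for illustration (the CRAN cloud mirror); if your organization deliberately pins to a snapshot, you may not want to change it at all.

```r
# Inspect which repositories this R session will install from.
current_repos <- getOption("repos")
print(current_repos)

# Point the session at an up-to-date CRAN mirror (assumed choice of mirror).
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the change took effect for this session.
print(getOption("repos"))

# After this, the usual update call would use the new repository:
# update.packages(ask = FALSE, checkBuilt = TRUE)
```

The options() call only affects the current session; a persistent change would normally go in a startup file such as .Rprofile.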

Audio Analysis in R

Published 2020-02-04 by Kevin Feasel

Jeroen Ooms walks us through some audio analysis with R and the av package:

The latest version of the rOpenSci av package includes some useful new tools for working with audio data. We have added functions for reading, cutting, converting, transforming, and plotting audio data in any popular audio / video format (mp3, mkv, aac, etc).
The functionality can either be used by itself, or to prepare audio data for further analysis in R using other packages. We hope this clears an important hurdle to using R for research on speech, music, and whale mating calls.

One of the most interesting things I saw Edward Tufte demonstrate was visualizing music using the Music Animation Machine. There’s a lot of space here to experiment. H/T R-Bloggers.

Generating Fake Data with R

Published 2020-02-03 by Kevin Feasel

Dave Mason takes a look at generating fake PII in R:

I’ve been thinking about R and how it can be used by developers, DBAs, and other SQL Server professionals who aren’t data scientists per se. A recent article about generating a data set of fake transactional data got me thinking about this again and I wondered, can R be used to obfuscate PII?
In a word, yes. Well, mostly. (More on this in a bit.) As with anything R-related, there are probably multiple packages that are useful for any given task. For this one, I’ll focus on the “generator” package.

Click through to see what it does and Dave’s thoughts on the topic. It would also be possible to generate fake data in R by hitting a web API like Daniel Hutmacher’s service.
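
As a rough illustration of the idea, here is a base R sketch that swaps out identifying columns for made-up values. It deliberately does not use the “generator” package mentioned above, whose functions are not shown in this excerpt; the table, the name pools, and the example.com addresses are all hypothetical.

```r
set.seed(1)

# Hypothetical table containing PII.
customers <- data.frame(
  id    = 1:4,
  name  = c("Real Person One", "Real Person Two",
            "Real Person Three", "Real Person Four"),
  email = c("one@corp.test", "two@corp.test",
            "three@corp.test", "four@corp.test"),
  stringsAsFactors = FALSE
)

# Small made-up pools to draw replacement names from.
first_names <- c("Alex", "Sam", "Jordan", "Taylor", "Casey")
last_names  <- c("Smith", "Jones", "Lee", "Brown", "Garcia")

n <- nrow(customers)
obfuscated <- customers

# Replace identifying columns; keep the non-identifying key as-is.
obfuscated$name  <- paste(sample(first_names, n, replace = TRUE),
                          sample(last_names, n, replace = TRUE))
obfuscated$email <- sprintf("user%d@example.com", seq_len(n))

print(obfuscated)
```

Random replacement like this hides the original values but does not preserve relationships between columns, which may be part of why the answer above is "mostly".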

Category: R