Category: R

Loading Multiple CSV Files in R

Stephen Royle has a set of challenges:

In a previous post, I described how to run a session to teach R to cell biologists. In this post we’ll look in a bit more detail at one of the steps: how to load data into R.

As a reminder, a typical analysis task in cell biology follows these steps:

  • do the experiment(s)
  • collect the data – e.g. microscopy images
  • analyse and get a plain text (csv) output – e.g. using Fiji
  • load the data into R
  • crunch some numbers and plot

Click through for some explanation, as well as a few exercises (and solutions) for people trying to learn the language. H/T R-Bloggers.
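
As a rough illustration of the "load the data into R" step, here is a minimal base R sketch. It is not taken from the linked post: the folder, file names, and column names are all made up, and the demo files are written to a temporary directory only so the example runs on its own.

```r
# Hypothetical setup: write three small CSV files to a temporary folder
# so the example is self-contained.
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  df <- data.frame(cell = 1:4, intensity = i * c(10, 20, 30, 40))
  write.csv(df, file.path(data_dir, sprintf("expt_%d.csv", i)), row.names = FALSE)
}

# Find every CSV file in the folder
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record which file its rows came from
read_one <- function(path) {
  out <- read.csv(path)
  out$source_file <- basename(path)
  out
}

# Read all files and stack them into a single data frame
all_data <- do.call(rbind, lapply(csv_files, read_one))
head(all_data)
```

Keeping the file name as a column makes it possible to group or facet by experiment later on.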

When Order Matters

The Rtask people tell a story:

You have inherited (or written) a data pipeline originally coded in SAS. It processes administrative billing records: matching line items against reference tables, applying time-varying coefficients, deduplicating based on business identifiers, computing running counters. Classic ETL work.

The migration to R goes well. You use {DBI} to open a DuckDB connection, load your source files as lazy tables via {arrow} or dplyr::tbl(), build the transformations with {dbplyr}, and collect the result at the very end. Your code is readable, your tests compare the R output to the SAS reference, and they pass (maybe using {datadiff}).

Then you run the pipeline again.

The numbers are different.

Give yourself 100 points if you answered “Because you need an ORDER BY clause” during the explanation. They also cover a few other places where DuckDB interactions in R can cause issues. Most of this is straightforward for data platform people, but can cause consternation for developers. H/T R-Bloggers.
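
The underlying issue can be sketched without a database at all. The example below is base R rather than DuckDB or {dbplyr}, and the table and column names are invented for illustration; it only shows why a running counter needs an explicit sort (the ORDER BY equivalent) to be reproducible.

```r
# Hypothetical billing lines; the column names are made up for illustration.
billing <- data.frame(
  account = c("A", "A", "A", "B", "B"),
  line_id = c(1, 2, 3, 1, 2),
  amount  = c(100, 50, 25, 10, 90)
)

# Running total per account, computed in whatever order the rows arrive
running_total <- function(df) {
  df$running <- ave(df$amount, df$account, FUN = cumsum)
  df
}

# Same calculation, but with an explicit sort first
running_total_ordered <- function(df) {
  df <- df[order(df$account, df$line_id), ]
  rownames(df) <- NULL
  running_total(df)
}

# Simulate an engine handing back the same rows in a different order
shuffled <- billing[c(3, 5, 1, 4, 2), ]

# Compare results line by line, regardless of the order rows came back in
key <- function(df) df[order(df$account, df$line_id), "running"]

unordered_a <- key(running_total(billing))
unordered_b <- key(running_total(shuffled))
ordered_a   <- key(running_total_ordered(billing))
ordered_b   <- key(running_total_ordered(shuffled))

identical(unordered_a, unordered_b)  # FALSE: the result depends on row order
identical(ordered_a, ordered_b)      # TRUE: the explicit sort makes it stable
```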

Dockerizing Shiny and RMarkdown Apps

Thomas Williams builds a container image:

Running R Markdown via Docker is a big help with deploying those analyses and reports for others to use.

While there are alternatives for deploying R Markdown (the most popular being Shiny Server), they add the hassle of matching R and package versions to the machine where the code was developed, may require IT help, and mean sharing disk, CPU, and memory with other apps on the same server. There are also licensing requirements and limitations for the free version of Shiny Server.

R Markdown on Docker avoids some of these issues; a single Docker container = a single app.

Click through for a simple RMarkdown file and Dockerfile.

Tracking Typing Speed with R

Tomaz Kastrun is pushing aside Mavis Beacon:

Did you ever wonder how fast and accurate your typing is?

For this instance, we will introduce some random pangrams, code samples and random strings sorted by level of difficulty.

This was kind of fun. I could hit about 80 or so WPM on the easy code examples and about 120 on the pangrams (with consistency between difficulties). Also, “Sphinx of black quartz judge my vow” is a pretty awesome thing to shout at the most opportune time.
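
For a sense of how numbers like these can be computed, here is a small sketch. It is not the code from the linked post: the function name `typing_stats` is made up, and it assumes the common convention of counting five characters as one word and a simple position-by-position accuracy check.

```r
# Hypothetical helper: words per minute and accuracy for one typing attempt.
# Assumes 5 typed characters count as one "word".
typing_stats <- function(target, typed, seconds) {
  target_chars <- strsplit(target, "")[[1]]
  typed_chars  <- strsplit(typed, "")[[1]]

  # Compare characters position by position over the shared length
  n <- min(length(target_chars), length(typed_chars))
  correct <- sum(target_chars[seq_len(n)] == typed_chars[seq_len(n)])

  minutes <- seconds / 60
  list(
    wpm      = (length(typed_chars) / 5) / minutes,
    accuracy = correct / length(target_chars)
  )
}

# One attempt with two wrong characters in the final word, typed in 6 seconds
stats <- typing_stats(
  target  = "Sphinx of black quartz judge my vow",
  typed   = "Sphinx of black quartz judge my wov",
  seconds = 6
)
stats$wpm       # 70
stats$accuracy  # 33 / 35, roughly 0.94
```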

Viewing Nested Lists in R with xfun::tabset()

Yihui Xie wants to document the contents of nested lists:

Complex nested lists in R can be difficult to explore and understand at a glance. The str() function is helpful for examining structure, but large nested lists can quickly become overwhelming.

While I was writing the documentation for tabsets in litedown, I almost laughed at myself for the support for nested tabsets, because I had no idea why anyone would want this feature. However, I suddenly realized that it can be a very useful tool for exploring nested lists in an interactive way, so I wrote a quick implementation: xfun::tabset().

Click through to see how it works.
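
To see the problem being solved, here is a base R sketch using only str(); it does not call xfun::tabset() itself. The nested list and the `list_depth` helper are made up for illustration.

```r
# Hypothetical nested list, made up for illustration.
config <- list(
  model = list(
    name   = "glm",
    params = list(family = "binomial", maxit = 25)
  ),
  data = list(path = "input.csv", columns = c("x", "y"))
)

# Full structure: every level is printed, which gets long for big lists
str(config)

# Limit the depth to get an overview of the top level first
str(config, max.level = 1)

# Helper: how many levels of lists are nested inside x
list_depth <- function(x) {
  if (!is.list(x) || length(x) == 0) return(0L)
  1L + max(vapply(x, list_depth, integer(1)))
}
list_depth(config)  # 3
```

Even with max.level, you end up re-running str() at each level to drill down, which is the kind of exploration an interactive view makes easier.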

NYC Open Data R Package

Antoine Soetewey announces a package:

I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from the New York City Open Data platform.

The package is designed as part of an open-science and reproducible-research effort, with the goal of lowering the friction between public data and statistical analysis—especially for teaching, exploratory research, and applied civic work.

It is available on CRAN, so it should be easy to grab. H/T R-Bloggers.

Operating on Distributions in R with distionary

Vincenzo Coia announces a new R package:

After passing through rOpenSci peer review, the distionary package is now newly available on CRAN. It allows you to make probability distributions quickly – either from a few inputs or from its built-in library – and then probe them in detail.

These distributions form the building blocks that piece together advanced statistical models with the wider probaverse ecosystem, which is built to release modelers from low-level coding so production pipelines stay human-friendly. Right now, the other probaverse packages are distplyr, allowing you to morph distributions into new forms, and famish, allowing you to tune distributions to data. Developed with risk analysis use cases like climate and insurance in mind, the same tools translate smoothly to simulations, teaching, and other applied settings.

Click through for an overview of the package.

Randomly Moving the Mouse Cursor in R

Tomaz Kastrun has been so busy, his screensaver never comes on, even when he’s out at lunch:

A new R package called LazyMouse, with a single function for randomly moving the mouse cursor in your favorite R IDE.

For every R developer, R data scientist, and everyday R user who also needs a break and does not want the computer to go into sleep mode.

Read on to see how it works. And jokes aside, there have been times in which I’ve wanted something like this to keep the screen from locking up or drives going to sleep when running heavy work overnight on a device I can physically control (i.e., not a workstation I’m leaving on at the office).

Draw Economist-Style Graphs in R

Ozancan Ozdemir replicates a style:

I think everyone agrees that the Economist magazine produces very well-designed graphics, sometimes the best in the world. The success of their graphs lies in the ability to explain complex matters in a simpler way by employing traditional data visualization techniques such as line graphs or bar plots. They put emphasis on the message they want to convey rather than the aesthetics of the graph itself. They also have a clear hierarchy in their plots and use colors, fonts and lines that represent the brand identity of the magazine.

In this tutorial, we are going to create an Economist-style graph in R by using the ggplot2, ggthemes, showtext, ggtext, and grid packages. I am going to use a dataset that I have been collecting since 2014 about the poverty line and minimum wage in Turkey, but you can adapt this code to any dataset you want to visualize.

Click through to learn how.
