Category: R

Diagnosing a textConnection() Slowdown in R

Yihui Xie looks into an issue:

Running quarto render on a document with that single chunk took 35 seconds. The equivalent rmarkdown::render() finished in under half a second. As a side note in the issue, the reporter pinged me that the same problem existed in litedown. litedown is independent of both Quarto and knitr; it executes R code through xfun::record(). That is where I started looking.

Click through for the discovery process, explanation, and fix.

Compressing Images in R

Yihui Xie announces a new package:

Last month, @bastistician opened an issue on the litedown repo pointing out that knitr has a hook_pngquant() function for compressing PNG plots from code chunks, but litedown lacks such a feature. He included a reasonable workaround—calling system2("pngquant", ...) with litedown::get_context("plot_files") in a chunk at the end of the vignette. It shrank his vignette from 80 KB to 54 KB, which is a 33% reduction. Not bad.

The catch, of course, is that it requires pngquant to be installed on the system. For R users, installing a system binary is more friction than it sounds: it is brew install pngquant on macOS, a separate package manager invocation on Linux, and hunting down a standalone executable on Windows. If you maintain a package that others will build, you are now asking all of them to do this—for every machine they use. By contrast, install.packages("tinyimg") works the same way everywhere, which is the kind of simplicity that makes a tool actually get used.

This is why I created tinyimg.

Read on for more details about how tinyimg works, how well it compresses, and how it integrates with litedown.
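For anyone who wants to try the pngquant workaround from the issue before switching to tinyimg, here is a minimal R sketch of it. The helper name `compress_pngs()` and the default quality range are my own assumptions, not part of litedown or tinyimg. It shells out to pngquant using its `--force`, `--skip-if-larger`, `--ext`, and `--quality` options, and it warns and leaves files alone if pngquant is not on the PATH, which is exactly the friction Yihui describes.

```r
# Hypothetical helper: compress PNG files in place with the pngquant CLI.
# Requires pngquant to be installed separately; not tinyimg's or litedown's API.
compress_pngs <- function(files, quality = "65-80") {
  # Keep only existing .png files
  files <- files[file.exists(files) & grepl("\\.png$", files, ignore.case = TRUE)]
  if (length(files) == 0) return(invisible(character(0)))
  if (!nzchar(Sys.which("pngquant"))) {
    warning("pngquant not found on PATH; leaving files unchanged")
    return(invisible(character(0)))
  }
  for (f in files) {
    # --ext .png with --force overwrites the original file;
    # --skip-if-larger keeps the original when compression would not help
    system2("pngquant", c("--force", "--skip-if-larger", "--ext", ".png",
                          paste0("--quality=", quality), shQuote(f)))
  }
  invisible(files)
}
```

In a chunk at the end of a vignette, the call would look something like `compress_pngs(litedown::get_context("plot_files"))`, assuming that context value holds paths to the generated plot files.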

Sharing Data between Shiny Modules

Colin Fay explains an architecture:

Some people have recently been vocal about misuses of the "stratégie du petit r", a mechanism for sharing data across {shiny} modules that was detailed both in the Engineering Production-Grade Shiny Apps book and in an older post written in 2019 on this blog.
And yes, if you’re wondering, I did feel old when I realized this blog post is almost 7 years old now.

I’m always happy to be proven wrong, to challenge the way I build software, and to become a better software engineer. But given that we weren’t contacted to discuss the ideas behind this strategy, I thought the moment was perfect to give y’all an update on the latest approaches I’ve been using to share data across {shiny} modules, along with some thoughts and comments on the "stratégie du petit r".

Click through to learn more about Shiny modules, the challenge of passing information between modules, and some high-level ideas of how to pass information between these modules without everything falling apart. H/T R-Bloggers.
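As a reference point, the "petit r" strategy, as commonly described, creates a single `reactiveValues()` object (conventionally named `r`) in the top-level server and passes it to every module, which then reads from and writes to it. Below is a minimal sketch of that classic pattern with hypothetical module names. It is the approach Colin is revisiting, not one of the newer approaches his post recommends.

```r
library(shiny)

# Module UI functions: NS() namespaces input/output IDs per module instance
counter_ui <- function(id) actionButton(NS(id, "go"), "Click me")
display_ui <- function(id) textOutput(NS(id, "total"))

# Writer module: updates the shared reactiveValues object `r`
counter_server <- function(id, r) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$go, {
      r$clicks <- r$clicks + 1
    })
  })
}

# Reader module: re-renders whenever r$clicks changes
display_server <- function(id, r) {
  moduleServer(id, function(input, output, session) {
    output$total <- renderText(paste("Clicks so far:", r$clicks))
  })
}

ui <- fluidPage(counter_ui("counter"), display_ui("display"))

server <- function(input, output, session) {
  # The "petit r": one shared reactiveValues object created at the top level
  r <- reactiveValues(clicks = 0)
  counter_server("counter", r = r)
  display_server("display", r = r)
}

if (interactive()) shinyApp(ui, server)
```

The convenience is also the risk: any module can write any field of `r`, so data flow between modules is implicit rather than visible in function signatures.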

Loading Multiple CSV Files in R

Stephen Royle has a set of challenges:

In a previous post, I described how to run a session to teach R to cell biologists. In this post we’ll look in a bit more detail at one of the steps: how to load data into R.

As a reminder, a typical analysis task in cell biology follows these steps:

  • do the experiment(s)
  • collect the data – e.g. microscopy images
  • analyse and get a plain text (csv) output – e.g. using Fiji
  • load the data into R
  • crunch some numbers and plot

Click through for some explanation, as well as a few exercises (and solutions) for people trying to learn the language. H/T R-Bloggers.
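The "load the data into R" step often means combining many CSV files (one per cell, image, or experiment) into a single table. Here is a minimal base R sketch, assuming every file has the same columns. The function name, folder, and column names are made up for illustration, and this is not necessarily the approach Stephen teaches in his post.

```r
# Hypothetical helper: read every CSV in a folder and stack them into one data frame.
# Assumes all files share the same column names (e.g. exported by the same Fiji macro).
load_csv_folder <- function(dir, pattern = "\\.csv$") {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(paths) == 0) stop("No CSV files found in ", dir)
  tables <- lapply(paths, function(p) {
    df <- read.csv(p)
    # Record which file each row came from; rep() also works for zero-row files
    df$source_file <- rep(basename(p), nrow(df))
    df
  })
  # rbind() requires matching column names across all files
  do.call(rbind, tables)
}

# Example with two made-up files in a temporary folder
data_dir <- tempfile("cells")
dir.create(data_dir)
write.csv(data.frame(cell = 1:2, area = c(10.5, 12)),
          file.path(data_dir, "exp1.csv"), row.names = FALSE)
write.csv(data.frame(cell = 1:3, area = c(9, 11, 13)),
          file.path(data_dir, "exp2.csv"), row.names = FALSE)
all_cells <- load_csv_folder(data_dir)
```

Keeping a `source_file` column is a cheap way to retain the experiment or condition label for the later "crunch some numbers and plot" step.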

When Order Matters

The Rtask people tell a story:

You have inherited (or written) a data pipeline originally coded in SAS. It processes administrative billing records: matching line items against reference tables, applying time-varying coefficients, deduplicating based on business identifiers, computing running counters. Classic ETL work.

The migration to R goes well. You use {DBI} to open a DuckDB connection, load your source files as lazy tables via {arrow} or dplyr::tbl(), build the transformations with {dbplyr}, and collect the result at the very end. Your code is readable, your tests compare the R output to the SAS reference, and they pass (maybe using {datadiff}).

Then you run the pipeline again.

The numbers are different.

Give yourself 100 points if you guessed “Because you need an ORDER BY clause” before reaching the explanation. They also cover a few other places where DuckDB interactions in R can cause issues. Most of this is straightforward for data platform people, but can cause consternation for developers. H/T R-Bloggers.
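The root cause is that a SQL engine makes no promise about row order unless the query has an ORDER BY, so any step that relies on "the first row" or a running counter can change between runs. The sketch below is an in-memory base R analogue, not DuckDB code. `fetch_unordered()` is a hypothetical stand-in that shuffles rows to mimic an unordered result, and the billing columns are invented. In a {dbplyr} pipeline the equivalent fix is an explicit `arrange()`, or `window_order()` before window functions.

```r
# Invented billing records: two versions each for items A and B
billing <- data.frame(
  item_id = c("A", "A", "B", "B", "C"),
  updated = as.Date(c("2024-01-01", "2024-03-01", "2024-02-01",
                      "2024-01-15", "2024-01-10")),
  amount  = c(100, 120, 50, 55, 70)
)

# Hypothetical stand-in for a query without ORDER BY: row order is arbitrary
fetch_unordered <- function(df) df[sample(nrow(df)), , drop = FALSE]

# Fragile: "first row per item" depends on whatever order the rows arrived in
dedup_fragile <- function(df) df[!duplicated(df$item_id), ]

# Deterministic: sort explicitly (latest update first, amount as tie-breaker),
# then keep the first row per item
dedup_stable <- function(df) {
  df <- df[order(df$item_id, -as.numeric(df$updated), -df$amount), ]
  out <- df[!duplicated(df$item_id), ]
  rownames(out) <- NULL
  out
}

set.seed(1);  run1 <- dedup_stable(fetch_unordered(billing))
set.seed(42); run2 <- dedup_stable(fetch_unordered(billing))
```

`dedup_fragile()` can keep different rows from one run to the next, while `dedup_stable()` keeps the latest record per item regardless of arrival order.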

Dockerizing Shiny and RMarkdown Apps

Thomas Williams builds a container image:

Running R Markdown via Docker is a big help with deploying those analyses and reports for others to use.

While there are alternatives for deploying R Markdown — the most popular being Shiny Server — they add the hassle of matching R and package versions to the machine where the code was developed, may require IT help, and mean sharing disk, CPU, and memory with other apps on the same server. There are also licensing requirements and limitations for the free version of Shiny Server.

R Markdown on Docker avoids some of these issues; a single Docker container = a single app.

Click through for a simple RMarkdown file and Dockerfile.

Tracking Typing Speed with R

Tomaz Kastrun is pushing aside Mavis Beacon:

Did you ever wonder how fast and how accurately you type?

For this instance, we will introduce some random pangrams, code samples and random strings sorted by level of difficulty.

This was kind of fun. I could hit about 80 WPM on the easy code examples and about 120 on the pangrams, with fairly consistent results across difficulty levels. Also, “Sphinx of black quartz, judge my vow” is a pretty awesome thing to shout at the most opportune time.
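Typing tests usually report speed using the convention that one "word" is five characters (spaces included), so WPM is (characters typed / 5) divided by elapsed minutes. Here is a minimal sketch of that calculation plus a simple accuracy measure. The function `typing_stats()` is hypothetical and not Tomaz's implementation; in particular, accuracy here is a plain position-by-position comparison, so a skipped character counts every following character as wrong (an edit-distance measure would be more forgiving).

```r
# Hypothetical helper: gross WPM and positional accuracy for one typing attempt.
typing_stats <- function(target, typed, seconds) {
  stopifnot(seconds > 0)
  target_chars <- strsplit(target, "")[[1]]
  typed_chars  <- strsplit(typed, "")[[1]]
  # Compare characters position by position over the overlapping length
  n <- min(length(target_chars), length(typed_chars))
  correct <- sum(target_chars[seq_len(n)] == typed_chars[seq_len(n)])
  minutes <- seconds / 60
  list(
    wpm      = (length(typed_chars) / 5) / minutes,  # 5 characters = 1 word
    accuracy = correct / length(target_chars)
  )
}

pangram <- "Sphinx of black quartz judge my vow"  # 35 characters
typing_stats(pangram, pangram, seconds = 6)        # 7 "words" in 0.1 minutes
```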

Viewing Nested Lists in R with xfun::tabset()

Yihui Xie wants to document the contents of nested lists:

Complex nested lists in R can be difficult to explore and understand at a glance. The str() function is helpful for examining structure, but large nested lists can quickly become overwhelming.

While I was writing the documentation for tabsets in litedown, I almost laughed at myself for the support for nested tabsets, because I had no idea why anyone would want this feature. However, I suddenly realized that it can be a very useful tool for exploring nested lists in an interactive way, so I wrote a quick implementation: xfun::tabset().

Click through to see how it works.
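Without an HTML output, a quick text-based alternative is `str(x, max.level = 1)`, which limits how deep `str()` descends. Another option is a small recursive helper that prints only names and classes as an indented outline. Here is a minimal sketch of that idea. The helper `list_outline()` is hypothetical and is not how `xfun::tabset()` works internally.

```r
# Hypothetical helper: print a nested list as an indented outline of names and classes.
list_outline <- function(x, indent = 0, max_depth = Inf) {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  for (i in seq_along(x)) {
    # Unnamed elements are labelled by position, e.g. [[2]]
    label <- if (nzchar(nms[i])) nms[i] else paste0("[[", i, "]]")
    item <- x[[i]]
    cat(strrep("  ", indent), "- ", label, " <", class(item)[1], ">\n", sep = "")
    # Recurse into sub-lists until max_depth is reached
    if (is.list(item) && indent + 1 < max_depth) {
      list_outline(item, indent + 1, max_depth)
    }
  }
  invisible(x)
}

cfg <- list(db = list(host = "localhost", port = 5432L), tags = c("a", "b"))
list_outline(cfg)
```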

NYC Open Data R Package

Antoine Soetewey announces a package:

I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from the New York City Open Data platform.

The package is designed as part of an open-science and reproducible-research effort, with the goal of lowering the friction between public data and statistical analysis—especially for teaching, exploratory research, and applied civic work.

It is available on CRAN, so it should be easy to grab. H/T R-Bloggers.
