Press "Enter" to skip to content

Category: R

Orchestrating Data Pipelines in R with maestro

Will Hipson moves some data:

If you look at data orchestration tools today you are bombarded with a dizzying array of software platforms that claim unsurpassed processing capability, AI-readiness, elegant UIs, etc. Apache Airflow is just one example of a popular orchestration platform that scales to meet virtually any orchestration need. And while these claims may be true, I argue it is rarely the case that these gargantuan platforms are needed in the first place. For most data engineers, you probably only need to process a moderate amount of data at a moderate time scale. Moreover, if you’re an R user, you don’t want to have to define your data pipelines using drag-and-drop tools or learn another programming language. Not only will this reduce cloud costs but also development time costs.

Click through to see why Will developed maestro and how it works. H/T R-Bloggers.

Comments closed

Building Sparklines in R Shiny

Osheen MacOscar needs a series of tiny visuals:

All of the plots from this package use the sparkline function, and we pass the type of chart we want as the type argument (default is line). The function will take a vector or list for the values argument, depending on the type of chart we are creating this can be either data to plot or specifications for the plot.

Click through for a list of the sparkline-esque visuals you can create as well as how you can use the sparkline package and some examples of outputs.

Comments closed

Defining “Laziness” in R

Maëlle Salmon, Athanasia Mo Mowinckel, and Hannah Frick are quite studious:

In the programming world, laziness can often be a good thing: it is both a human quality that can motivate automation efforts, and a programming concept that avoids wasting resources such as memory. Now, when reading code or documentation, seeing the word “lazy” can be confusing, because of its polisemy: it carries several meanings. In this post, we will enumerate the different possible definitions of “lazy” in R code.

Read on for a variety of contexts around “lazy,” including lazy evaluation, lazy database queries, lazy loading, and more.

Comments closed

Inflation in Medieval China

Richard Vale digs into a dataset:

In this post, I would like to draw attention to a very interesting data set collected by Guan, Palma and Wu as part of the replication package for their paper The rise and fall of paper money in Yuan China, 1260-1368. The paper describes inflation, money and prices during the Yuan Dynasty era in China.

First, a little historical background.

Read on for the analysis. H/T R-Bloggers.

Comments closed

Building Flowcharts in R

Pau Satorra makes a chart:

Fortunately, there are several packages in R for drawing flowcharts using different approaches. The problem is that the programming is generally quite complex, and the numbers have to be entered manually or parameterized beforehand. These flowcharts can have reproducible problems because if data changes, we have to manually change the parameters again.

To make our lives easier, there’s a new {flowchart} package that uses the tidyverse workflow, which allows to create many different types of flowcharts in just a few steps.

Read on to learn more about the package. I originally thought it was based on mermaid.js based on the way the final product looked, but a quick code review has disabused me of the notion. H/T R-Bloggers.

Comments closed

Porting an R Shiny App to Observable Framework

Tim Brock makes a change:

If you’re interested in interactive data visualisation you’ve probably heard of the d3 JavaScript library, even if you’ve never used it or even know any JavaScript. Mike Bostock, the creator of d3, and colleagues followed this up with d3.express, which was quickly renamed to Observable.

Read on to see how you can build a simple Observable Framework app without spending a lot of time troubleshooting JavaScript code.

Comments closed

Prevalence Adjustment in Binary Classifiers

David Lindelöf deal with an issue in analyzing classification models:

When you run a binary classifier over a population you get an estimate of the proportion of true positives in that population. This is known as the prevalence.

But that estimate is biased, because no classifier is perfect. 

Read on to learn what this means for precision, as well as one technique for tracking prevalence changes over itme.

Comments closed

Building a QR Code Clock

Tomaz Kastrun checks what time it is:

Ever wanted to have a clock on the wall or in the office, that is not binary. But it is QR-Code clock. Well, now you can have it.

This useless R function generates new QR Code for every given period and tells the time.

Click through for the code. I could see this being useful in scenarios where you want to avoid people copying the QR code, so you embed the time in there. Then, your reader service can check to see if the time is within some valid boundary, returning an error if not.

Comments closed