Category: R

Making an R Box Plot from a Picture

Tomaz Kastrun builds a plot:

We create a raster image from a picture and calculate the ratio of the pixels on a grayscale scale. The more the darker colour is represented in the pixels, the bigger the value. This value is converted into a vector of values, and each vector is represented as a violin boxplot.

Click through for an example.
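
As a rough sketch of the idea in the quote, the base R snippet below turns grayscale pixels into "darkness" values and plots their distribution per image strip. It makes several assumptions that are not from the original post: a synthetic matrix stands in for a real picture, the image is split into four vertical strips, and base boxplot() replaces the violin plot (which would need an add-on package such as ggplot2 and its geom_violin()). The names img, darkness, and strips are hypothetical.

```r
# Hypothetical stand-in for a picture: a 60 x 40 matrix of grayscale
# intensities in [0, 1], where 0 is black and 1 is white.
set.seed(42)
img <- matrix(runif(60 * 40), nrow = 60, ncol = 40)

# Darkness is the complement of intensity, so darker pixels score higher.
darkness <- 1 - img

# Assign every pixel to one of four equal-width vertical strips,
# based on its column index.
strip_id <- cut(as.vector(col(darkness)), breaks = 4,
                labels = paste0("strip_", 1:4))

# One vector of darkness values per strip.
strips <- split(as.vector(darkness), strip_id)

# One box per strip; a violin plot would show the same vectors as densities.
boxplot(strips, ylab = "Darkness (1 - grayscale intensity)")
```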

Building Packages from Base R Files

John Nash and Arkajyoti Bhattacharjee package things up:

This article tries to explain an approach to developing alternative versions of functions which are in the distributed base of R. Our interest was in developing improvements to the nls() function and related features in R as part of a Google Summer of Code project for which Arkajyoti Bhattacharjee was the funded student. However, nls() has many tentacles involving a number of files and functions that may or may not be called as nls() is executed.

Part of the difficulty in carrying out such development of alternative versions is that one needs to be able to execute the new variants in parallel with the existing ones. A heavy-effort approach would be to have separate full sets of R code and build each system and run them separately. That is, we want to have two or more versions of R in the same computing system.

Read on for the process, some difficulties you might encounter along the way, and specific issues you might run into on Windows. H/T R-Bloggers.

Great Features in R 4.0.0

Colin Gillespie looks at a few new features in R version 4:

R version 4.0.0 was released almost two years ago. The change in the major version, 3.x.y to 4.0.0, represented significant and potentially breaking changes. For an organisation to start using these new features, everyone in the company must have access to that version; otherwise code isn’t shareable. This naturally slows down adoption.

We moved our internal R projects to depend on version R 4.0.0 around twelve months ago – a few months after the release date. Over the last year we’ve also assisted a number of clients in making the move; typically with Shiny applications. This post aims to highlight some of the features we’ve found useful and also some of the potential pitfalls.

It’s crazy to think that it’s been so long.
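
For a flavour of what changed, here is a small sketch of two R 4.0.0 changes: the new stringsAsFactors = FALSE default in data.frame() and raw string literals. These two were picked for illustration; the excerpt above does not say which features the post covers. The data and the path are made up, and the snippet needs R 4.0.0 or later.

```r
# In R >= 4.0.0, data.frame() keeps character columns as character
# (before 4.0.0 they became factors by default).
df <- data.frame(id = 1:3, grade = c("low", "high", "low"))
is_char <- is.character(df$grade)

# Code that relied on the old default must now ask for a factor explicitly.
df$grade <- factor(df$grade)

# Raw string literals (new in 4.0.0) avoid doubling backslashes.
win_path <- r"(C:\Users\demo\data.csv)"
same_path <- "C:\\Users\\demo\\data.csv"
paths_match <- identical(win_path, same_path)
```

The first change is the kind of potential pitfall the quote alludes to: scripts that silently depended on character columns becoming factors can behave differently after the upgrade.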

Identifying R Package and Function Use in GitHub Repos

Bryan Shalloway does a search:

TLDR: funspotr provides helpers for spotting the functions and packages in R and Rmarkdown files and associated github repositories. See Examples for catalogues of the functions/packages used in posts by Julia Silge, David Robinson, and others.

This is an interesting project. I’d imagine that with enough different code bases, you could develop a programming profile and possibly understand people’s strengths on a variety of characteristics like which functions they use, what they use given alternatives (e.g., “functional-friendly” map versus the *apply series versus loops), and how familiar they are with certain packages. I could see this being an advanced technique for learning what you should learn next: you obviously have familiarity with packages A, B, and C but it appears you don’t know about E or K and you might learn them to replace some of the work you’re doing with C.
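
To make the idea concrete, here is a minimal sketch of counting function calls in a piece of R code using only base R's parser. This is not funspotr's API or necessarily how it works internally; it just illustrates the kind of information such a tool extracts. The code string is a made-up example.

```r
# A made-up script to analyse, held as a string for self-containment.
code <- '
library(stats)
x <- sapply(1:3, sqrt)
y <- mean(x)
z <- mean(x * 2)
'

# Parse with source references kept so that token data is available.
parsed <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Keep only tokens the parser classifies as function calls.
# Note that sqrt is passed as an argument here, not called, so it is excluded.
calls <- parsed$text[parsed$token == "SYMBOL_FUNCTION_CALL"]

# Tally how often each function is called.
usage <- sort(table(calls), decreasing = TRUE)
```

Mapping each function back to the package that provides it is the harder part, and it is not attempted here.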

Handling Categorical Data in R

The RSquared Academy blog has a two-parter on handling categorical data in R. Part 1 elaborates on kinds of categorical data and introduces a case study:

While we can rank the categories, we cannot assign a value to them. For example, in satisfaction ranking, we cannot say that like is twice as positive as dislike, i.e., we are unable to say how much they differ from each other. While the order or rank of data is meaningful, the difference between two pieces of data cannot be measured or determined, or is meaningless. Ordinal data provide information about relative comparisons, but not the magnitude of the differences.
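
In R, this distinction maps onto ordered factors. The sketch below uses made-up responses and level names (not the case study's data) to show that comparisons work while differences do not.

```r
# Made-up survey responses.
responses <- c("like", "dislike", "neutral", "like", "dislike")

# An ordered factor records the ranking: dislike < neutral < like.
satisfaction <- factor(responses,
                       levels = c("dislike", "neutral", "like"),
                       ordered = TRUE)

# Relative comparisons are meaningful.
is_higher <- satisfaction[1] > satisfaction[2]  # "like" vs "dislike"
top <- max(satisfaction)

# Differences are not: R warns and returns NA for arithmetic on
# ordered factors (the warning is suppressed here to keep output clean).
gap <- suppressWarnings(satisfaction[1] - satisfaction[2])
```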

Part 2 shows off ways to work with categorical data in tables:

In this section, we will explore the above ways of summarizing categorical data. We will also spend some time learning about tables as you will be using them extensively while working with categorical data. R has many packages for tabulating data and we list and explore all of them in the R scripts shared in the GitHub repository.

Click through for both guides. H/T R-Bloggers.
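
As a small taste of the tabulation Part 2 describes, the sketch below uses only base R's table() and prop.table() on a made-up data frame; the guides themselves go through many more packages.

```r
# Made-up categorical data.
survey <- data.frame(
  gender = c("F", "M", "F", "F", "M", "M"),
  rating = c("like", "dislike", "like", "neutral", "like", "dislike")
)

# One-way frequency table and the matching proportions.
counts <- table(survey$rating)
shares <- prop.table(counts)

# Two-way table (gender by rating) with row-wise proportions,
# so each gender's ratings sum to 1.
cross <- table(survey$gender, survey$rating)
row_shares <- prop.table(cross, margin = 1)
```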

Mastermind in R

Tomaz Kastrun continues a series of useful games:

Let’s play a simple guessing game with R: Mastermind! The game was originally created for two people, but the R version will be single-player, for when an R developer or R data scientist needs a break.

The gameplay is simple and so are the rules. The board contains 10 rows (or more) with a choice of four colours and code pegs (white or black). The R engine stores a secret colour combination and the user selects a combination to guess.

Click through to see it in action.
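
The core of any Mastermind implementation is scoring a guess against the secret. The sketch below is not Tomaz's code. It assumes the standard rules, where a black peg marks a correct colour in the correct position and a white peg marks a correct colour in the wrong position. The function name score_guess and the colour names are hypothetical.

```r
# Score a guess under standard Mastermind rules (an assumption).
score_guess <- function(secret, guess) {
  stopifnot(length(secret) == length(guess))

  # Black pegs: same colour in the same position.
  black <- sum(secret == guess)

  # For each colour, count how many pegs the two combinations share
  # regardless of position, then remove the exact matches.
  colours <- union(secret, guess)
  common <- sum(vapply(
    colours,
    function(cl) min(sum(secret == cl), sum(guess == cl)),
    integer(1)
  ))

  c(black = black, white = common - black)
}

secret <- c("red", "blue", "green", "red")
guess  <- c("red", "green", "blue", "yellow")
result <- score_guess(secret, guess)  # black = 1, white = 2
```

In a full game, the secret would typically be drawn with something like sample(colours, 4, replace = TRUE), and the scoring would repeat until all pegs are black or the rows run out.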

Multiple Code Panes in R Studio

Tomaz Kastrun has good news for us:

On the RStudio home page, make sure to download version 2021.09 Preview (as of the writing of this blog post, it is still in preview) and install it on your client machine (Windows, macOS, and Linux are supported).

Once installation is completed, head to global options (Tools -> Global Options) and select Pane Layout. You will have a new set of buttons available (Add Column; Remove Column). With Add Column, an additional pane will be added to the layout.

It’s not as convenient as the right-click –> “Split horizontally” or “Split vertically” that we get in tools like SSMS and VS Code, but I’m happy to see this change in R Studio.

Creating Fireworks with R

Tomaz Kastrun is ready for Silvester:

New Year’s Eve is almost here, and what better way to celebrate than with fireworks? Snap, pop, crack, boom. These are the most peaceful, animal-friendly, harmless, eco, child-friendly, no-fire-needed, educational and nifty fireworks.

To get the fireworks, fire up the following R function.

I mean, but I enjoy fire… Though you could launch these in R and save the good stuff for the 4th of July.

Wrapping up a Spark Advent Calendar

Tomaz Kastrun did it: 25 posts in 25 days on Spark. Part 23 looks at Delta Live Tables:

Delta Live Tables is a framework for building reliable, maintainable, and testable data processing pipelines. The user defines the transformations to be performed on the data sources and data, and the framework manages all the data engineering tasks: task orchestration, cluster management, monitoring, data quality, and event error handling.

The Delta Live Tables framework manages how data is transformed with the help of a target schema, and it is a slightly different experience from Databricks Tasks (with Apache Spark tasks in the background).

Part 24 takes us through a bit of visualization:

You can use any of the popular Python packages to do the visualisation: Plotly, Dash, Seaborn, Matplotlib, Bokeh, Leather, and Glam, to name a few. Once the data is persisted in a dataframe, you can use any of these packages. With PySpark, plug in Matplotlib. Here is an example:

And part 25 wraps things up with links to additional resources:

To wrap up this year’s Advent of Spark 2021, a series of blog posts on Spark, it is essential to look at the list of additional learning resources for you to continue with this journey. Let’s divide this list not by type of resource (book, online documentation, online courses, articles, YouTube channels, Discord channels, and others) but rather by language flavour: Scala/Spark, R, and Python.

Great job on Tomaz’s part for gutting it out.