Category: R

Percentage by Group in R

Steven Sanderson performs a breakdown:

Calculating percentages by group is a common task in data analysis. It allows you to understand the distribution of data within different categories. In this blog post, we’ll walk you through the process of calculating percentages by group using three popular approaches in R: base R, dplyr, and data.table. To keep things simple, we will use the well-known Iris dataset.

The Iris dataset contains information about different species of iris flowers and their measurements, including sepal length, sepal width, petal length, and petal width. We will focus on the ‘Species’ column and calculate the percentage of each species in the dataset.

Read on for the three approaches. I think the Tidyverse approach is the easiest to understand in this case, though all three get you to the answer.
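
As a minimal base R sketch (not necessarily the code in Steven’s post), the share of each species in the built-in iris data can be computed with table() and prop.table(). The names species_counts, species_pct, and pct_df are illustrative only.

```r
# Count the rows for each species in the built-in iris data set
species_counts <- table(iris$Species)

# prop.table() divides each count by the total; multiply by 100 for percent
species_pct <- prop.table(species_counts) * 100

# Reshape into a data frame with one row per species
pct_df <- as.data.frame(species_pct, stringsAsFactors = FALSE)
names(pct_df) <- c("Species", "Percent")
pct_df$Percent <- round(pct_df$Percent, 2)
print(pct_df)
```

For a breakdown across two factors, `prop.table(table(a, b), margin = 1)` gives row-wise proportions instead of proportions of the grand total. In dplyr, `count(iris, Species)` followed by `mutate(pct = n / sum(n) * 100)` should produce the same numbers.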

Subsetting List Objects in R

Steven Sanderson makes a sub-list and checks it twice:

If you’re an aspiring data scientist or R programmer, you must be familiar with the powerful data structure called “lists.” Lists in R are collections of elements that can contain various data types such as vectors, matrices, data frames, or even other lists. They offer great flexibility and are widely used in many real-world scenarios.

In this blog post, we will explore one of the essential skills in working with lists: subsetting. Subsetting allows you to extract specific elements or portions of a list, helping you access and manipulate data efficiently. So, let’s dive into the world of list subsetting and learn some useful techniques along the way!

Read on for multiple ways of subsetting lists in base R.
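
A short base R sketch of the main subsetting tools follows; the list my_list and its element names are hypothetical. The key distinction is that single brackets return a list, while double brackets and `$` return the element itself.

```r
# A hypothetical list mixing several data types
my_list <- list(
  numbers = 1:5,
  letters = c("a", "b", "c"),
  df      = data.frame(x = 1:2, y = c("p", "q")),
  nested  = list(inner = TRUE)
)

# Single brackets return a (smaller) list
sub_list <- my_list[c("numbers", "df")]

# Double brackets and $ return the element itself, not a list
nums  <- my_list[["numbers"]]
nums2 <- my_list$numbers

# Negative indices drop elements
without_letters <- my_list[-2]

# Chained extraction reaches into nested structures
inner_value <- my_list[["nested"]][["inner"]]

# Filter() keeps only the elements that satisfy a predicate
numeric_only <- Filter(is.numeric, my_list)
```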

Finding Duplicate Rows and Values in R

Steven Sanderson de-duplicates, starting with values:

In data analysis and programming, it’s common to encounter situations where you need to identify duplicate values within a dataset. Whether you’re a beginner or an experienced programmer, knowing how to find duplicate values is a fundamental skill. In this blog post, we will explore two different approaches to accomplish this task using base R functions and the dplyr package in R. By the end, you’ll have a clear understanding of how to detect and manage duplicate values in your own datasets.
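
For the duplicate-values case, a minimal base R sketch (the vector x is made up) shows the difference between flagging repeat occurrences, listing the repeated values, and pulling every occurrence of them:

```r
x <- c(3, 1, 3, 2, 1, 3)

# duplicated() flags every occurrence after the first
dup_flags <- duplicated(x)

# The values that appear more than once, each listed once
dup_values <- unique(x[dup_flags])

# Every occurrence of a duplicated value, including the first
all_dups <- x[x %in% dup_values]

# Counts per value; keep those seen more than once
counts <- table(x)
repeated <- counts[counts > 1]
```

In dplyr, `group_by()` on the column followed by `filter(n() > 1)` keeps every row whose value repeats, which corresponds to all_dups here.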

From there, we get to see various ways to de-duplicate rows in R:

In data analysis and manipulation tasks, it’s common to encounter situations where we need to identify and handle duplicate rows in a dataset. In this blog post, we will explore three different approaches to finding duplicate rows in R: the base R method, the dplyr package, and the data.table package. We’ll compare their performance using the benchmark function and provide insights on when to use each approach. So, grab your coding gear, and let’s dive in!

Finding duplicate values is relatively tricky; finding duplicate rows is much easier.
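
For rows, base R’s duplicated() also works on a whole data frame. The sketch below uses a hypothetical data frame and shows exact duplicates, de-duplication, and duplicates judged on a single column:

```r
df <- data.frame(
  id    = c(1, 2, 1, 3, 2),
  fruit = c("apple", "pear", "apple", "plum", "kiwi")
)

# Rows identical to an earlier row (all columns compared)
dup_rows <- df[duplicated(df), ]

# De-duplicated data frame: keep the first occurrence of each row
deduped <- df[!duplicated(df), ]

# Duplicates judged on one column only; fromLast = TRUE also flags first copies
dup_ids <- df[duplicated(df$id) | duplicated(df$id, fromLast = TRUE), ]
```

deduped matches `unique(df)`; `dplyr::distinct(df)` and data.table’s `unique()` method return the same set of rows.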

Modularizing an Existing Shiny App

Peter Baranovskiy breaks it down:

There are multiple tutorials available online on writing modular Shiny apps. So why one more? Well, when I first started building modular apps myself, these didn’t do much for me. I only really learned how to write modules when I had an opportunity to team up with an experienced R Shiny developer. The reason, I guess, is that Shiny modules are an advanced topic, and you typically get to writing modules only when you finally need to scale your apps (and keep opportunities for further scaling open). This typically means when your app goes into production. By then you have probably already developed multiple apps, and switching over to the way of thinking required to write modules may be challenging. If you don’t know what modules are, I recommend starting here and then coming back to this post. Otherwise, read on.

So I decided to try a different approach: instead of building a simple modular app from scratch, I went in the opposite direction and broke down a complex real-life app into modules. Here’s the app’s original non-modular code. Note the single app.R file that contains the entire app. static_assets.R includes some object definitions which I moved to a separate file for convenience. calgary_crime_data_prep.R is not part of the app; it is a data retrieval and cleaning script executed once a month with cron. Running the script each time the app launches would have made it extremely slow and used way too much bandwidth, as the script downloads and processes 150+ MB of data on each run.

Read on for the reasoning behind using modules, as well as Peter’s notes on the process.
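
For readers new to modules, a minimal, hypothetical counter module shows the two halves of the pattern: a UI function that namespaces its IDs with NS(), and a server function wrapped in moduleServer() (available in shiny 1.5.0 and later). The names counterUI and counterServer are illustrative and do not come from Peter’s app.

```r
library(shiny)

# Module UI: every ID goes through ns() so that several copies can coexist
counterUI <- function(id) {
  ns <- NS(id)
  tagList(
    actionButton(ns("increment"), "Add one"),
    textOutput(ns("total"))
  )
}

# Module server: moduleServer() applies the same namespace on the server side
counterServer <- function(id) {
  moduleServer(id, function(input, output, session) {
    total <- reactiveVal(0)
    observeEvent(input$increment, total(total() + 1))
    output$total <- renderText(total())
    total  # return the reactive so the parent app can use it
  })
}

# The parent app reuses the module twice without any ID clashes
ui <- fluidPage(counterUI("left"), counterUI("right"))
server <- function(input, output, session) {
  counterServer("left")
  counterServer("right")
}

if (interactive()) shinyApp(ui, server)
```

Because each instance gets its own namespace ("left-increment", "right-increment", and so on), the same module can be dropped into a larger app repeatedly, which is what makes breaking a single app.R into modules pay off.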

Object Comparison in R

Steven Sanderson checks two objects:

In the realm of programming, R is a widely used language for statistical computing and data analysis. Within R, there exists a powerful function called identical() that allows programmers to compare objects for exact equality. In this blog post, we will delve into the syntax and usage of the identical() function, providing clear explanations and practical examples along the way.

You can also take a look at the documentation for this function to see a few more examples.
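
A few quick cases show what "exact equality" means in practice: identical() compares type and attributes as well as values, which is where it differs from `==` and from the tolerance-based all.equal().

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)
identical(a, b)                     # TRUE: same type, values and attributes

identical(1L, 1)                    # FALSE: integer vs double
1L == 1                             # TRUE: == compares values only

identical(c(x = 1), c(1))           # FALSE: names are attributes too

identical(0.1 + 0.2, 0.3)           # FALSE: floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: all.equal() allows a tolerance
```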

Creating an HTTP Header Hash in R

Bob Rudis creates an R package:

HTTP Headers Hashing (HHHash) is a technique developed by Alexandre Dulaunoy to generate a fingerprint of an HTTP server based on the headers it returns. It employs one-way hashing to generate a hash value from the list of header keys returned by the server. The HHHash value is calculated by concatenating the header keys in the order the server returned them, separated by colons. The SHA256 of this concatenated list is then taken to generate the HHHash value. HHHash incorporates a version identifier to enable updates to new hashing functions.

Read on to see when it might be useful and other things you should know about the package. H/T R-Bloggers.
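
The calculation described above is small enough to sketch directly. This is not the API of Bob’s package: hhhash_sketch() is a hypothetical helper, it assumes the digest package is installed, and it assumes the version identifier is the `hhh:1:` prefix used by the reference implementation. It starts from a vector of header names rather than making an HTTP request.

```r
# Hypothetical helper; assumes the digest package is installed
hhhash_sketch <- function(header_names) {
  # Keep header names in the order the server returned them, joined by ":"
  joined <- paste(header_names, collapse = ":")
  # serialize = FALSE hashes the raw string rather than an R serialization
  digest_hex <- digest::digest(joined, algo = "sha256", serialize = FALSE)
  # Assumed version prefix, so a different hash function can be used later
  paste0("hhh:1:", digest_hex)
}

# Hypothetical header names, as they might come back from a server
headers <- c("Date", "Server", "Content-Type", "Content-Length")
hhhash_sketch(headers)
```

Since only the header names and their order feed the hash, two servers with the same software and configuration should tend to produce the same value even when dates or content lengths differ.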

Customizing a Shiny App Theme

Peter Baranovskiy doesn’t want bog standard but is okay with mostly standard:

There are multiple ways to style or theme a Shiny app. A high-level overview is available in the Mastering Shiny book by Hadley Wickham. Here I’ll show the easiest way to do this. If you need to build an entirely – or mostly – new Shiny theme (e.g. a corporate theme), this post is probably not for you. In that case bslib may be the best starting point. Otherwise, if you are generally happy with a pre-made theme and just want to tweak some of its elements, read on.

This post is based on an actual app, so that you can see how it all works. Here’s the app’s source.

Click through to see what kinds of changes you can make without a major overhaul.
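
One common way to tweak a pre-made look without building a new theme is to add a few CSS overrides to the page head. The sketch below targets standard Bootstrap elements with a hypothetical color; it is an assumption about a typical approach, not necessarily the technique Peter uses.

```r
library(shiny)

# Hypothetical overrides for a couple of standard Bootstrap elements
custom_css <- tags$style(HTML("
  .btn-primary { background-color: #2a7ab0; border-color: #2a7ab0; }
  h2 { font-weight: 300; }
"))

ui <- fluidPage(
  # Overrides go in the page head; CSS specificity and load order decide which rule wins
  tags$head(custom_css),
  titlePanel("Themed app"),
  actionButton("go", "Run", class = "btn-primary")
)

server <- function(input, output, session) {}

if (interactive()) shinyApp(ui, server)
```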

File Renaming in R

Steven Sanderson renames a file:

Managing files is an essential task for any programmer, and when working with R, the file.rename() function can become your best friend. In this blog post, we’ll explore the ins and outs of file.rename(), discuss its syntax, provide real-life examples, and share some best practices to empower you in your file management endeavors. So grab a cup of coffee and let’s dive into the world of file.rename()!

Click through for two examples. But do read the documentation if you’re running on Windows and dealing with symbolic links.
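
A minimal sketch of file.rename() in a temporary directory follows. The safe_rename() wrapper is hypothetical; it exists because, per the R documentation, file.rename() will overwrite an existing target where file permissions allow it.

```r
# Work with temporary files so the example does not touch real data
old_path <- tempfile("report_draft_", fileext = ".txt")
new_path <- tempfile("report_final_", fileext = ".txt")
writeLines("quarterly numbers", old_path)

# Hypothetical wrapper: refuse to overwrite instead of replacing silently
safe_rename <- function(from, to) {
  if (any(!file.exists(from))) stop("source does not exist: ", from)
  if (any(file.exists(to))) stop("target already exists: ", to)
  file.rename(from, to)  # TRUE on success; FALSE (with a warning) on failure
}

renamed <- safe_rename(old_path, new_path)
```

file.rename() is vectorized, so equal-length `from` and `to` vectors rename several files in one call; the any() checks keep the wrapper valid for that case.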

Rolling Correlation in R

Steven Sanderson tries out a function:

In the world of data analysis, time-series data is a common sight. Whether it’s stock prices, weather patterns, or website traffic, understanding the relationship between variables over time is crucial. One valuable technique in this domain is calculating rolling correlation, which allows us to examine the evolving correlation between two variables as our data moves through time. In this blog post, we will delve into the rollapply() function from the zoo package and its capabilities, exploring its applications through a series of practical examples. So, let’s get started!

Click through for an example of how it works.
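
To make the idea concrete without extra packages, here is a base R sketch of a right-aligned rolling correlation. rolling_cor() is a hypothetical helper and the data are simulated; the commented zoo call is roughly equivalent and requires the zoo package.

```r
# Right-aligned rolling Pearson correlation over a window of `width` points;
# positions without a full window are NA
rolling_cor <- function(x, y, width) {
  stopifnot(length(x) == length(y), width >= 2, width <= length(x))
  out <- rep(NA_real_, length(x))
  for (i in width:length(x)) {
    idx <- (i - width + 1):i
    out[i] <- cor(x[idx], y[idx])
  }
  out
}

set.seed(42)
x <- cumsum(rnorm(100))
y <- x + rnorm(100)
rc <- rolling_cor(x, y, width = 20)

# Roughly equivalent zoo call:
# zoo::rollapply(cbind(x, y), width = 20,
#                FUN = function(m) cor(m[, 1], m[, 2]),
#                by.column = FALSE, align = "right", fill = NA)
```

The window width is the main design choice: shorter windows react faster to changes in the relationship but produce noisier estimates.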
