
Category: R

Using dtplyr to Process Large Datasets

Dario Radecic takes us through an interesting library:

In a world where compute time is billed by the second, make every one of them count. There are zero valid reasons to utilize a quarter of your CPU and memory, but achieving complete resource utilization isn’t always a straightforward task. That is, if you don’t know about R dtplyr.

One option is to use dplyr. It’s simple to use and has intuitive syntax. But it’s slow. The other option is to use data.table. It’s lightning-fast but has a steep learning curve and syntax that’s not too friendly to follow. The third – and your best option – is to combine the simplicity of dplyr with the efficiency of data.table. And that’s where R dtplyr chimes in!

Today you’ll learn just how easy it is to switch from dplyr to dtplyr, and you’ll see hands-on the performance differences between the two. Let’s dig in!

I love the performance of the data.table library but strongly prefer the Tidyverse for the sake of convenience. I like that this bridges the gap, at least for dplyr style processing. H/T R-Bloggers.
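
Here is a minimal sketch of what the switch looks like, assuming the dplyr, dtplyr, and data.table packages are installed; the small data frame is made up purely for illustration. The only real changes from a plain dplyr pipeline are wrapping the data with lazy_dt() up front and collecting the result with as_tibble() at the end, which is the point where dtplyr actually runs the data.table code it generated.

```r
# Assumes dplyr, dtplyr, and data.table are installed
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical stand-in for a large dataset
df <- data.frame(
  group = rep(c("a", "b", "c"), times = 4),
  value = 1:12
)

# Wrap the data frame in a lazy data.table; verbs are recorded, not run yet
lazy <- lazy_dt(df)

# The same dplyr verbs you would normally write
query <- lazy %>%
  filter(value > 2) %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

# Print the data.table code that dtplyr translated the pipeline into
show_query(query)

# Execute the translated code and bring the result back as a tibble
result <- as_tibble(query)
print(result)
```

Because the pipeline is lazy, nothing is computed until the final as_tibble() call, and show_query() doubles as a gentle way to pick up data.table syntax along the way.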

Using the tapply() Function in R

Steven Sanderson applies things a different way:

Hey R enthusiasts! Today we’re diving into the world of data manipulation with a fantastic function called tapply(). This little gem lets you apply a function of your choice to different subgroups within your data.

Imagine you have a dataset on trees, with a column for tree height and another for species. You might want to know the average height for each species. tapply() comes to the rescue!

Read on to see how it works.
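
For a quick taste of the tree example described above, here is a minimal base R sketch; the data frame and its column names are hypothetical. tapply(X, INDEX, FUN) splits X by the grouping variable in INDEX and applies FUN to each piece.

```r
# Hypothetical tree data: one height per tree, plus its species
trees_df <- data.frame(
  species = c("oak", "pine", "oak", "birch", "pine", "birch"),
  height  = c(20, 30, 24, 15, 34, 17)
)

# Split height by species and take the mean of each group
avg_height <- tapply(trees_df$height, trees_df$species, mean)
print(avg_height)

# Any function works; here, the tallest tree per species
max_height <- tapply(trees_df$height, trees_df$species, max)
print(max_height)
```

The result is a named array with one entry per species, ordered by the factor levels of the grouping variable (alphabetical for character input).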

Mock Tests for R Packages

Maelle Salmon does a bit of mocking:

This blog featured a post on mocking, the art of replacing a function with whatever fake we need for testing, years ago. Since then, we’ve entered a new decade, the second edition of Hadley Wickham’s and Jenny Bryan’s R packages book was published, and mocking returned to testthat, so it’s time for a new take/resources roundup!

Click through to see how you can create mocks in R as well as some practical examples of mocks in action.

ggbrick on CRAN

Dan Oehm notes another brick in the wall:

If you’re looking for something a little different, ggbrick creates a ‘waffle’ style chart with the aesthetic of a brick wall. The usage is similar to geom_col where you supply counts as the height of the bar and a fill for a stacked bar. Each whole brick represents 1 unit. Two half bricks equal one whole brick.

It has been available on Git for a while, but recently I’ve made some changes and it now has CRAN’s tick of approval.

Click through to see how you can use it. This style of waffle chart, in the right scenario, can be quite useful, providing a high-level view and also giving you some idea of fine-grained magnitudes. H/T R-Bloggers.

Using the cut() Function in R

Steven Sanderson is about to cut somebody:

In the realm of data analysis, understanding how to effectively segment your data is paramount. Whether you’re dealing with age groups, income brackets, or any other continuous variable, the ability to categorize your data can provide invaluable insights. In R, the cut() function is a powerful tool for precisely this purpose. In this guide, we’ll explore how to harness the full potential of cut() to slice and dice your data with ease.

Read on for examples of how to use the cut() function.
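
As a minimal sketch of the age-group case mentioned above (the ages, break points, and labels are made up for illustration), cut() turns a numeric vector into a factor by assigning each value to an interval. By default the intervals are closed on the right, so 17 falls into (0,17].

```r
# Hypothetical ages to bucket into groups
ages <- c(5, 17, 18, 34, 65, 80)

# Intervals: (0,17], (17,64], (64,Inf]
age_group <- cut(
  ages,
  breaks = c(0, 17, 64, Inf),
  labels = c("child", "adult", "senior")
)
print(age_group)

# Count how many values landed in each bucket
print(table(age_group))
```

Passing right = FALSE switches to left-closed intervals such as [0,17), which changes which bucket a value sitting exactly on a break point lands in.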

Duplicating Rows in R

Steven Sanderson repeats the punch line a few times:

Are you working with a dataset where you need to duplicate certain rows multiple times? Perhaps you want to create synthetic data by replicating existing observations, or you need to handle imbalanced data by oversampling minority classes. Whatever the reason, replicating rows in a data frame is a handy skill to have in your R programming toolkit.

In this post, we’ll explore how to replicate rows in a data frame using base R functions. We’ll cover replicating each row the same number of times, as well as replicating rows a different number of times based on a specified pattern.

Click through to replicate data without copy-paste.
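
One common base R approach, sketched here with a made-up data frame (the linked post may go about it differently), is to index the rows with rep(): the each argument repeats every row the same number of times, while a vector passed to times gives each row its own count.

```r
# Hypothetical data frame
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Repeat every row twice, keeping copies of the same row together
same_times <- df[rep(seq_len(nrow(df)), each = 2), ]

# Repeat row 1 once, row 2 twice, and row 3 three times
varying <- df[rep(seq_len(nrow(df)), times = c(1, 2, 3)), ]

# Duplicated rows get names like "2.1"; reset them to 1..n
rownames(same_times) <- NULL
rownames(varying) <- NULL

print(same_times)
print(varying)
```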

Plotting Training and Testing Results with tidyAML

Steven Sanderson builds a plot:

In the realm of machine learning, visualizing model predictions is essential for understanding the performance and behavior of our algorithms. When it comes to regression tasks, plotting predictions alongside actual values provides valuable insights into how well our model is capturing the underlying patterns in the data. With the plot_regression_predictions() function in tidyAML, this process becomes seamless and informative.

Read on to see how the function works and the kind of result you can expect from it.

tidyAML 0.0.5 Now Available

Steven Sanderson has an announcement:

I’m thrilled to announce the latest release of tidyAML, version 0.0.5, now available for download on CRAN or GitHub!

In this release, we’ve introduced some fantastic new features and made minor fixes and improvements to enhance your experience with tidyAML.

Click through to see what’s new in this version.

Pulling Samples in R with sample()

Steven Sanderson takes a sample:

The sample() function in R is a powerful tool that allows you to generate random samples from a given dataset or vector. It’s an essential function for tasks such as data analysis, Monte Carlo simulations, and randomized experiments. In this blog post, we’ll explore the sample() function in detail and provide examples to help you understand how to use it effectively.

Read on to see what options are available with sample() and the different ways in which you can use the function.
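
A minimal sketch of the most common sample() patterns follows; the vectors and data frame are made up for illustration. Sampling is without replacement by default, replace = TRUE allows repeats, and prob supplies weights.

```r
set.seed(42)  # make the random draws reproducible

x <- 1:10

# Three distinct values (no replacement by default)
without_repl <- sample(x, size = 3)

# Five values with replacement, so repeats are possible
with_repl <- sample(x, size = 5, replace = TRUE)

# Weighted draws: "heads" is three times as likely as "tails"
coin <- sample(c("heads", "tails"), size = 10, replace = TRUE,
               prob = c(0.75, 0.25))

# Sampling the whole vector shuffles it
shuffled <- sample(x)

# Randomly pick two rows of a data frame
df <- data.frame(id = 1:6, score = c(10, 20, 30, 40, 50, 60))
sampled_rows <- df[sample(nrow(df), size = 2), ]
print(sampled_rows)
```

One gotcha worth knowing: when x is a single number n of at least 1, sample(x) draws from 1:n rather than returning x itself, so sample.int() or an explicit index vector is safer when the input length can vary.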