Category: R

Multidimensional Scaling in R

Steven Sanderson is from the 5th dimension:

Visualizing similarities between data points can be tricky, especially when dealing with many features. This is where multidimensional scaling (MDS) comes in handy. It allows us to explore these relationships in a lower-dimensional space, typically 2D or 3D for easier interpretation. In R, the cmdscale() function from base R is a great tool for performing classical MDS.

Click through to see how this works. In case you’re curious, cmdscale() is an example of principal coordinates analysis. If you’re familiar with principal components analysis, that’s a different form of multidimensional scaling.
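
For a rough idea of what that looks like, here is a minimal sketch using only base R. The choice of the built-in eurodist and mtcars datasets is my own assumption and not necessarily what the linked post uses.

```r
# Classical MDS on eurodist, a built-in set of road distances
# between 21 European cities.
coords <- cmdscale(eurodist, k = 2)  # one row per city, two coordinates
head(coords)

# With Euclidean input distances and enough dimensions, classical MDS
# reproduces the original distances. Three scaled mtcars columns are an
# arbitrary choice for illustration.
x   <- scale(mtcars[, c("mpg", "hp", "wt")])
d   <- dist(x)
mds <- cmdscale(d, k = 3)
all.equal(as.vector(dist(mds)), as.vector(d))
```

Plotting coords with plot() and text() gives the usual 2D map of the cities, though the axes may come out flipped or rotated relative to a real map because MDS only preserves distances.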

Tips for Dealing with Large Spatial Datasets

Rhian Davies consults the map:

I love playing with spatial data. Perhaps because I enjoy exploring the outdoors, or because I spend hours playing Geoguessr, or maybe it’s just because maps are pretty, but there’s nothing more fun than tinkering with location data.

However, reading in spatial data, especially large data sets, can sometimes be a pain. Here are some simple things to consider when working with spatial data in R and breaking large data sets into more manageable chunks.

Click through for three tips when dealing with spatial data. The code is in R but the tips make sense in any language.

Normalizing Data in R

Steven Sanderson says, act normal:

Data normalization is a crucial preprocessing step in data analysis and machine learning workflows. It helps in standardizing the scale of numeric features, ensuring fair treatment to all variables regardless of their magnitude. In this tutorial, we’ll explore how to normalize data in R using practical examples and step-by-step explanations.

Read on for a definition of what this means and how you can do it.
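
As a quick illustration, here is a sketch of two common approaches, min-max scaling and z-score standardization. I am assuming these are representative; the post may use different methods, and the function names min_max and z_score are my own.

```r
# Min-max scaling: rescale values to the [0, 1] range.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Z-score standardization: center on the mean, divide by the standard deviation.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 6, 8, 10)
min_max(x)  # 0.00 0.25 0.50 0.75 1.00
z_score(x)  # result has mean 0 and standard deviation 1
```

Base R's scale() covers the z-score case for whole matrices or data frames, so the hand-rolled version is mostly useful for seeing what happens under the hood.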

Quantile Normalization in R

Steven Sanderson has achieved normality:

Before we dive into the code, let’s understand the concept behind quantile normalization. At its core, quantile normalization aims to equalize the distributions of multiple datasets by aligning their quantiles. This ensures that each dataset has the same distribution of values, making meaningful comparisons possible.

This is a bit different from normalizing individual data points in one dataset, as you can see in the post.
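
To make the idea concrete, here is a minimal base R sketch, assuming the datasets are columns of a numeric matrix with no missing values. The function name quantile_normalize and the toy matrix are hypothetical, and ties are broken by order of appearance, which is a simplification compared to dedicated packages.

```r
quantile_normalize <- function(m) {
  # Rank of each value within its own column (ties broken by position).
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Sort each column, then average across columns at each rank.
  # This is the shared reference distribution.
  reference <- rowMeans(apply(m, 2, sort))
  # Replace every value with the reference value for its rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

# Made-up data: three "datasets" with different scales.
m <- cbind(a = c(5, 2, 3, 4), b = c(4, 1, 4, 2), c = c(3, 4, 6, 8))
qn <- quantile_normalize(m)

# After normalization every column holds the same set of values.
apply(qn, 2, sort)
```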

Using the map() Function in purrr

Steven Sanderson reads the map():

In the world of data manipulation and analysis with R, efficiency and simplicity are paramount. One function that epitomizes these qualities is map(). Whether you’re a novice or a seasoned R programmer, mastering map() can significantly streamline your workflow and enhance your code readability. In this guide, we’ll delve into the syntax, usage, and numerous examples to help you harness the full power of map().

Click through for examples of how this works in R.
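
If you have not used it before, a small sketch gives the flavor. This assumes the purrr package is installed; the inputs are my own and not taken from the post.

```r
library(purrr)

# map() always returns a list, one element per input.
squares <- map(1:3, function(x) x^2)

# Typed variants such as map_dbl() return an atomic vector instead.
squares_dbl <- map_dbl(1:3, ~ .x^2)

# Data frames are lists of columns, so map functions iterate over columns.
means <- map_dbl(mtcars[c("mpg", "hp")], mean)
```

The typed variants are usually the safer pick when you know what shape the result should take, because they raise an error if a result has the wrong type.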

Using dtplyr to Process Large Datasets

Dario Radecic takes us through an interesting library:

In a world where compute time is billed by the second, make every one of them count. There are zero valid reasons to utilize a quarter of your CPU and memory, but achieving complete resource utilization isn’t always a straightforward task. That is, if you don’t know about R dtplyr.

One option is to use dplyr. It’s simple to use and has intuitive syntax. But it’s slow. The other option is to use data.table. It’s lightning-fast but has a steep learning curve and syntax that’s not too friendly to follow. The third, and your best option, is to combine the simplicity of dplyr with the efficiency of data.table. And that’s where R dtplyr chimes in!

Today you’ll learn just how easy it is to switch from dplyr to dtplyr, and you’ll see hands-on the performance differences between the two. Let’s dig in!

I love the performance of the data.table library but strongly prefer the Tidyverse for the sake of convenience. I like that this bridges the gap, at least for dplyr style processing. H/T R-Bloggers.
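
The switch is roughly this small. The sketch below assumes dtplyr and dplyr are installed and uses mtcars as a stand-in for a genuinely large dataset, so it shows the pattern and not the performance gain.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Wrap the data in a lazy data.table; no computation happens yet.
cars_dt <- lazy_dt(mtcars)

# Write ordinary dplyr verbs; dtplyr translates them to data.table code.
query <- cars_dt %>%
  filter(wt < 5) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))

show_query(query)            # inspect the generated data.table expression
result <- as_tibble(query)   # collecting the result triggers execution
```

The lazy step matters: nothing runs until you collect with as_tibble(), as.data.table(), or as.data.frame(), which lets dtplyr build a single data.table call for the whole pipeline.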

Using the tapply() Function in R

Steven Sanderson applies things a different way:

Hey R enthusiasts! Today we’re diving into the world of data manipulation with a fantastic function called tapply(). This little gem lets you apply a function of your choice to different subgroups within your data.

Imagine you have a dataset on trees, with a column for tree height and another for species. You might want to know the average height for each species. tapply() comes to the rescue!

Read on to see how it works.
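
The tree scenario translates almost directly into code. The data frame below is made up for illustration and is not the dataset from the post.

```r
# Hypothetical data: one row per tree.
trees_df <- data.frame(
  species = c("oak", "oak", "pine", "pine", "birch"),
  height  = c(20, 24, 30, 34, 15)
)

# tapply(values, groups, function): mean height within each species.
avg_height <- tapply(trees_df$height, trees_df$species, mean)
avg_height
# birch   oak  pine
#    15    22    32
```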

ggbrick on CRAN

Dan Oehm notes another brick in the wall:

If you’re looking for something a little different, ggbrick creates a ‘waffle’ style chart with the aesthetic of a brick wall. The usage is similar to geom_col where you supply counts as the height of the bar and a fill for a stacked bar. Each whole brick represents 1 unit. Two half bricks equal one whole brick.

It has been available on Git for a while, but recently I’ve made some changes and it now has CRAN’s tick of approval.

Click through to see how you can use it. This style of waffle chart, in the right scenario, can be quite useful, providing a high-level view and also giving you some idea of fine-grained magnitudes. H/T R-Bloggers.