Press "Enter" to skip to content

Category: R

Building a Custom Color Palette for ggplot2

Tomaz Kastrun pulls out the color swatches:

A simple, yet effective way to set your colour palette in R using ggplot library.

Click through for the demonstration. Tomaz keeps the text very light in this post, so I’ll do a little vamping of my own. Creating a custom palette is neat, but do make sure that your custom palette works for users with color vision deficiency (CVD). Taking Tomaz’s bar chart into Coblis (an amazing tool I continue to use quite regularly), here’s what it looks like for people with protanopia—that is, no red cones in their eyes:

It’s not awful, particularly because Tomaz changed the fill but not the border color, so you get a funky striation effect.

But the real kicker is if you switch to the monochromatic option in Coblis.

Granted, I know of exactly one person with monochromacy, so if you want to be fair, this isn’t one I’d check for on a webpage. But the large majority of technical books have grayscale images because it saves money on printing, so if this were your sweet-looking color scheme and you’re adding the image into a book, readers would need to focus particularly hard on the bars to figure anything out.

Comments closed

healthR.data Package Updates

Steven Sanderson has an update for us:

I’m excited to share the latest updates to the healthyR.data R package! This release brings new functionality and minor improvements, all aimed at making your data management tasks easier and more efficient. Here’s a breakdown of what’s new:

Read on for information on four new functions and a couple of bugfixes.

Comments closed

Splitting a Number into Component Digits in R

Steven Sanderson does a bit of splitting:

Splitting numbers into individual digits can be a handy trick in data analysis and manipulation. Today, we’ll explore how to achieve this using base R functions, specifically gsub() and strsplit(). Let’s walk through the process step by step, explain the syntax of each function, and provide some examples for clarity.

Click through for a pair of examples.

Comments closed

Removing Elements from a Vector in R

Steven Sanderson wants to leave one of these things out:

Working with vectors is one of the fundamental aspects of R programming. Sometimes, you need to remove specific elements from a vector to clean your data or prepare it for analysis. This post will guide you through several methods to achieve this, using base R, dplyr, and data.table. We’ll look at examples for both numeric and character vectors and explain the code in a straightforward manner. By the end, you’ll have a clear understanding of how to manipulate your vectors efficiently. Let’s dive in!

Read on for three pairs of examples, one for numeric vectors and one for character vectors.

Comments closed

Accuracy is Not Enough for Classification

I have a new video:

In this video, I explain why accuracy is not the be-all, end-all measure for classification. After that, I introduce the confusion matrix, a mechanism for tracking predicted versus actual values. Then, I talk about a variety of measures and how we can derive them from the confusion matrix.

The trickiest part of the confusion matrix measures is just remembering which measures comport to which combinations in the matrix. The second-trickiest part of the confusion matrix is that R and Python invert them, so reading across the top row in R is equivalent to reading down the first column in Python.

Comments closed

R’s Global Regular Expression Function

Steven Sanderson has me wondering who Greg is and why he gets an expression of his own:

If you’ve ever worked with text data in R, you know how important it is to have powerful tools for pattern matching. One such tool is the gregexpr() function. This function is incredibly useful when you need to find all occurrences of a pattern within a string. Today, we’ll go into how gregexpr() works, explore its syntax, and go through several examples to make things clear.

Read on to learn more about the global regular expression function and how it works.

Comments closed

Counting Words in a String in R

Steven Sanderson counts the ways:

Counting words in a string is a common task in data manipulation and text analysis. Whether you’re parsing tweets, analyzing survey responses, or processing any textual data, knowing how to count words is crucial. In this post, we’ll explore three ways to achieve this in R: using base R’s strsplit(), the stringr package, and the stringi package. We’ll provide clear examples and explanations to help you get started.

I, of course, would commission a 128-node Hadoop cluster and write a few dozen pages of Java code to get the answer.

Comments closed

Making Code Developer Friendly with an Example in R

Mark Niemann-Ross says the rest is commentary:

If you are reading this, you’re a coder and use functions. We write them for ourselves. If someone else writes a function, you can hope it works. If it doesn’t, you can hope to fix it. Hopefully, the return value is obviously correct. But maybe it’s subtly wrong?

If things are amiss, read the name of the function and hope it’s descriptive. I worked with a programmer who omitted all vowels from his function names. So the above code would expand to this…

Read on for the rationale behind commenting your functions appropriately, as well as one way to do it in R. There is a bit of art and a bit of science to writing good comments, but the starting point is simply having them to begin with. And the more clever you feel like you’re being, the more you need to comment this, because three months from now, you probably won’t be feeling quite as clever. H/T R-Bloggers.

Comments closed