Press "Enter" to skip to content

Category: R

Counting Groups in R

Steven Sanderson counts items in a group:

As data-driven decision-making becomes more critical in various fields, the ability to extract valuable insights from datasets has never been more important. One common task is to calculate counts by group, which can shed light on trends and patterns within your data. In this guide, we’ll explore three different approaches to achieve this using the powerful R programming language. So, let’s dive into the world of grouped counting with the help of the classic mtcars dataset!

Read on for the base R solution, the dplyr solution (which looks a lot like how we’d solve it in SQL), and the data.table solution.

Comments closed

Managing Plot Parameters in R

Steven Sanderson switches up a visual:

When it comes to data visualization in R, the par() function is an indispensable tool that often goes overlooked. This function allows you to control various graphical parameters, unleashing a world of customization possibilities for your plots. In this blog post, we’ll demystify the par() function, break down its syntax, and provide you with hands-on examples to help you create stunning visualizations.

Click through to check it out. My loyalties definitely lie with ggplot2 for static visual development in R but it’s definitely not the only way to get images to look the way you want them.

Comments closed

Adding Text to a Plot in R

Steven Sanderson texts up a plot:

As a programmer, you’re well aware of the importance of data visualization. A well-crafted plot can convey complex information with clarity and impact. In R, creating stunning plots is a breeze, especially when you’re armed with the versatile text() function. This little gem allows you to add custom text to your plots, enabling you to annotate and highlight essential details. Let’s dive into the world of text() and uncover its syntax and potential through some hands-on examples.

I’m also a big fan of geom_text_repel() in ggplot2’s ggrepel library. It is by no means perfect but it does do a good job of not overlapping important visual features like plotted lines.

Comments closed

Learning about Data in R with str()

Steven Sanderson explains the value of the str() function:

In a nutshell, str() stands for “structure” and offers a concise summary of the structure of an R object. It presents essential details about the object, including its data type, dimensions, and the first few values. By providing an overview of your data, str() allows you to grasp the fundamentals at a glance and proceed with a clearer understanding of what you’re working with.

str() is a really useful function and people who develop objects in R thoughtfully can pack a lot of useful data into the one call.

Comments closed

Simplifying Nested Lists and Vectors in R

Steven Sanderson simplifies things:

Today, we’re diving deep into the incredible world of R programming to explore the often-overlooked but extremely handy unlist() function. If you’ve ever found yourself dealing with complex nested lists or vectors, this little gem can be a lifesaver. The unlist() function is like a magician that simplifies your data structures, making them more manageable and easier to work with. Let’s unlock its magic together!

Click through to see how it works, including explanation and examples.

Comments closed

ML Model Interactions and hstats

Michael Mayer has a new R package for us:

This post is mainly about the third approach. Its beauty is that we get information about all interactions. The downside: it is as good/bad as partial dependence functions. And: the statistics are computationally very expensive to compute (of order n^2).

Different R packages offer some of these H-statistics, including {iml}, {gbm}, {flashlight}, and {vivid}. They all have their limitations. This is why I wrote the new R package {hstats}:

Click through for an overview of the package and an example of how it works.

Comments closed

Set Intersection of Vectors in R

Steven Sanderson performs set intersection of vectors:

Welcome to another exciting blog post where we delve into the world of R programming. Today, we’ll be discussing the intersect() function, a handy tool that helps us find the common elements shared between two or more vectors in R. Whether you’re a seasoned R programmer or just starting your journey, this function is sure to become a valuable addition to your toolkit.

Click through to see how the function works.

Comments closed

Cumulative Means in R

Steven Sanderson performs a moving average:

The cumulative mean, also known as the running mean or moving average, provides us with a dynamic view of how the average value of a dataset changes as new observations are added incrementally. It is an invaluable tool in time-series analysis, trend identification, and smoothing noisy data.

Imagine you have a series of numeric values, and you want to find the average of the first observation, then the average of the first two observations, followed by the average of the first three, and so on. This iterative process generates the cumulative mean, painting a picture of how the data behaves over time.

Often times, we care about the moving average over a specific window, such as the last n periods. This particular post covers the moving average over the entire set of data.

Comments closed