Category: R

Plotting a Subset of Data in R

Steven Sanderson doesn’t need all of those data points:

Data visualization is a powerful tool for gaining insights from your data. In R, you have a plethora of libraries and functions at your disposal to create stunning and informative plots. One common task is to plot a subset of your data, which allows you to focus on specific aspects or trends within your dataset. In this blog post, we’ll explore various techniques to plot subsets of data in R, and I’ll explain each step in simple terms. Don’t worry if you’re new to R – by the end of this post, you’ll be equipped to create customized plots with ease!

Click through for several techniques for subsetting data, as well as reasons why you might want to do it.
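
As a rough sketch of the idea (my own example on the built-in mtcars dataset, not code from Steven's post), here is one way to plot only the four-cylinder cars in base R. The variable names are just illustrative.

```r
# Keep only the four-cylinder cars from the built-in mtcars dataset
four_cyl <- subset(mtcars, cyl == 4)

# The same subset via logical indexing
four_cyl_idx <- mtcars[mtcars$cyl == 4, ]

# Plot weight against fuel economy for the subset only
plot(four_cyl$wt, four_cyl$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mtcars: four-cylinder cars only")
```

Both forms return the same rows; subset() tends to read better interactively, while logical indexing is the safer habit inside functions.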

Finding Omitted Variables in Logistic Regression

John Mount picks up on a prior post:

For this note, let’s work out how to directly try and overcome the omitted variable bias by solving for the hidden or unobserved detailed data. We will work our example in R. We will derive some deep results out of a simple set-up. We show how to “un-marginalize” or “un-summarize” data.

This is an interesting dive into a common problem, and something which we can easily work around in linear regression, but not in logistic regression.
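
To make the linear-versus-logistic contrast concrete, here is a small simulation of my own. It is not John's un-marginalizing technique, and the coefficients (1 for x, 3 for the omitted z) and sample size are assumptions chosen for illustration. The omitted variable is independent of x, which is the friendly case for linear regression.

```r
set.seed(2023)
n <- 20000
x <- rnorm(n)
z <- rnorm(n)  # the variable we will "forget"; independent of x by construction

# Linear outcome: dropping z leaves the x coefficient close to its true value of 1
y_lin <- 1 * x + 3 * z + rnorm(n)
lin_full  <- unname(coef(lm(y_lin ~ x + z))["x"])
lin_short <- unname(coef(lm(y_lin ~ x))["x"])

# Logistic outcome with the same linear predictor:
# dropping z pulls the x coefficient toward zero
y_bin <- rbinom(n, size = 1, prob = plogis(1 * x + 3 * z))
log_full  <- unname(coef(glm(y_bin ~ x + z, family = binomial))["x"])
log_short <- unname(coef(glm(y_bin ~ x, family = binomial))["x"])

round(c(lin_full = lin_full, lin_short = lin_short,
        log_full = log_full, log_short = log_short), 2)
```

The two linear estimates should land near 1, while the short logistic model reports a noticeably smaller coefficient than the full one even though z is unrelated to x.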

Building a Weierstrass Function in R

Tomaz Kastrun won’t let you take a derivative:

Coming from the simple sine function (remember Fourier series), German mathematician Karl Weierstrass became the first to publish an example of a continuous, nowhere differentiable function. The Weierstrass function (originally defined as a Fourier series) was the first instance to challenge the idea that a continuous function must be differentiable. This is an example of a fractal in a function (known as a fractal function) and also another example of a pathological function (one that runs counter to intuition).

Click through for an example of this in R.
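
For a quick feel of what this looks like, here is a minimal sketch of a truncated Weierstrass sum, W(x) = sum over n of a^n * cos(b^n * pi * x). This is my own version rather than Tomaz's code; the parameter values a = 0.5 and b = 13 and the 10-term cutoff are assumptions, picked so that 0 < a < 1, b is an odd integer, and a * b > 1 + 3 * pi / 2.

```r
# Partial sum of the Weierstrass function for terms n = 0..n_terms
weierstrass <- function(x, a = 0.5, b = 13, n_terms = 10) {
  n <- 0:n_terms
  vapply(x, function(xi) sum(a^n * cos(b^n * pi * xi)), numeric(1))
}

x <- seq(-1, 1, length.out = 5000)
w <- weierstrass(x)

# The curve stays jagged no matter how far you zoom in
plot(x, w, type = "l", xlab = "x", ylab = "W(x)",
     main = "Truncated Weierstrass function")
```

A finite sum like this is still differentiable, of course; the nowhere-differentiable behavior belongs to the infinite series, and the plot only hints at it.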

Appropriate Uses of Jitter in Graphs

Steven Sanderson shakes things up:

As an R programmer, one of the most useful functions to know is the jitter function. The jitter function is used to add random noise to a numeric vector, which can be helpful when visualizing data in a scatterplot. By using the jitter function, we can get a better picture of the true underlying relationship between two variables in a dataset.

Read on to get an idea of how to use jitter, though I recommend making it very clear to chart viewers that you are, in fact, using jitter, as it can be easy to misinterpret the jitter as actual value locations.
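
Here is a small sketch of that advice using base R's jitter() on made-up survey-style data (the 1-to-5 scores are an assumption for illustration). The plot title says outright that jitter was applied.

```r
set.seed(42)

# Two discrete 1-5 scores: many points land on exactly the same coordinates
x <- sample(1:5, 200, replace = TRUE)
y <- sample(1:5, 200, replace = TRUE)

# Add uniform noise of at most 0.2 in each direction
xj <- jitter(x, amount = 0.2)
yj <- jitter(y, amount = 0.2)

# Say so on the chart, so nobody reads the noise as real values
plot(xj, yj, xlab = "Score A", ylab = "Score B",
     main = "Scores A vs. B (points jittered by up to 0.2)")
```

Keeping the amount well below half the gap between adjacent values means each cloud of points still clearly belongs to one grid location.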

Kernel Density Plots in R

Steven Sanderson explains one common type of plot in R:

Kernel Density Plots are a type of plot that displays the distribution of values in a dataset using one continuous curve. They are similar to histograms, but they are even better at displaying the shape of a distribution since they aren’t affected by the number of bins used in the histogram. In this blog post, we will discuss what Kernel Density Plots are in simple terms, what they are useful for, and show several examples using both base R and ggplot2.

Read on to learn more, including how to generate these in base R, ggplot2, and with the tidy_density package.
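
The base R route can be sketched in a few lines. This is my own example on mtcars$mpg rather than one from the post, and it sticks to density() with its default Gaussian kernel and bandwidth.

```r
# Estimate the density of miles-per-gallon with the default settings
d <- density(mtcars$mpg)

# Histogram on the density scale, with the smooth curve drawn on top
hist(mtcars$mpg, freq = FALSE, xlab = "Miles per gallon",
     main = "mtcars mpg: histogram with kernel density overlay")
lines(d, lwd = 2)
```

The curve is not bin-free magic, though: the bandwidth plays the role that bin width plays in a histogram, so it is worth trying a couple of values via the bw or adjust arguments.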

Random Number Generation in R

Adrian Tam rolls the dice:

Whether working on a machine learning project, a simulation, or other models, you need to generate random numbers in your code. R, as a programming language, has several functions for random number generation. In this post, you will learn about them and see how they can be used in a larger program. Specifically, you will learn:

  • How to generate Gaussian random numbers into a vector
  • How to generate uniform random numbers
  • How to manipulate random vectors and random matrices

And, of course, these are pseudo-random numbers because we’re still dealing with computers and random seeds, after all.
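
A minimal sketch of those three bullet points plus the seed behavior, using only base R functions (the sizes and ranges are arbitrary choices of mine, not Adrian's):

```r
set.seed(123)  # fix the seed so the "random" numbers are reproducible

gauss <- rnorm(5, mean = 0, sd = 1)   # Gaussian random numbers in a vector
unif  <- runif(5, min = 0, max = 10)  # uniform random numbers on [0, 10]

# A 2 x 3 random matrix, then some simple manipulations
m <- matrix(rnorm(6), nrow = 2, ncol = 3)
col_means <- colMeans(m)
shuffled  <- sample(gauss)            # random permutation of a vector

# Resetting the seed replays the same sequence
set.seed(123)
gauss_again <- rnorm(5)
identical(gauss, gauss_again)
```

That last line is the pseudo-random part in action: same seed, same numbers.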

Interesting R Functions for Package Dependencies and File Analysis

Maelle Salmon shows off a few interesting functions:

How does this package depend on this other package? pak::pkg_deps_explain()

The pak package by Gábor Csárdi makes installing packages easier. If I need to start working on a package, I clone it, then run pak::pak() to install and update its dependencies. It’s a “convenience function” that is convenient for sure! Bye bye remotes::install_deps().

Read on for an example of this, as well as details on two other functions in different packages. H/T R-Bloggers.

Building Correlation Heatmaps in R

Steven Sanderson shows two packages for building heatmaps in R:

Data visualization is a powerful tool for understanding the relationships between variables in a dataset. One of the most common and insightful ways to visualize correlations is through heatmaps. In this blog post, we’ll dive into the world of correlation heatmaps using R, using the mtcars and iris datasets as examples. By the end of this post, you’ll be equipped to create informative correlation heatmaps on your own.

Read on to see how to build heatmaps with the corrplot and ggcorrplot packages.
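
If you want to see the underlying idea before installing anything, here is a sketch that swaps in base R's cor() and heatmap() instead of corrplot or ggcorrplot. The blue-white-red palette is my own choice.

```r
# Pairwise Pearson correlations between all numeric columns of mtcars
cor_mat <- cor(mtcars)

# Diverging palette: blue for negative, white near zero, red for positive
pal <- colorRampPalette(c("blue", "white", "red"))(20)

# symm = TRUE treats the matrix as symmetric; scale = "none" keeps the raw correlations
heatmap(cor_mat, symm = TRUE, scale = "none", col = pal,
        main = "mtcars correlations")
```

The dedicated packages add the niceties, such as printed coefficients and significance masking, but the matrix from cor() is what they all start from.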

Returning Matrix Elements in Spiral Order in R

Tomaz Kastrun forgot to remove The Club from his REPL:

Another one from the Leetcode challenge. This time, get the elements (single values) from the matrix in a spiral order with a starting position of [1,1].

So, the basic idea is to retrieve a vector of elements from a matrix in the following order:

Probably not something you’d use with any frequency, but it’s a fun way to learn how to operate within matrices.
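
For the curious, here is one possible take, assuming a clockwise spiral starting at [1,1]. It is a peel-and-rotate approach of my own rather than Tomaz's solution, and spiral_order is a name I made up.

```r
# Return the elements of a matrix in clockwise spiral order, starting at [1, 1]
spiral_order <- function(m) {
  out <- c()
  while (length(m) > 0) {
    out <- c(out, m[1, ])        # peel off the current top row
    m <- m[-1, , drop = FALSE]   # drop it from the matrix
    # Rotate what is left counter-clockwise so the next edge becomes the top row
    m <- t(m)[rev(seq_len(ncol(m))), , drop = FALSE]
  }
  out
}

m <- matrix(1:9, nrow = 3, byrow = TRUE)
spiral_order(m)
# [1] 1 2 3 6 9 8 7 4 5
```

The drop = FALSE arguments matter here: without them R would silently turn a one-row or one-column remainder into a plain vector and the rotation would break.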
