Press "Enter" to skip to content

Category: R

Diffify Updates

Myles Mitchell celebrates a year of diffify:

We’ve just passed an important milestone for diffify: our app for tracking Python and R package releases has just turned 1 year old! To mark this exciting occasion we are delighted to announce an “anniversary update” featuring numerous quality of life improvements. This post will outline the latest changes and tease at some exciting developments in the works…

Check out these recent changes and a little bit of what’s on the horizon.

Comments closed

Extending a tinyAML and shiny App

Steven Sanderson wraps up a series on shiny and tinyAML. Part 3 extends options for regression:

As data science continues to be a sought-after field, creating a reliable and accurate model is essential. While there are various machine learning algorithms available, the process of selecting the correct algorithm can be complex. The {tidyAML} package, part of the tidymodels suite, offers an easy-to-use, consistent interface for building machine learning models. In this post, we will explore a Shiny application that utilizes tidyAML to build a machine learning model.

Today I have updated the tidyAML shiny app to include the ability to set the parameter of the fast_regression() function .parsnip_fns and this is things like linear_reg.

And part 4 includes classification:

This is a Shiny app for building models using the {tidyAML} which is based on the tidymodels package in R. The app allows you to upload your own data or choose from one of two built-in datasets (mtcars or iris) and select the type of model you want to build (regression or classification).

Let’s take a closer look at the code.

This was an interesting series, for sure.

Comments closed

Building a Model with shiny and tinyAML

Steven Sanderson has a series on using the tidyAML Model Builder. Part 1 builds a simple model:

The first reactive expression, data, reads in the data file uploaded by the user or selects a built-in dataset, depending on which option the user chooses. If the user uploads a file, the read.csv() function is used to read the data file into a data frame. If the user selects a built-in dataset, the get() function is used to retrieve the data frame associated with that dataset. In both cases, the column names of the data frame are used to update the choices in the predictor_col select input, so that the user can select which column to use as the predictor variable.

Part 2 builds on it by adding new regression algorithms:

Yesterday I spoke about building tidymodels models using my package {tidyAML} and {shiny}. I have made an update to it, and will continue to make updates to it this week.

I have added all of the supported engines for regression problems only, NOT classification yet, that will be tomorrow’s work. I will then add a drop down for users to pick which backend function they want to use from {parsnp} like linear_reg().

Comments closed

Knitting R-Markdown Files into Google Docs

Benjamin Smith makes a Google Doc:

RMarkdown is a powerful framework for writing a documents that contain a mixture of text, code, and the output of the code. Popular output formats for RMarkdown Documents (.Rmd) include HTML, PDF and Word Documents. It is also possible to output RMarkdown documents as part of a static website using blogdown package and is (still!) possible to publish RMarkdown documents to WordPress sites as well (like this one)!

Recently, I started to look into the possibility of outputting an .Rmd file as a Google Doc, but I was unable to locate any out-of-box solutions. After looking into the issue I developed a small function that makes it possible!

Click through for that function.

Comments closed

Creating a Clickable Word Cloud with Shiny

Mandy Norrbo builds a word cloud:

Word clouds are a visual representation of text data where words are arranged in a cluster, with the size of each word reflecting its frequency or importance in the data set. Word clouds are a great way of displaying the most prominent topics or keywords in free text data obtained from websites, social media feeds, reviews, articles and more. If you want to learn more about working with unstructured text data, we recommend attending our Text Mining in R course

Usually, a word cloud will be used solely as an output. But what if you wanted to use a word cloud as an input? For example, let’s say we visualised the most common words in reviews for a hotel. Imagine we could then click on a specific word in the word cloud, and it would then show us only the reviews which mention that specific word. Useful, right?

Read on to see how you can create one of these.

Comments closed

R in 10 Minutes

Holger von Jouanne-Diedrich gives us a quick primer on R:

R is a powerful programming language and environment for statistical computing and graphics. In this post, we will provide a quick introduction to R using the famous iris dataset.

We will cover loading data, exploring the dataset, basic data manipulation, and plotting. By the end, you should have a good understanding of how to get started with R, so read on!

Click through for the intro.

Comments closed

Styling Excel Tables in R

Steven Sanderson wants to spice things up:

The styledtable package in R, which allows users to create styled tables in R Markdown documents. The package can help to create tables with various formatting options such as bold text, colored cells, and borders. It also has functionality on how to port these to Excel itself.

The package offers a simple syntax that allows users to specify formatting options using HTML and CSS. The resulting table can be customized by changing the CSS file or by using the ‘styler’ function to apply custom styles to individual cells or rows.

Read on for more information on what the package does and a few examples of how it works.

Comments closed

Comparing Data Visualization in Excel and R

Amieroh Abrahams builds some graphs:

In Excel it is challenging to eye-ball which changes have been made to a graph, especially if these were minor changes. With R (and some easy to use version control systems), you can see exactly which files were changed. Also, in Excel, a user would usually draw a graph on a single Excel document, and if the same graph is required on a different data set, it is common to copy-and-paste a bunch of manipulations and configurations to another document. Such repeated human interaction is prone to introducing errors, as well as consuming a large amount of time. With R we can avoid this by creating functions, which can be used to run the same code on different data sets simply by changing the input, thereby producing reliable outputs and saving us a lot of time.

Click through for the article. One big thing in Excel’s defense that I did not see here was that it’s a lot easier to perform specific story-telling in Excel visuals. For example, highlight just these two data points, or annotate this segment of the visual. You can do those things in ggplot2 but it’s considerably more difficult than “right-click the data point and format.”

Comments closed

Reading Multi-Sheet Excel Files in R

Steven Sanderson does a bit of Excel file reading:

Reading in an Excel file with multiple sheets can be a daunting task, especially for users who are not familiar with the process. In this blog post, we will walk through a sample function that can be used to read in an Excel file with multiple sheets using the R programming language.

Click through for the process, which makes use of the lapply() function and the readxl package.

Comments closed

Testing Performance of File Formats in R

Steven Sanderson performs some tests:

We can save the generated matrix in different file formats using different functions in R. Here are the functions we will use for each file format:

  • CSV: write.csv()
  • RDS: saveRDS()
  • FST: write_fst()
  • Arrow: write_feather()

Steve then has a follow-up around compressed data:

In this post I create a square matrix and then convert it to a data.frame (2,000 rows by 2,000 columns) and then saved it as a gz compressed csv file. The benchmark compares different R packages and functions, including base Rdata.tablevroom, and readr, and measures their relative speeds based on the time it takes to read in the .csv.gz file.

There’s not a direct comparison between the two posts, as the second matrix is larger than the first, though even with that caveat in mind, this post lets you see how much extra processing occurs to gunzip the data before reading it.

Comments closed