R – Page 56 – Curated SQL

Last Observation Carried Forward in R

Published 2021-03-24 by Kevin Feasel

Nathan Eastwood shows how to perform Last Observation Carried Forward in R:

Real life data is often riddled with missing values – or NAs – where no data value are stored for the variable in observation. Missing data such as this can have a significant effect on the conclusions which can be drawn from the data. For example individuals dropping out of a study or subjects not properly reporting responses. A common solution to this problem is to fill those NA values with the most recent non-NA value prior to it – this is called the Last Observation Carried Forward (LOCF) method.
This blog post will look at how to implement this particular solution using a combination of {dplyr}, {dbplyr} and {sparklyr}.

Click through for the solution.

Comments closed

Generating Random Numbers in R

Published 2021-03-24 by Kevin Feasel

Holger von Jouanne-Diedrich brings the noise:

In data science, we try to find, sometimes well-hidden, patterns (= signal) in often seemingly random data (= noise). Pseudo-Random Number Generators (PRNG) try to do the opposite: hiding a deterministic data generating process (= signal) by making it look like randomness (= noise). If you want to understand some basics behind the scenes of this fascinating topic, read on!

Click through for an explanation of the process.

Comments closed

Creating a Rose Chart in R

Published 2021-03-23 by Kevin Feasel

Neil Saunders takes a look at a classic chart:

I first heard Florence Nightingale and her Geeks Declare War on Death, an episode of the Cautionary Tales podcast, premiered as a special episode of 99% Invisible. It discusses Nightingale’s work as a statistician and in particular, her visualisation of mortality causes in the Crimean War using the famous “rose chart”, or polar area diagram.
I’m sure you’re thinking: how can I explore that using R?

Read on to find out.

Comments closed

knitr Options and Hooks

Published 2021-03-17 by Kevin Feasel

The folks at Jumping Rivers conclude a series:

As with many aspects of programming, when you are working by yourself you can be (slightly) more lax with styles and set-up. However, as you start working in a team, different styles can quickly become a hindrance and lead to errors.
Using {knitr} is no different. When you work on documents with different team members, it’s helpful to have a consistent set of settings. If the default for eval changes, this can easily waste time as you try to track down an error. At Jumping Rivers, we use {knitr} a lot. From our training courses, to providing feedback to clients, to constructing monthly reports on clients infrastructure. The great thing about {knitr} is it’s really easy to customise. The bad thing is that without some care, it’s really easy for every member of the team to have different default options. This proliferation of different default options, means that when we pick up someone else document, mistakes may creep in.

Read on for different options they use to keep things consistent.

Comments closed

Fuzzy Joins in SQL Server using R

Published 2021-03-16 by Kevin Feasel

Rajendra Gupta shows how you can use R in SQL Server Machine Learning Services to perform fuzzy joins:

Suppose you have a web page where users right comments in the text box. You are performing data analysis. However, there are few spelling mistakes, and you want to perform the approximate match or fuzzy lookup in another dataset. Similarly, you have a product catalog database. Your users search for a product; however, they might not type the exact keyword for the product name. Using the fuzzy joins, we can return the user the products with an approximate match to the product names.
SQL Server Machine Learning using R scripts enables you to execute the R language queries inside the SQL Server. In the previous articles, we explored a few use-cases of the machine learning language. In the previous articles, we explored the R scripts for the below topics.

It’s R, so there’s already a package in CRAN for that.

Comments closed

k-gram Language Models in R

Published 2021-03-16 by Kevin Feasel

Valerio Gherardi takes us through the concept of k-grams:

The post is structured as follows: we start by giving a succinct theoretical introduction to kk-gram models. Subsequently, we illustrate how to train a kk-gram model in R using kgrams, and explain how to use the standard perplexity metric for model evaluation or tuning. Finally, we use our trained model to generate some random text at different temperatures.

This goes into some depth on the topic and is worth giving a careful read.

Comments closed

The Basics of k-Means Clustering

Published 2021-03-15 by Kevin Feasel

Nathaniel Schmucker explains some of the principles of k-means clustering:

k-Means is easy to implement. In R, you can use the function kmeans() to quickly deploy an efficient k-Means algorithm. On datasets of reasonable size (thousands of rows), the kmeans function runs in fractions of a second.
k-Means is easy to interpret (in 2 dimensions). If you have two features of your k-Means analysis (e.g., you are grouping by length and width), the result of the k-Means algorithm can be plotted on an xy-coordinate system to show the extent of each cluster. It’s easy to visually inspect the assignment to see if the k-Means analysis returned a meaningful insight. In more dimensions (e.g., length, width, and height) you will need to either create a 3D plot, summarize your features in a table, or find another alternative to describing your analysis. This loses the intuitive power that a 2D k-Means analysis has in convincing you or your audience that your analysis should be trusted. It’s not to say that your analysis is wrong; it simply takes more mental focus to understand what your analysis says.
The k-Means analysis, however, is not always the best choice. k-Means does well on data that naturally falls into spherical clusters. If your data has a different shape (linear, spiral, etc.), k-Means will force clustering into circles, which can result in outputs that defy human expectations. The algorithm is not wrong; we have fed the algorithm data it was never intended to understand.

There’s a lot of depth in this article which makes it really interesting.

Comments closed

Getting Started with data.table

Published 2021-03-09 by Kevin Feasel

Gary Hutson has a primer on data.table:

This example uses the copy data frame we made and uses the organisation code by the type of attendances. I want to then summarise the mean admissions by type and organisation code.
Pivots can be implemented in data.table in the following way:

I’ve never been the biggest fan of the syntax for data.table but the performance is unquestionably there and that makes it worth learning. H/T R-bloggers.

Comments closed

Spring Cleaning Shiny Projects

Published 2021-03-08 by Kevin Feasel

Mirai Solutions has some tips on cleaning up Shiny apps:

How to apply the spring cleaning principles and advanced programming to your Shiny App.
1. Deep breeze and allocate some time
Do not avoid spring cleaning simply because you don’t know where to start from. Prioritize some time for the task and get inspired by our following points.

Click through for advice on tools and processes to make this code easier to understand. H/T R-Bloggers

Comments closed

Including and Resizing External Images in knitr

Published 2021-03-03 by Kevin Feasel

The folks at Jumping Rivers continue a series on knitr and rmarkdown:

In this third post, we’ll look at including eternal images, such as figures and logos in HTML documents. This is relevant for all R markdown files, including fancy things like {bookdown}, {distill} and {pkgdown}. The main difference with the images discussed in this post, is that the image isn’t generated by R. Instead, we’re thinking of something like a photograph. When including an image in your web-page, the two key points are
– What size is your image?
– What’s the size of your HTML/CSS container on your web-page?

Read the whole thing.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: R