R – Page 62 – Curated SQL

Understanding Skewness and Kurtosis

Published 2020-11-10 by Kevin Feasel

George Pipis explains two key concepts of a distribution:

Most commonly a distribution is described by its mean and variance which are the first and second moments respectively. Another less common measures are the skewness (third moment) and the kurtosis (fourth moment). Today, we will try to give a brief explanation of these measures and we will show how we can calculate them in R.

Click through for an explanation. H/T R-Bloggers

Comments closed

GIS Capabilities in R

Published 2020-11-05 by Kevin Feasel

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic, st_operation(A, B) checks for each combinations of the geometries in A and B whether A operation B is true or false. For instance st_within(A, B) checks whether the geometries in A are within B, this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each elements contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE the functions returns matrices, like st_within(A, B, sparse=FALSE) returns a n x m matrix, st_within(B, A, sparse=FALSE) returns a m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, so returning a n x n matrix.

Click through for part 1 of the series.

Comments closed

Understanding How GPS Works

Published 2020-11-03 by Kevin Feasel

Holger von Jouanne-Diedrich walks us through the basics of global positioning:

Last week, I showed you a method of how to find the fastest path from A to B: Finding the Shortest Path with Dijkstra’s Algorithm. To make use of that, we need a method to determine our position at any point in time.
For that matter, many devices use the so-called Global Positioning System (GPS). If you want to understand how it works and do some simple calculations in R, read on!

Do read the whole thing; the explanation is laid out really well.

Comments closed

Full Moon Finder in R

Published 2020-10-30 by Kevin Feasel

Tomaz Kastrun has a not-so-useless function:

The full moon function, or should we call it fool moon – due to it’s simplistic and approximate nature, calculates the the difference between the date (only date, no time, no long/lat coordinates) and Julian constant. Should you be using a different calendar, don’t run the function, just look out the window.
The function is written based on generalized equation for julian day numbers and months. Another one could be to calculate RMSE of the predicted values and realization of lunar behavior (lunatic start time). In this case – reversed engineering – you would use the the approximate date/time for the first new moon after that date if the synod period was constant. This number than obtained is only empirically proven by recursively solving for the new “possible date/time” of lunar behavior and calculate the prediction error. In order to minimize the RMSE value of the difference between the full moon dates/times predicted formula and the dates/times for the full moon over the next 10 years you get something like this.

Click through for the function as well as sound advice if it’s not a full moon.

Comments closed

Working with Network Graphs in R

Published 2020-10-28 by Kevin Feasel

John MacKintosh shows us the visNetwork package:

I’ve long been hoping for a reason to have to devote time to learning how to produce network plots. In my world, where bar and line charts reign supreme (with heatmaps and waffle charts thrown in occasionally) it is nice to be able to develop a new visualisation.
I’ve been wanting to produce a network plot for some time. But, the data structure, with its nodes and edges, and seeming lack of any identifiable characteristics, has meant it has not been hugely far up my agenda, or at least, never far up enough to make me learn more about it.

Click through for an example of where a network diagram can work out. H/T R-Bloggers

Comments closed

Visualizing Decision Trees in R

Published 2020-10-21 by Kevin Feasel

Sebastian Sauer shows us a couple techniques for plotting decision trees:

The recursive partitioning plot was designed by Grant McDermott. The tree plot is from the rpart.plot package written by Stephen Milborrow.

Click through for code and illustrations.

Comments closed

Shortest Path Calculations with Dijkstra’s Algorithm

Published 2020-10-14 by Kevin Feasel

Holger von Jouanne-Diedrich takes us through Dijkstra’s algorithm for shortest path calculations:

This post is partly based on this essay Python Patterns – Implementing Graphs, the example is from the German book “Das Geheimnis des kürzesten Weges” (“The secret of the shortest path”) by my colleague Professor Gritzmann and Dr. Brandenberg. For finding the most elegant way to convert data frames into igraph-objects I got help (once again!) from the wonderful R community over at StackOverflow.
Dijkstra’s algorithm is a recursive algorithm. If you are not familiar with recursion you might want to read my post To understand Recursion you have to understand Recursion… first.

Click through for an implementation in R.

Comments closed

Sliding Windows in R

Published 2020-10-13 by Kevin Feasel

Bryan Shalloway shows off some new functionality in the rsample package:

For some problems you may want to take a traditional regression or classification based approach while still accounting for the date/time-sensitive components of your data. In this post I will use the tidymodels suite of packages to:
– build lag based and non-lag based features
– set-up appropriate time series cross-validation windows
– evaluate performance of linear regression and random forest models on a regression problem
For my example I will use data from Wake County food inspections. I will try to predict the SCORE for upcoming restaurant food inspections.

Click through to see it in action.

Comments closed

Gussying Up R Tables in GitHub

Published 2020-10-13 by Kevin Feasel

Laura Ellis solves a problem:

One thing I love about performing analysis in .Rmd (R Markdown) files is how easy it is to share your results publicly on GitHub. Create your analysis in the .Rmd file, set your output variant as below, knit to .md format and then add your files to GitHub!
There is only one problem with the .md output: PRETTY TABLES! Most of the pretty tables packages that I like to use, or don’t display all of the formatting, or don’t display at all in .md format.

Click through to see how to solve this, including demonstration videos.

Comments closed

Time Series Forecasting in R

Published 2020-10-02 by Kevin Feasel

Selcuk Disci contrasts a couple of methods for time series forecasting:

It is always hard to find a proper model to forecast time series data. One of the reasons is that models that use time-series data often expose to serial correlation. In this article, we will compare k nearest neighbor (KNN) regression which is a supervised machine learning method, with a more classical and stochastic process, autoregressive integrated moving average (ARIMA).
We will use the monthly prices of refined gold futures(XAUTRY) for one gram in Turkish Lira traded on BIST(Istanbul Stock Exchange) for forecasting. We created the data frame starting from 2013. You can download the relevant excel file from here.

Click through for the demonstration. H/T R-Bloggers.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: R