R – Page 78 – Curated SQL

Creating R Visuals in Power BI

Published 2019-05-29 by Kevin Feasel

Dave Mason takes us through showing an R-based visual in Power BI:

The R engine isn’t included with the installation of Power BI desktop. I won’t go into detail on this, so just know you’d need to install that separately. I had already installed the R component as part of Machine Learning Services for SQL Server 2017. I also had RStudio installed. Within Power BI desktop, take a moment to click File | Options and settings | Options to open the Options page. Then click R scripting in the list of Global Options. Here you’ll see options to set the R home directory and the desired R IDE.

Click through for the demo.

Comments closed

The structure() Function in R

Published 2019-05-28 by Kevin Feasel

Tomaz Kastrun takes us through the structure() function in R:

Structure() function is a simple, yet powerful function that describes a given object with given attributes. It is part of base R language library, so there is no need to load any additional library. And also, since the function was part of S-Language, it is in the base library from the earlier versions, making it backward or forward compatible.

Read on to see how you can create a matrix or data frame using this function and additional details you can save.

Comments closed

Timing R Function Calls

Published 2019-05-24 by Kevin Feasel

Colin Gillespie shows off an R package for benchmarking:

Of course, it’s more likely that you’ll want to compare more than two things. You can compare as many function calls as you want with mark(), as we’ll demonstrate in the following example. It’s probably more likely that you’ll want to compare these function calls against more than one value. For example, in the digest package there are eight different algorithms. Ranging from the standard md5 to the newer xxhash64 methods. To compare times, we’ll generate n = 20 random character strings of length N = 10,000. This can all be wrapped up in the single function press() function call from the bench package:

Click through for an example involving hashing algorithms.

Comments closed

Exploratory Data Analysis with inspectdf

Published 2019-05-23 by Kevin Feasel

Laura Ellis continues a dive into Exploratory Data Analysis, this time using the inspectdf package:

I like this package because it’s got a lot of functionality and it’s incredibly straightforward to use. In short, it allows you to understand and visualize column types, sizes, values, value imbalance & distributions as well as correlations. Better yet, you can run each of these features for an individual data frame, or compare the differences between two data frames.
I liked the inspectdf package so much that in this blog, I’m going to extend my previous EDA tutorial with an overview of the package.

There are some interesting functions which make EDA easier, so check it out.

Comments closed

MRAN Changes and a Survey

Published 2019-05-23 by Kevin Feasel

David Smith discusses potential changes to MRAN:

As CRAN has grown and changes to packages have become more frequent, maintaining MRAN is an increasingly resource-intensive process. We’re contemplating changes, like changing the frequency of snapshots, or thinning the archive of snapshots that haven’t been used. But before we do that we’d like to hear from the community first. Have you used MRAN snapshots? If so, how are you using them? How many different snapshots have you used, and how often do you change that up? Please leave your feedback at the survey link below by June 14, and we’ll use the feedback we gather in our decision-making process. Responses are anonymous, and we’ll summarize the responses in a future blog post. Thanks in advance!

Please take the survey as well. If you’ve used SQL Server Machine Learning Services (or SQL Server R Services), you’ve used MRAN.

Comments closed

Vectors for Programmers

Published 2019-05-22 by Kevin Feasel

John Mount has a couple of videos available:

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with what ideas do programmers find interesting about vectors, what concepts do they consider safe starting points, and how to condense and present the material.

Click through for the links, one with Python examples and the other with R examples.

Comments closed

Defining Tidy Data

Published 2019-05-14 by Kevin Feasel

John Mount shares thoughts about the concept of tidy data:

A question is: is such a data set “tidy”? The paper itself claims the above definitions are “Codd’s 3rd normal form.” So, no the above table is not “tidy” under that paper’s definition. The the winner’s date of birth is a fact about the winner alone, and not a fact about the joint row keys (the tournament plus year) as required by the rules of Codd’s 3rd normal form. The critique being: this data presentation does not express the intended data invariant that Al Fredrickson must have the same “Winner Date of Birth” in all rows.

My spin on it is that tidy data is Boyce-Codd Normal Form but may subsequently be denormalized. This may reintroduce violations of 3NF (as in Mount’s example) and sometimes 2NF, but does not change the shape of the variables themselves—that is, a variable still represents a single thing and exists per observation.

Comments closed

Visualizing Earthquake Data

Published 2019-05-14 by Kevin Feasel

Giorgio Garziano continues a series on analyzing earthquake data:

This is the third part of our post series about the exploratory analysis of a publicly available dataset reporting earthquakes and similar events within a specific 30 days time span. In this post, we are going to show static, interactive and animated earthquakes maps of different flavors by using the functionalities provided by a pool of R packages as specifically explained herein below.

Giorgio looks at 9 separate R mapping packages, so you get your money’s worth here.

Comments closed

Changes in R 3.6.0

Published 2019-05-13 by Kevin Feasel

David Smith lays out the major changes in R 3.6.0:

A major update to the open-source R language, R 3.6.0, was released on April 26 and is now available for download for Windows, Mac and Linux. As a major update, it has many new features, user-visible changes and bug fixes. You can read the details in the release announcement, and in this blog post I’ll highlight the most significant ones.

There are some good changes in here.

Comments closed

Horizontal Labels with ggplot

Published 2019-05-07 by Kevin Feasel

Michael Toth shows us how to ensure we use horizontal text labels in ggplot:

There are several things we could do to improve this graph, but in this guide let’s focus on rotating the y-axis label. This simple change will make your graph so much better. That way, people won’t have to tilt their heads like me to understand what’s going on in your graph:

It may not seem like much when you’re creating the visual, but it can make a difference for a viewer.

Comments closed

Category: R