Press "Enter" to skip to content

Category: R

Tidyverse Updates

Hadley Wickham has two announcements.  First, for a slew of tidyverse packages:

Over the couple of months there have been a bunch of smaller releases to packages in the tidyverse. This includes:

  • forcats 0.2.0, for working with factors.
  • readr 1.1.0, for reading flat-files from disk.
  • stringr 1.2.0, for manipulating strings.
  • tibble 1.3.0, a modern re-imagining of the data frame.

This blog post summarises the most important new features, and points to the full release notes where you can learn more.

Second, a new version of dplyr is coming:

dplyr 0.6.0 is a major release including over 100 bug fixes and improvements. There are three big changes that I want to touch on here:

  • Databases
  • Improved encoding support (particularly for CJK on windows)
  • Tidyeval, a new framework for programming with dplyr

You can see a complete list of changes in the draft release notes.

You can already get a tech preview of the new dplyr if you’re interested in trying it out.

Comments closed

The Basics Of SparkR

Yanbo Liang has an introductory article on what SparkR is and why you might want to use it:

However, data analysis using R is limited by the amount of memory available on a single machine and further as R is single threaded it is often impractical to use R on large datasets. To address R’s scalability issue, the Spark community developed SparkR package which is based on a distributed data frame that enables structured data processing with a syntax familiar to R users. Spark provides distributed processing engine, data source, off-memory data structures. R provides a dynamic environment, interactivity, packages, visualization. SparkR combines the advantages of both Spark and R.

In the following section, we will illustrate how to integrate SparkR with R to solve some typical data science problems from a traditional R users’ perspective.

This is a fairly introductory article, but gives an idea of what SparkR can accomplish.

Comments closed

Basics Of R Plotting

Aman Tsegai shows some basic ways to customize R’s plot function:

We’re going to be using the cars dataset that is built in R. To follow along with real code, here’s an interactive R Notebook. Feel free to copy it and play around with the code as you read along.

So if we were to simply plot the dataset using just the data as the only parameter, it’d look like this:

plot(dataset)

The plot function is great for cases where you don’t much care how the visual looks, and the simplicity is great for throwaway visuals.

Comments closed

R Plots In Power BI

Leila Etaati has a three-part series on displaying R visuals in Power BI.  Part 1 shows how to create a scatter plot:

so in the above picture we can see that we have 3 different fields that has been shown in the chart :highway and city speed in y and x axis. while the car’s cylinder varibale has been shown as different cycle size. However may be you need a bigger cycle to differentiate cylinder with 8 to 4 so we able to do that with add another layer by adding a function name

Part 2 shows how to use facet_grid to show multiple plots in one visual:

now I want to add other layer to this chart. by adding year and car drive option to the chart. To do that first choose year and drv  from data field in power BI. As I have mentioned before, now the dataset variable will  hold data about speed in city, speed in highway, number of cylinder, years of cars and type of drive.

I am going to use another function in the ggplot packages name “facet_grid” that helps me to show the different facet in my scatter chart. in this function, year and drv (driver) will be shown against each other.

Part 3 shows how to place charts on a map in R:

Now I have to merg the data to get the location information from “sPDF” into “ddf”. To do that I am going to use” merge” function. As you can see in below code, first argument is our first dataset “ddf” and the second one is the data on Lat and Lon of location (sPDF). the third and forth columns show the main variables for joining these two dataset as “ddf” (x) is “country” and in the second one “sPDF”  is “Admin”. the result will be stored in “df” dataset

Aside from my strong dislike of bar/pie charts on maps, this is good to know, particularly if there is not a built-in or customer Power BI visual to replicate something you can do in R.

Comments closed

Logging R Scripts

Tomaz Kastrun shows the places where you might be able to track R scripts running on your system:

Extensibility Log will store information about the session but it will not store the R or R environment information or data, just session information and data. Navigate to:

C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\LOG\ExtensibilityLog

to check the content and to see, if there is anything useful for your needs.

It’s not a great answer today.

Comments closed

Microsoft R Open 3.3.3

David Smith reports that Microsoft R Open 3.3.3 is now available:

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.3.3, and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to R 3.3.3, upgrades the installer, and updates the bundled packages.

R 3.3.3 makes just a few minor fixes compared to R 3.3.2 (see the full list of changes here), so you shouldn’t encounter any compatibility issues when upgrading from MRO 3.3.2. For CRAN packages, MRO 3.3.3 points to CRAN snapshot taken on March 15, 2017 but as always, you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages).

Click through for more details.  As a side note, CRAN R 3.4 is scheduled for release this month, so given their recent cadence, I’d guess MRO 3.4 to be out late this year.

Comments closed

Using OLS To Fit Rational Functions

Srini Kumar and Bob Horton show how to use the lm function to fit functions using the Pade Approximation:

Now we have a form that lm can work with. We just need to specify a set of inputs that are powers of x (as in a traditional polynomial fit), and a set of inputs that are y times powers of x. This may seem like a strange thing to do, because we are making a model where we would need to know the value of y in order to predict y. But the trick here is that we will not try to use the fitted model to predict anything; we will just take the coefficients out and rearrange them in a function. The fit_pade function below takes a dataframe with x and y values, fits an lm model, and returns a function of x that uses the coefficents from the model to predict y:

The lm function does more than just fit straight lines.

Comments closed

New RTVS Instructions

Ginger Grant has updated her instructions for installing R Tools for Visual Studio and getting R Services to work on SQL Server:

In addition to having an SQL Server 2016 instance with R Server installed, the following components are needed on a client machine

The Comprehensive R Archive Network

RStudio (optional)

Visual Studio 2015 R Tools

This list is a change from the previous list I have provided as RTVS contains an installation of R Client, there is no need to download that as well. You do not need to download Microsoft R Open if you are using R Server either.  Once RTVS is installed, there is a menu option on the R Tools window. Selecting Install R Client from the menu will handle the information. Unfortunately, there is no change to the menu option once R Client is installed, it always looks like you should install it.  To find out if R Client has been installed, look in the Workspaces window.

In other words, fewer dependencies and an easier installation process.  Read the whole thing to avoid RevoScaleR errors in your code post-upgrade.

Comments closed

Generating R Services Stored Procedures From R

David Smith describes sqlrutils, an R function to generate SQL Server R Services stored procedures:

If you’ve created an R function (say, a routine to clean up missing values in a data set, or a function to make forecasts using a machine learning model), and you want to make it easy for DBAs to use it, it’s now possible to publish R functions as a SQL Server 2016 stored procedure. The sqlrutils package provides tools to convert an existing R function to a stored procedure which can then be executed by anyone with authenticated access to the database — even if they don’t know any R.

To use an R function as a stored procedure, you’ll need SQL Server 2016 with R Services installed. You’ll also need to use the sqlrutils package to publish the function as a stored procedure: it’s included with both Microsoft R Client (available free) and Microsoft R Server (included with SQL Server 2016), version 9.0 or later.

Compare this against R Tools for Visual Studio, with which you can generate stored procedures from the IDE.

Comments closed

ggedit 0.2.0

Jonathan Sidi announces ggedit 0.2.0:

ggedit is an R package that is used to facilitate ggplot formatting. With ggedit, R users of all experience levels can easily move from creating ggplots to refining aesthetic details, all while maintaining portability for further reproducible research and collaboration.
ggedit is run from an R console or as a reactive object in any Shiny application. The user inputs a ggplot object or a list of objects. The application populates Bootstrap modals with all of the elements found in each layer, scale, and theme of the ggplot objects. The user can then edit these elements and interact with the plot as changes occur. During editing, a comparison of the script is logged, which can be directly copied and shared. The application output is a nested list containing the edited layers, scales, and themes in both object and script form, so you can apply the edited objects independent of the original plot using regular ggplot2 grammar.

This makes modifying ggplot2 visuals a lot easier for people who aren’t familiar with the concept of aesthetics and layers—like, say, the marketing team or management.

Comments closed