Press "Enter" to skip to content

Category: R

Microsoft R Server 9.0

David Smith reports that Microsoft R Server 9.0 is now available:

Microsoft R Server 9.0, Microsoft’s R distribution with added big-data, in-database, and integration capabilities, was released today and is now available for download to MSDN subscribers. This latest release is built on Microsoft R Open 3.3.2, and adds new machine-learning capabilities, new ways to integrate R into applications, and additional big-data support for Spark 2.0.

There’s also a new version of Microsoft R Client and Microsoft R Open.

Comments closed

R + Power Query

Ryan Wade makes his argument that R can be more powerful than M inside Power Query:

I want to leave you with two more things. If you look at the trade balance data set you will notice that it is not in a good format for data analysis. Here is a link to the file if you want to take a closer look. When you are doing data analysis you want your data to be in a “tidy” format. A “tidy” format means that each column represents a variable and each row represents an observation. To make this data set “tidy” you need to reformat the data into the following format: Country, Year, Trade Balance, Exports, and Imports.

This was an interesting example.

Comments closed

Multivariate Analysis In R

Mala Mahadevan looks at using R to describe data sets with two explanatory variables:

From the plot we can see that type 3 trees have the smallest circumference while type 4 have the largest, with type 2 close to type 4. We can also see that type 1 trees have the thinnest dispersion of circumference while type 4 has the highest, closely followed by type 2.  We can also see that there are no significant outliers in this data.

Understanding whether variables are categorical or continuous is vital to understanding what you can and should do with them.

Comments closed

Custom R Visuals In Power BI

Ginger Grant notes that there are R-powered custom visuals for Power BI:

Interacting with R visuals works differently than with other report visualizations as you cannot click on elements within the visualization and filter other items on the page. Other visuals on the page will filter the data contained within the R visual. For example, let’s say my report contains a total field, a slicer which contains years and a correlation plot which contains products. If the slicker is changed to select a year, total field and the data within the R visual will change to reflect that. If on the other hand, I choose to click on the R visual to select one of the product categories, the total field will not change and the R visual will not change. The R visual’s appearance will not change in any way.

Read on for more.

Comments closed

Data Wrangling: R Versus M

Ryan Wade argues that R is a better language choice for working with data in Power BI than M:

Now let’s do something that I think is pretty slick. Let’s create a data set that combines the home games of the Pacers (IND) and the home games of the Hawks (ATL). Given the naming convention used by the files we will have to identify the files in our working directory that starts with an eight numeric digits > then a period > then a 3 character team abbreviation for the away team > then either “ATL” or “IND” > then finally “.csv”. We can create a regular expression to find the files that matches that pattern. I did so in the code below:

I’m interested in catching the rest of the series.  This is a controversial statement that I’m not entirely sold on yet, but Ryan does set the stage for his full argument.

Comments closed

Tabulizer

Troy Walters uses the Tabulizer package to extract tables from a PDF and turn them into an R matrices or data frames:

Next we will use the extract_tables() function from tabulizer. First, I specify the url of the pdf file from which I want to extract a table. This pdf link includes the most recent data, covering the period from July 1, 2016 to November 25, 2016. I am using the default parameters for extract_tables. These are guess and method. I’ll leave guess set to TRUE, which tells tabulizer that we want it to figure out the locations of the tables on its own. We could set this to FALSE if we want to have more granular control, but for this application we don’t need to. We leave the method argument set to “matrix”, which will return a list of matrices (one for each pdf page). This could also be set to return data frames instead.

This is nice.  I have to imagine it only works for text-based PDFs and not ones which are generated from a series of images.

Comments closed

Solving The German Tank Problem

Frank Portman shows how to figure out how many taxicabs—or tanks—there are:

For the uninitiated, the Taxicab / Germany Tank problem is as follows:

Viewing a city from the train, you see a taxi numbered x. Assuming taxicabs are consecutively numbered, how many taxicabs are in the city?

This was also applied to counting German tanks in World War II to know when/if to attack. Statstical methods ended up being accurate within a few tanks (on a scale of 200-300) while “intelligence” (unintelligence) operations overestimated numbers about 6-7x. Read the full details on Wikipedia here (and donate while you’re over there).

Click through for the solution and how to implement it in R.

Comments closed

Pie Charts

Peter Ellis defends pie charts under very specific circumstances:

The usual response from statisticians and data professionals to pie charts ranges from lofty disdain to outright snobbery. But sometimes I think they’re the right tool for communication with a particular audience. Like others I was struck by this image from New Zealand news site stuff.co.nz showing that nearly half the earthquake energy of the past six years came in one day (last Sunday night, and the shaking continues by the way). Pie charts work well when the main impression of relative proportions to the whole is obvious, and fine comparisons aren’t needed.

Here’s my own version of the graphic. I polished this up during a break while working at home due to the office being shut for earthquake-related reasons:

Consider me in the lofty disdain camp.  That said, this is probably the best case scenario for a pie chart:  when looking at relative percentage of one dominant element versus the remaining set.

Comments closed

Two-Way T Tests

Mala Mahadevan shows how to write a two-way T test in R and T-SQL:

I can do the same calculation of T value using T-SQL. I cannot calculate p value from TSQL as that comes from a table, but it is possible to look it up. I imported the set of values into a table called WalkingSteps with two columns, walkerAsteps and walkerBsteps. For doing the math on T value the formula stated here may be useful. My T-SQL code is as below

The R code is a bit shorter, although the T-SQL code isn’t bad either.

Comments closed

R Visuals In Power BI

Ginger Grant discusses how to display R visuals in Power BI:

I hope that some day that this list becomes much longer, but it is a good start. If your company has lots R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as Microsoft I know looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint) it is probably low on the priority list.

Even so, this list does cover a lot of the most commonly used packages.

Comments closed