Category: R

Solving The German Tank Problem

Frank Portman shows how to figure out how many taxicabs—or tanks—there are:

For the uninitiated, the Taxicab / German Tank problem is as follows:

Viewing a city from the train, you see a taxi numbered x. Assuming taxicabs are consecutively numbered, how many taxicabs are in the city?

This was also applied to counting German tanks in World War II to know when/if to attack. Statistical methods ended up being accurate within a few tanks (on a scale of 200-300) while “intelligence” (unintelligence) operations overestimated numbers about 6-7x. Read the full details on Wikipedia here (and donate while you’re over there).

Click through for the solution and how to implement it in R.
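
As a rough illustration of the problem statement above, here is a minimal Python sketch of the classic frequentist estimator, m + m/k - 1, where m is the largest serial number seen and k is the number of observations. It assumes serials run from 1 to N and are sampled without replacement. The function name and the sample serials are made up for this example, and the linked post may well take a different approach, such as a Bayesian one.

```python
def estimate_population_size(serials):
    """Estimate N from observed serial numbers using m + m/k - 1.

    Assumes serials are numbered 1..N and observed without replacement.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)  # largest serial number observed
    return m + m / k - 1


# Hypothetical observations, not data from the linked post.
observed = [19, 40, 42, 60]
print(estimate_population_size(observed))  # 74.0
```

With a single taxi numbered x, the same formula gives 2x - 1.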

Comments closed

Pie Charts

Peter Ellis defends pie charts under very specific circumstances:

The usual response from statisticians and data professionals to pie charts ranges from lofty disdain to outright snobbery. But sometimes I think they’re the right tool for communication with a particular audience. Like others I was struck by this image from New Zealand news site stuff.co.nz showing that nearly half the earthquake energy of the past six years came in one day (last Sunday night, and the shaking continues by the way). Pie charts work well when the main impression of relative proportions to the whole is obvious, and fine comparisons aren’t needed.

Here’s my own version of the graphic. I polished this up during a break while working at home due to the office being shut for earthquake-related reasons:

Consider me in the lofty disdain camp. That said, this is probably the best-case scenario for a pie chart: when looking at the relative percentage of one dominant element versus the remaining set.

Two-Way T Tests

Mala Mahadevan shows how to write a two-way T test in R and T-SQL:

I can do the same calculation of T value using T-SQL. I cannot calculate p value from T-SQL as that comes from a table, but it is possible to look it up. I imported the set of values into a table called WalkingSteps with two columns, walkerAsteps and walkerBsteps. For doing the math on T value the formula stated here may be useful. My T-SQL code is as below.

The R code is a bit shorter, although the T-SQL code isn’t bad either.
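
To make the arithmetic concrete, here is a small Python sketch of a two-sample T statistic. It uses the Welch (unequal-variance) form, which is an assumption: the excerpt does not say which formula the post uses, though Welch is what R's t.test computes by default. The function name and the step counts are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(sample_a, sample_b):
    """Return the Welch two-sample T statistic for two lists of numbers."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = variance(sample_a), variance(sample_b)
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical daily step counts standing in for walkerAsteps / walkerBsteps.
walker_a_steps = [5200, 6100, 5800, 6400, 5900]
walker_b_steps = [7100, 6900, 7400, 6800, 7300]
print(welch_t_statistic(walker_a_steps, walker_b_steps))
```

As the excerpt notes, turning the T value into a p value still requires a T distribution table or a statistics library.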

R Visuals In Power BI

Ginger Grant discusses how to display R visuals in Power BI:

I hope that some day this list becomes much longer, but it is a good start. If your company has lots of R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as I know Microsoft looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint) it is probably low on the priority list.

Even so, this list does cover a lot of the most commonly used packages.

Calling Cognitive Services With R

David Smith has written a go-to guide for connecting to Azure Cognitive Services using R:

There’s no official R package (yet!) for calling Cognitive Services APIs. But since every Cognitive Service API is just a standard REST API, we can use the httr package to call the API. Input and output is standard JSON, which we can create and extract using the jsonlite package.

(There’s also an independent R interface to the text APIs. And there are already Python SDKs for many of the services, including the Face API.)

This is also useful for other REST APIs for times when there isn’t already a pre-built package to do most of the translation work for you.
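
The same pattern (JSON in, JSON out, key in a header) looks roughly like this in Python using only the standard library. This is a sketch under assumptions: the endpoint URL, the key, and the payload are placeholders, and the helper name is made up. The request is only built here, not sent.

```python
import json
import urllib.request


def build_json_post(url, api_key, payload):
    """Build (but do not send) a JSON POST request with a subscription key header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Header name assumed from Azure's subscription-key convention.
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )


# Placeholder endpoint, key, and payload: substitute real values before sending.
request = build_json_post(
    "https://example.invalid/some-service/analyze",
    "YOUR-SUBSCRIPTION-KEY",
    {"url": "https://example.invalid/photo.jpg"},
)

# To actually call the service, something like:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the body and headers before spending an API call.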

Graphing Row Counts With R

I look at one use of R for DBAs:

I have a client data warehouse which holds daily rollups of revenue and cost for customers.  We’ve had some issues with the warehouse lately where data was not getting loaded due to system errors and timeouts, and our services team gave me a list of some customers who had gaps in their data due to persistent processing failures.  I figured out the root cause behind this (which will show up as tomorrow’s post), but I wanted to make sure that we filled in all of the gaps.

My obvious solution is to write a T-SQL query, getting some basic information by day for each customer.  I could scan through that result set, but the problem is that people aren’t great at reading tables of numbers; they do much better looking at pictures.  This is where R comes into play.

Click through for the code and a walkthrough of what each line is doing.

SQL Server R Service Users

John Pertell shows how to figure out which user account is running SQL Server R Services code:

You’re not running as yourself, even though that’s the account you signed into SSMS as.

You’re not running under the server account that SQL or SQL Launchpad run under.

You’re running as a new account created when you installed SQL R Service In Database for the purpose of running R code.

John also looks at a couple ways of showing which user is running this code and notes that this solves his file share issue.

Network Shares And SQL Server R Services

John Pertell runs into an issue reading a file on a network share using SQL Server R Services:

What I’m trying to do should be pretty easy. I want to use R code inside a stored procedure to read all the log files created by my weather program and store the results in a database. I also want to read the current monthly file on a regular basis, at least once a day. Once the data is in the database I’ll create some mobile reports with data and charts I can read on my phone. I’ll also be able to use my own data to play with local weather predicting. I thought it would make a pretty cool demo for my R sessions.

However when I run my stored procedure to read the logs I receive an error that there is no such file if I map the share as a drive, or I’m using an invalid parameter if I try to access the share directly.

Read on for code and the specific error message.

Sparklyr On HDInsight

Ali Zaidi has a walkthrough on using sparklyr on HDInsight:

The majority of Spark is written in Scala (~80% of Spark core), which is a functional programming language. Functional programming languages emphasize functional purity (the output only depends on the inputs) and strive to avoid side-effects. One important component of most functional programming languages is their lazy evaluation. While it might seem odd that we would appreciate laziness from our computing tools, lazy evaluation is an effective way of ensuring computations are evaluated in the most efficient manner possible.

Lazy evaluation allows Spark SQL to highly optimize the queries. When a user submits a query to Spark SQL, Spark composes the components of the SQL query into a logical plan. The logical plan is basically a recipe Spark SQL creates in order to evaluate the desired query. Spark SQL then submits the logical plan to its highly optimized engine called Catalyst, which optimizes this plan into a physical plan of action that is executed inside Spark’s computation engine (a series of coordinating JVMs).

Read on for more description and code.
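
For a feel of what lazy evaluation means in practice, here is a tiny Python sketch using generators. It is only an analogy for the idea described in the excerpt, not a model of Spark or Catalyst, and all names are invented for the example. Defining the pipeline does no work; values are computed only when something asks for them.

```python
evaluated = []  # records which inputs were actually computed


def traced_square(n):
    """Square n and note that the computation really happened."""
    evaluated.append(n)
    return n * n


# Describing the pipeline computes nothing yet.
squares = (traced_square(n) for n in range(1_000_000))
even_squares = (s for s in squares if s % 2 == 0)
print(evaluated)  # []

# Asking for three results pulls just enough inputs through the pipeline.
first_three = [next(even_squares) for _ in range(3)]
print(first_three)  # [0, 4, 16]
print(evaluated)    # [0, 1, 2, 3, 4]
```

Only five of the million inputs are ever squared, which is the kind of saving a lazy engine can exploit when it sees the whole plan before running it.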

RStudio 1.0

RStudio has officially hit 1.0:

While RStudio has been an enormously useful IDE for R since day 1, it’s officially been in “beta” status all of this time. But last week, RStudio released the first official production version, RStudio 1.0. Check out that link for the release history of RStudio and all that’s been added to it over the last 6 years, but this release also adds major new functionality, including:

R Tools for Visual Studio is certainly making strides, but RStudio is the gold standard for R IDEs.
