
Category: R

Calling Cognitive Services With R

David Smith has written a go-to guide for connecting to Azure Cognitive Services using R:

There’s no official R package (yet!) for calling Cognitive Services APIs. But since every Cognitive Service API is just a standard REST API, we can use the httr package to call the API. Input and output are standard JSON, which we can create and extract using the jsonlite package.

(There’s also an independent R interface to the text APIs. And there are already Python SDKs for many of the services, including the Face API.)

The same approach works for other REST APIs when there isn’t already a pre-built package to do most of the translation work for you.
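As a rough sketch of the pattern described above (not the code from David's post), the following builds a JSON body with jsonlite and posts it with httr. The function names `build_body` and `call_rest_api`, the `url` field in the body, and the endpoint and key in the usage comment are placeholders for illustration. Cognitive Services keys are typically passed in the `Ocp-Apim-Subscription-Key` header, but check the documentation for the specific API you are calling.

```r
library(httr)
library(jsonlite)

# Build the JSON request body. auto_unbox = TRUE keeps scalars as scalars
# instead of turning them into length-1 JSON arrays.
build_body <- function(image_url) {
  toJSON(list(url = image_url), auto_unbox = TRUE)
}

# POST the body to a REST endpoint and parse the JSON reply into R objects.
# `endpoint` and `api_key` are supplied by the caller (hypothetical values).
call_rest_api <- function(endpoint, api_key, image_url) {
  response <- POST(
    url = endpoint,
    add_headers(`Ocp-Apim-Subscription-Key` = api_key),
    content_type_json(),
    body = as.character(build_body(image_url))
  )
  stop_for_status(response)  # turn HTTP errors into R errors
  fromJSON(content(response, as = "text", encoding = "UTF-8"))
}

# Usage (placeholders, not a real endpoint or key):
# result <- call_rest_api("https://<region>.api.cognitive.microsoft.com/<service path>",
#                         Sys.getenv("MY_API_KEY"),
#                         "https://example.com/photo.jpg")
```

Keeping the body builder separate from the HTTP call makes it easy to inspect the JSON you are about to send before spending an API call on it.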

Comments closed

Graphing Row Counts With R

I look at one use of R for DBAs:

I have a client data warehouse which holds daily rollups of revenue and cost for customers.  We’ve had some issues with the warehouse lately where data was not getting loaded due to system errors and timeouts, and our services team gave me a list of some customers who had gaps in their data due to persistent processing failures.  I figured out the root cause behind this (which will show up as tomorrow’s post), but I wanted to make sure that we filled in all of the gaps.

My obvious solution is to write a T-SQL query, getting some basic information by day for each customer.  I could scan through that result set, but the problem is that people aren’t great at reading tables of numbers; they do much better looking at pictures.  This is where R comes into play.

Click through for the code and a walkthrough of what each line is doing.
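For a flavor of the idea, here is a minimal base-R sketch. It is not the code from the linked post: the data frame is made up for illustration, and the names `daily`, `filled`, `gaps`, and `plot_row_counts` are hypothetical. In practice the rows would come from the T-SQL query.

```r
# Hypothetical daily row counts; customer "B" has no data for 2016-11-03.
daily <- data.frame(
  customer = c("A", "A", "A", "A", "B", "B", "B"),
  day = as.Date(c("2016-11-01", "2016-11-02", "2016-11-03", "2016-11-04",
                  "2016-11-01", "2016-11-02", "2016-11-04")),
  rows = c(120, 118, 125, 121, 80, 79, 83),
  stringsAsFactors = FALSE
)

# Cross every customer with every day so that missing days become explicit.
all_days <- seq(min(daily$day), max(daily$day), by = "day")
grid <- expand.grid(customer = unique(daily$customer), day = all_days,
                    stringsAsFactors = FALSE)
filled <- merge(grid, daily, by = c("customer", "day"), all.x = TRUE)
filled$rows[is.na(filled$rows)] <- 0

# The customer/day combinations with no rows loaded.
gaps <- filled[filled$rows == 0, c("customer", "day")]

# One panel per customer; a gap shows up as a missing bar.
plot_row_counts <- function(df) {
  customers <- unique(df$customer)
  old_par <- par(mfrow = c(length(customers), 1))
  on.exit(par(old_par))
  for (cust in customers) {
    one <- df[df$customer == cust, ]
    plot(one$day, one$rows, type = "h", lwd = 4,
         main = paste("Customer", cust), xlab = "Day", ylab = "Rows loaded")
  }
}

if (interactive()) plot_row_counts(filled)
```

The table in `gaps` gives the exact answer, while the plot makes a hole in the data obvious at a glance.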


SQL Server R Service Users

John Pertell shows how to figure out which user account is running SQL Server R Services code:

You’re not running as yourself, even though that’s the account you signed into SSMS as.

You’re not running under the server account that SQL or SQL Launchpad run under.

You’re running as a new account created when you installed SQL Server R Services (In-Database) for the purpose of running R code.

John also looks at a couple of ways of showing which user is running this code and notes that this solves his file share issue.
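One simple way to see the account from inside R is `Sys.info()`. This is a sketch and may differ from the methods in John's post; the helper name `who_am_i` is made up here. Run in a normal R session, it reports your own login. Run as the R script passed to `sp_execute_external_script` (assigning the result to `OutputDataSet`), it should report the worker account instead.

```r
# Report the operating system account that this R process is running under.
who_am_i <- function() {
  info <- Sys.info()
  data.frame(
    user = info[["user"]],
    effective_user = info[["effective_user"]],
    node = info[["nodename"]],
    stringsAsFactors = FALSE
  )
}

print(who_am_i())
```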


Network Shares And SQL Server R Services

John Pertell runs into an issue reading a file on a network share using SQL Server R Services:

What I’m trying to do should be pretty easy. I want to use R code inside a stored procedure to read all the log files created by my weather program and store the results in a database. I also want to read the current monthly file on a regular basis, at least once a day. Once the data is in the database I’ll create some mobile reports with data and charts I can read on my phone. I’ll also be able to use my own data to play with local weather predicting. I thought it would make a pretty cool demo for my R sessions.

However when I run my stored procedure to read the logs I receive an error that there is no such file if I map the share as a drive, or I’m using an invalid parameter if I try to access the share directly.

Read on for code and the specific error message.


Sparklyr On HDInsight

Ali Zaidi has a walkthrough on using sparklyr on HDInsight:

The majority of Spark is written in Scala (~80% of Spark core), which is a functional programming language. Functional programming languages emphasize functional purity (the output only depends on the inputs) and strive to avoid side-effects. One important component of most functional programming languages is their lazy evaluation. While it might seem odd that we would appreciate laziness from our computing tools, lazy evaluation is an effective way of ensuring computations are evaluated in the most efficient manner possible.

Lazy evaluation allows Spark SQL to heavily optimize queries. When a user submits a query to Spark SQL, Spark composes the components of the SQL query into a logical plan. The logical plan is basically a recipe Spark SQL creates in order to evaluate the desired query. Spark SQL then submits the logical plan to its highly optimized engine called Catalyst, which optimizes this plan into a physical plan of action that is executed inside the Spark computation engine (a series of coordinating JVMs).

Read on for more description and code.
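To make the "record a plan now, run it later" idea concrete, here is a toy illustration in base R. It is not Spark or sparklyr code, and every name in it (`lazy_tbl`, `lazy_filter`, `lazy_select`, `explain_plan`, `collect_plan`) is invented for this sketch. It also does no optimization at all, which is the part Catalyst adds; it only shows that each verb stores a step and that nothing is computed until the result is requested.

```r
# A lazy table is just the source data plus a list of recorded steps.
lazy_tbl <- function(data) {
  structure(list(data = data, steps = list()), class = "lazy_tbl")
}

# Append a step to the plan without touching the data.
add_step <- function(plan, label, fn) {
  plan$steps[[length(plan$steps) + 1]] <- list(label = label, fn = fn)
  plan
}

lazy_filter <- function(plan, pred) {
  add_step(plan, "filter", function(d) d[pred(d), , drop = FALSE])
}

lazy_select <- function(plan, cols) {
  add_step(plan, "select", function(d) d[, cols, drop = FALSE])
}

# Show the recorded steps (a stand-in for inspecting a logical plan).
explain_plan <- function(plan) {
  vapply(plan$steps, function(s) s$label, character(1))
}

# Only here is any work done: apply each step in order to the data.
collect_plan <- function(plan) {
  Reduce(function(d, s) s$fn(d), plan$steps, plan$data)
}

plan <- lazy_tbl(mtcars)
plan <- lazy_filter(plan, function(d) d$cyl == 4)
plan <- lazy_select(plan, c("mpg", "gear"))

explain_plan(plan)           # the plan exists, but no rows have been filtered yet
result <- collect_plan(plan) # evaluation happens at this point
```

With sparklyr the shape is similar: you chain dplyr verbs against a Spark table, can inspect the generated SQL with `show_query()`, and nothing is brought back to R until you call `collect()`.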


RStudio 1.0

RStudio has officially hit 1.0:

While RStudio has been an enormously useful IDE for R since day 1, it’s officially been in “beta” status all of this time. But last week, RStudio released the first official production version, RStudio 1.0. Check out that link for the release history of RStudio and all that’s been added to it over the last 6 years, but this release also adds major new functionality, including:

R Tools for Visual Studio is certainly making strides, but RStudio is the gold standard for R IDEs.


Machine Learning With R Q&A

Ginger Grant answers a series of questions about R and machine learning:

Question: Is it possible to run R processes on different boxes other than SQL Server itself for scalability reasons?

You have the option of installing the R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which needs to weigh in on your decision to move processing to a different server.

Click through for plenty more questions and answers.


Benford’s Law

Tomaz Kastrun is starting a series on fraud analysis, beginning with Benford’s Law:

One of the samples Microsoft provided with the release of SQL Server 2016 uses the simple logic of Benford’s law. This law works well with naturally occurring numbers and can be applied to a wide range of problems. A naturally occurring number is one that is not generated mechanically (such as a page number in a book, an incrementing number in your SQL table, or a sequence number of any kind) but arises independently of other numbers: the length or width of trees, mountains, and rivers, the lengths of roads in cities, addresses in your home town, city or country populations, and so on. The law describes the logarithmic distribution of leading digits from 1 to 9 and stipulates that the digit one will lead about 30% of the time, two about 17.6% of the time, three about 12.5% of the time, and so on. Randomly generated numbers will most likely produce a uniform distribution, with each leading digit from 1 to 9 occurring with probability 1/9. The law also might not hold for data with a restricted range; for example, height expressed in inches will surely not follow Benford’s distribution. My height is 188 cm, which is 74 inches or 6 ft 2 in. None of these three numbers will generate the correct distribution, even though height is a natural phenomenon.

Tomaz includes SQL Server R Services code, so check it out.
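The expected shares come from the formula P(d) = log10(1 + 1/d) for leading digit d. A small plain-R sketch of the comparison follows; it is not Tomaz's code, and the helper names `benford_expected`, `leading_digit`, and `observed_share` are made up for illustration. The powers of two are used only as a convenient test sequence.

```r
# Expected share of each leading digit under Benford's law.
benford_expected <- function(digits = 1:9) {
  log10(1 + 1 / digits)
}

# Leading (first significant) digit of each non-zero finite number.
# Formatting in scientific notation avoids floating-point trouble with
# expressions such as 0.3 / 0.1.
leading_digit <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])
  as.integer(substr(formatC(x, format = "e", digits = 15), 1, 1))
}

# Observed share of each leading digit 1..9 in a numeric vector.
observed_share <- function(x) {
  d <- factor(leading_digit(x), levels = 1:9)
  as.numeric(table(d)) / length(d)
}

# Compare the first 100 powers of two against the expected shares.
comparison <- data.frame(
  digit = 1:9,
  expected = round(benford_expected(), 3),
  observed = round(observed_share(2^(0:99)), 3)
)
print(comparison)
```

A formal check would follow this with a goodness-of-fit test, but even the side-by-side table shows whether a column of numbers is in the right neighborhood.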


R-Hub

David Smith discusses a new service to test packages on multiple platforms:

If you’re developing a package for R to share with others — on CRAN, say — you’ll want to make sure it works for others. That means testing it on various platforms (Windows, Mac, Linux, and all the versions thereof), and on various versions of R (current, past, and future). But it’s likely you only have access to one platform, and installing and managing multiple R versions can be a pain.

R-hub, the online package-building service now in public beta, aims to solve this problem by making it easy to build and test your package on a variety of platforms and R versions. Using the rhub R package, you can with a single command upload your package to the cloud-based R-hub service, and build and test your package on the current, prior, and in-development versions of R, using any or all of these platforms

This looks like an interesting service for package developers and companies with a broad distribution of R installations.


R Graph Gallery

David Smith points out the new R Graph Gallery:

Once upon a time, there was the original R Graph Gallery, by Romain François. Sadly, it’s been unavailable for several years. Now there’s a new R Graph Gallery to fill the void, created by Yan Holtz. It contains more than 200 data visualizations categorized by type, along with the R code that created them.

You can browse the gallery by types of chart (boxplots, maps, histograms, interactive charts, 3-D charts, etc), or search the chart descriptions. Once you’ve found a chart you like, you can admire it in the gallery (and interact with it, if possible), and also find the R code which you can adapt for your own use. Some entries even include mini-tutorials describing how the chart was made. You can even submit your own graph, if you’d like to have it displayed in the gallery as well.

Looks like a good place to go to get some inspiration.
