Press "Enter" to skip to content

Category: R

Network Shares And SQL Server R Services

John Pertell runs into an issue reading a file on a network share using SQL Server R Services:

What I’m trying to do should be pretty easy. I want to use R code inside a stored procedure to read all the log files created by my weather program and store the results in a database. I also want to read the current monthly file on a regular basis, at least once a day. Once the data is in the database I’ll create some mobile reports with data and charts I can read on my phone. I’ll also be able to use my own data to play with local weather predicting. I thought it would make a pretty cool demo for my R sessions.

However when I run my stored procedure to read the logs I receive an error that there is no such file if I map the share as a drive, or I’m using an invalid parameter if I try to access the share directly.

Read on for code and the specific error message.

Comments closed

Sparklyr On HDInsight

Ali Zaidi has a walkthrough on using sparklyr on HDInsight:

The majority of Spark is written in Scala (~80% of Spark core), which is a functional programming language. Functional programming languages emphasize functional purity (the output only depends on the inputs) and strive to avoid side-effects. One important component of most functional programming languages is their lazy evaluation. While it might seem odd that we would appreciate laziness from our computing tools, lazy evaluation is an effective way of ensuring computations are evaluated in the most efficient manner possible.

Lazy evaluation allows Spark SQL to highly optimize the queries. When a user submits a query to Spark SQL, Spark composes the components of the SQL query into a logical plan. The logical plan is basically a recipe Spark SQL creates in order to evaluate the desired query. Spark SQL then submits the logical plan to its highly optimized engine called Catalyst, which optimizes this plan into a physical plan of action that is executed inside Spark computation engine (a series of coordinating JVMs).

Read on for more description and code.

Comments closed

RStudio 1.0

RStudio has officially hit 1.0:

While RStudio has been an enormously useful IDE for R since day 1, it’s officially been in “beta” status all of this time. But last week, RStudio released the first official production version, RStudio 1.0. Check out that link for the release history of RStudio and all that’s been added to it over the last 6 years, but this release also adds major new functionality, including:

R Tools for Visual Studio is certainly making strides, but RStudio is the gold standard for R IDEs.

Comments closed

Machine Learning With R Q&A

Ginger Grant answers a series of questions about R and machine learning:

Question: Is it possible to run R processes in diffrent boxes other than SQL Server itself for scalability reasons?

You have the option of installing the R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which needs to weigh in on your decision to move processing to a different server.

Click through for plenty more questions and answers.

Comments closed

Benford’s Law

Tomaz Kastrun is starting a series on fraud analysis and starts with Benford’s Law:

One of the samples Microsoft provided with release of new SQL Server 2016 was using simple logic of Benford’s law. This law works great with naturally occurring numbers and can be applied across any kind of problem. By naturally occurring, it is meant a number that is not generated generically such as a page number in a book, incremented number in your SQL Table, sequence number of any kind, but numbers that are occurring irrespective from each other, in nature (length or width of trees, mountains, rivers), length of the roads in the cities, addresses in your home town, city/country populations, etc. The law calculates the log distribution of numbers from 1 to 9 and stipulates that number one will occur 30% of times, number two will occur 17% of time, number three will occur 12% of the time and so on. Randomly generated numbers will most certainly generate distribution for each number from 1 to 9 with probability of 1/9. It might also not work with restrictions; for example height expressed in inches will surely not produce Benford function. My height is 188 which is 74 inches or 6ft2. All three numbers will not generate correct distribution, even though height is natural phenomena.

Tomaz includes SQL Server R Services code, so check it out.

Comments closed

R-Hub

David Smith discusses a new service to test packages on multiple platforms:

If you’re developing a package for R to share with others — on CRAN, say — you’ll want to make sure it works for others. That means testing it on various platforms (Windows, Mac, Linux, and all the versions thereof), and on various versions of R (current, past, and future). But it’s likely you only have access to one platform, and installing and managing multiple R versions can be a pain.

R-hub, the online package-building service now in public beta, aims to solve this problem by making it easy to build and test your package on a variety of platforms and R versions. Using the rhub R package, you can with a single command upload your package to the cloud-based R-hub service, and build and test your package on the current, prior, and in-development versions of R, using any or all of these platforms

This looks like an interesting service for package developers and companies with a broad distribution of R installations.

Comments closed

R Graph Gallery

David Smith points out the new R Graph Gallery:

Once upon a time, there was the original R Graph Gallery, by Romain François. Sadly, it’s been unavailable for several years. Now there’s a new R Graph Gallery to fill the void, created by Yan Holtz. It contains more than 200 data visualizations categorized by type, along with the R code that created them.

You can browse the gallery by types of chart (boxplots, maps, histograms, interactive charts, 3-D charts, etc), or search the chart descriptions. Once you’ve found a chart you like, you can admire it in the gallery (and interact with it, if possible), and also find the R code which you can adapt for your own use. Some entries even include mini-tutorials describing how the chart was made. You can even submit your own graph, if you’d like to have it displayed in the gallery as well.

Looks like a good place to go to get some inspiration.

Comments closed

Deploy SQL Server R Services Without Internet Access

Arvind Shyamsundar shows how to install SQL Server R Services on a machine without internet access:

When deploying SQL Server R Services, it is important to note that the setup components for SQL Server do not include the Microsoft R Open and Microsoft R Server components. Those ‘R Components’ (as we will refer to them later in this post) are provided as separate downloadable components. SQL Server will automatically download these when executed on computer which is connected to the Internet. But in cases where setup is done on a computer without Internet access (quite typical of many SQL Server deployments) we need to handle things differently. There is a documented process for doing this. But even with the documentation, we still had some customers with questions on the process.

Inspired by those customer engagements, this blog post walks through the process of setting up SQL Server R Services in environments without Internet access. We walk through a number of scenarios, right from the very basic scenario to the more complex ones involving unattended and ‘smart setup’.

This is a nice walkthrough.  I wanted to highlight a link at the end showing how to create a local repository so you can install packages as well.

Comments closed

RTVS 0.5

David Smith notes that R Tools for Visual Studio has hit version 0.5:

RTVS also makes it easy to run R code as a SQL Server 2016 stored procedure. (This is a great way to make share the results of R code with other database users while making use of the power of the database for R computations.) The new SQL R Stored Procedure file works with SQL Server R Services to create a stored procedure that embeds R code you create, edit and test within RTVS. This greatly simplifies the process of running R code within SQL Server 2016 as you can see below:

The RTVS team is making good progress.  If you passed on RTVS early on, it might be time to take another look.

Comments closed

Sparklyr On EMR

Tom Zeng shows how to use sparklyr on Amazon ElasticMapReduce:

The recently released sparklyr package by RStudio has made processing big data in R a lot easier. sparklyr is an R interface to Spark that allows users to use Spark as the backend for dplyr, one of the most popular data manipulation packages. sparklyr provides interfaces to Spark packages and also allows users to query data in Spark using SQL and develop extensions for the full Spark API.

You can also install sparklyr locally and point to a Spark cluster.

Comments closed