Category: R

Rendering Ten Million Points With ggplot2

Antonio Sánchez Chinchón shows how to draw Clifford attractors in R:

From a technical point of view, the challenge is creating a data frame with all locations, since it must have 10 million rows and must be populated sequentially. A very fast way to do it is using the Rcpp package. To render the plot I use ggplot, which works quite well. Here you have the code to play with Clifford attractors if you want:

Click through for the code, as well as sample output images.
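
As a rough illustration of why the rows have to be populated sequentially, here is a minimal base R sketch of the Clifford attractor recurrence. It is not the code from the linked post: it uses a plain R loop in place of Rcpp, far fewer points, and parameter values (a, b, c, d) that are only an illustrative assumption.

```r
library(ggplot2)

# Each point depends on the previous one, so the data frame
# cannot be filled in a vectorized way.
clifford <- function(n, a, b, c, d, x0 = 0, y0 = 0) {
  x <- numeric(n)
  y <- numeric(n)
  x[1] <- x0
  y[1] <- y0
  for (i in seq_len(n - 1)) {
    x[i + 1] <- sin(a * y[i]) + c * cos(a * x[i])
    y[i + 1] <- sin(b * x[i]) + d * cos(b * y[i])
  }
  data.frame(x = x, y = y)
}

# Illustrative parameters; 10,000 points instead of 10 million
pts <- clifford(n = 10000, a = -1.4, b = 1.6, c = 1.0, d = 0.7)

# Tiny, mostly transparent points so that dense regions show up darker
p <- ggplot(pts, aes(x, y)) +
  geom_point(shape = 46, alpha = 0.1) +
  theme_void()
```

Printing `p` draws the plot. For millions of points, moving the loop into compiled code (as the post does with Rcpp) is the likely way to keep run time reasonable.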

Multiple Result Sets With ML Services

Dave Mason figures out how to create multiple result sets with SQL Server ML Services:

Of course for this strategy to work, I’d have to know ahead of time how many data frames/HTML tables there are. Hmmm. Can dynamic T-SQL help me here? If I could find out at run time how many data frames there are, and which ones I may or may not want, then why not? Here’s some R code that reads HTML tables into a variable as a list of data frames (line 8), iterates through the list (starting at line 18), decides if the HTML table has any data in it (lines 21, 24), and adds the HTML table number (the element number in the list) to a different data frame (line 27). The output shows us we would want HTML tables 1, 2, and 4. (Yeah, I really didn’t want #4. But that can be fixed by enhancing the R code to be more selective. Let’s just go with it for now.)

The method is a bit disappointing (and it’s arguably worse for inputs); I do hope the ML Services team can improve upon this experience.
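
The "which tables have data" step described in the excerpt might look roughly like the sketch below. It is not Dave's code: the list of data frames is made up here as a stand-in for tables read from an HTML page, and the column name `TableNumber` is an assumption.

```r
# Stand-in for a list of data frames parsed from HTML tables (hypothetical data)
tables <- list(
  data.frame(id = 1:3, name = c("a", "b", "c")),
  data.frame(x = 1:2),
  data.frame(),                 # an empty table we want to skip
  data.frame(note = "n/a")      # has data, even if it is not very useful
)

# TRUE for every element that has at least one row and one column
has_data <- vapply(
  tables,
  function(df) nrow(df) > 0 && ncol(df) > 0,
  logical(1)
)

# Element numbers of the tables worth returning, as a data frame
wanted <- data.frame(TableNumber = which(has_data))
```

A data frame like `wanted` could then be returned to SQL Server and used to build dynamic T-SQL with one call per table.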

stringr Cheat Sheet

David Smith points out a cheat sheet for dealing with strings in R:

The RStudio team has created another very useful cheat sheet for R: Working with Strings. This cheat sheet provides an example-laden menu of operations you can perform on strings (character vectors) in R using the stringr package. While base R provides a solid set of string manipulation functions, the stringr package functions are simpler, more consistent (making them easy to use with the pipe operator), and more like the Ruby or Python way of handling string operations.

Click through for a link to the PDF.
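
To give a flavor of that consistency, here is a small sketch using a few stringr functions. The sample vector is made up, and the last line assumes R 4.1 or later for the native pipe.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

# Every function takes the string vector as its first argument
n_chars   <- str_length(fruits)                    # characters per element
has_an    <- str_detect(fruits, "an")              # does the pattern match?
no_vowels <- str_replace_all(fruits, "[aeiou]", "")

# String-first arguments make the functions easy to chain with a pipe
prefixes <- fruits |> str_sub(1, 3) |> str_to_upper()
```

Here `prefixes` ends up as "APP", "BAN", "CHE".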

Multiple Data Sets And SQL Server R Services

Robert Sheldon has a workaround for SQL Server R Services’s limitation of a single input data set:

Despite the ease with which you can run an R script, the sp_execute_external_script stored procedure has an important limitation. You can specify only one T-SQL query when calling the procedure. Of course, you can create a query that joins multiple tables, but this approach might not always work in your circumstances or might not be appropriate for the analytics you’re trying to perform. Fortunately, you can retrieve additional data directly within the R script.

In this article, we look at how to import data from a SQL Server table and from a .csv file. We also cover how to save data to a .csv file as well as insert that data into a SQL Server table. Being able to incorporate additional data sets or save data in different formats provides us with a great deal of flexibility when working with R Services and allows us to take even greater advantage of the many elements available to the R language for data analytics.

Another option is using the RODBC package to connect back to SQL Server to retrieve more data.
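
As a rough sketch of the .csv half of that workaround, the snippet below merges a second data set into the one passed by sp_execute_external_script. `InputDataSet` and `OutputDataSet` are the default variable names for that procedure; everything else (the columns, the data, the temp file paths) is made up so that the sketch runs outside SQL Server.

```r
# Stand-in for the data frame SQL Server would pass in from the T-SQL query
InputDataSet <- data.frame(
  ProductID = c(1L, 2L, 3L),
  Sales     = c(100, 250, 75)
)

# Hypothetical second data set, written to a temp file for this sketch
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(ProductID = c(1L, 2L, 3L),
             Category  = c("Bikes", "Helmets", "Bikes")),
  csv_path,
  row.names = FALSE
)

# Inside the R script: read the extra data and join it to the input
extra <- read.csv(csv_path, stringsAsFactors = FALSE)
OutputDataSet <- merge(InputDataSet, extra, by = "ProductID")

# Optionally save the combined result back out as a .csv file
out_path <- tempfile(fileext = ".csv")
write.csv(OutputDataSet, out_path, row.names = FALSE)
```

In a real deployment the file paths would need to be readable and writable by the account that runs the R runtime.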

Promises And Closures In R

Damian Rodziewicz looks at the new promises package in R:

Citing Joe Cheng, our aim is to:

  1. Execute long-running code asynchronously on a separate thread.
  2. Be able to do something with the result (if success) or error (if failure), when the task completes, back on the main R thread.

A promise object represents the eventual result of an async task. A promise is an R6 object that knows:

  1. Whether the task is running, succeeded, or failed

  2. The result (if succeeded) or error (if failed)

This looks pretty exciting.  H/T R-Bloggers

Also, Sebastian Warnholz has a post on promises and closures in case you’re not familiar with the concepts:

Every argument you pass to a function is a promise until the moment R evaluates it. Consider a function g with arguments x and y. Let’s leave out one argument in the function call:

g <- function(x, y) x
g(1)

## [1] 1

R will be forgiving (lazy) until the argument y is actually needed. Until then y exists in the environment of the function call as a ‘name without a value’. Only when R needs to evaluate y does it search for a value. This means that we can pass some non-existent objects as arguments to the function g and R won’t care until the argument is needed in the function’s body.

Read the whole thing.  Once again, H/T R-Bloggers
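
A short sketch of that lazy behavior, extending the g example from the excerpt. The function h and the object name are made up for illustration.

```r
g <- function(x, y) x

# y is never evaluated, so a non-existent object is not a problem
r1 <- g(1, this_object_does_not_exist)

# h actually uses y, so the promise is forced and the lookup fails
h <- function(x, y) x + y
r2 <- tryCatch(
  h(1, this_object_does_not_exist),
  error = function(e) conditionMessage(e)
)

# force() evaluates a promise on purpose, even if the value is unused afterwards
g_strict <- function(x, y) {
  force(y)
  x
}
r3 <- tryCatch(
  g_strict(1, this_object_does_not_exist),
  error = function(e) "failed at force(y)"
)
```

Here `r1` is 1, while `r2` and `r3` hold error messages because the missing object is only looked up once y is evaluated.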

Text Preprocessing With R

Sibanjan Das has started a new series on text mining in R:

Next, we need to preprocess the text to convert it into a format that can be processed for extracting information. It is essential to reduce the size of the feature space before analyzing the text. There are various preprocessing methods that we can use here, such as stop word removal, case folding, stemming, lemmatization, and contraction simplification. However, it is not necessary to apply all of the normalization methods to the text. It depends on the data we retrieve and the kind of analysis to be performed.

The series starts off with a quick description of some preprocessing steps and then building an LDA model to extract key terms from articles.
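
To make a few of those preprocessing steps concrete, here is a minimal base R sketch of case folding, contraction simplification, and stop word removal. The sample sentences and the tiny stop word list are made up; real work would more likely use a text mining package and a fuller stop word list, and stemming and lemmatization are not shown.

```r
docs <- c(
  "The cats are sitting on the mat.",
  "A dog isn't chasing the cats!"
)

# Tiny illustrative stop word list (an assumption, not a standard list)
stop_words <- c("the", "a", "are", "on", "is")

preprocess <- function(text, stop_words) {
  text <- tolower(text)                              # case folding
  text <- gsub("n't", " not", text, fixed = TRUE)    # contraction simplification
  text <- gsub("[^a-z ]", " ", text)                 # drop punctuation and digits
  tokens <- strsplit(text, " +")[[1]]                # split on runs of spaces
  tokens <- tokens[nzchar(tokens)]                   # drop empty tokens
  tokens[!tokens %in% stop_words]                    # stop word removal
}

tokens <- lapply(docs, preprocess, stop_words = stop_words)
```

The first document reduces to "cats", "sitting", "mat" and the second to "dog", "not", "chasing", "cats", which is a much smaller feature space than the raw text.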

Could Not Find Function rxSqlUpdateLibPaths

I ran into an error after installing SQL Server 2017:

After installation completed, the DBA enabled SQL Server 2017 Machine Learning Services, but as soon as I tried to run a simple R script, it stalled for about 30 seconds and then I got an error:

Msg 39012, Level 16, State 1, Line 0
Unable to communicate with the runtime for ‘R’ script. Please check the requirements of ‘R’ runtime.
STDERR message(s) from external script:
Error: could not find function “rxSqlUpdateLibPaths”
Execution halted

Click through for the solution.

Making a Shiny Dashboard

Anish Singh Walia walks us through creating a dashboard using Shiny:

Shiny is an amazing R package which lets the R developers and users build amazing web apps using R itself. It lets the R users analyze, visualize and deploy their machine learning models directly in the form of the web app. This package lets you host standalone apps on a webpage or embed them in R markdown documents or build dashboards and various forecasting applications. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions. Shiny lets us write client-side front-end code in R itself and also lets users write server-side script in R itself. More details on this package can be found here.

I recently learned Shiny and started developing a web application using it. Since then I have been in love with it and have been using it in each and every data science and analytics project. The syntax is super easy to understand and there are lots of amazing articles and documentation available for you to learn it and use it. I personally had a background of developing full-stack web applications using HTML, CSS and JavaScript and other JS-based scripting languages, so I found the syntax easy.

I keep meaning to learn Shiny and someday I will, just to prove to my intern that she’s not the only one here who can…
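
For a sense of what "front end and server in R" means, here is a minimal Shiny sketch. It is not taken from the linked post; the input and output IDs (`bins`, `hist`) and the choice of the built-in mtcars data are assumptions for illustration.

```r
library(shiny)

# Front end: a title, one slider, and a placeholder for a plot
ui <- fluidPage(
  titlePanel("Histogram of mtcars$mpg"),
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server: redraws the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(mtcars$mpg, breaks = input$bins,
         main = NULL, xlab = "Miles per gallon")
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it
app <- shinyApp(ui = ui, server = server)
```

The reactive link between `input$bins` and `renderPlot()` is what removes the need for hand-written JavaScript in a simple app like this.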

Kaggle Data Science Report For 2017

Mark McDonald rounds up a few notebooks covering a recent Kaggle survey:

In 2017 we conducted our first ever extra-large, industry-wide survey to capture the state of data science and machine learning.

As the data science field booms, so has our community. In 2017 we hit a new milestone of reaching over 1M registered data scientists from almost every country in the world. Representing many different backgrounds, skill levels, and professions, we were excited to ask our community a wide range of questions about themselves, their skills, and their path to data science. We asked them everything from “what’s your yearly salary?” to “what’s your favorite data science podcasts?” to “what barriers are faced at work?”, letting us piece together key insights about the people and the trends behind the machine learning models.

Without further ado, we’d love to share everything with you. Over 16,000 survey responses were submitted, with over 6 full months of aggregated time spent completing it (an average response time of more than 16 minutes).

Click through for a few reports.  Something interesting to me is that the top languages/tools were, in order, Python, R, and SQL.  For the particular market niche that Kaggle competitions fit, that makes a lot of sense:  I tend to like R more for data exploration and data cleansing, but much of that work is already done by the time you get the dataset.

ggplot2 Basics

Bharani Akella has an introduction to ggplot2:

Plot10: Scatter-plot

ggplot(data = mtcars, aes(x = mpg, y = hp, col = factor(cyl))) + geom_point()

  • mpg (miles/gallon) is assigned to the x-axis

  • hp (horsepower) is assigned to the y-axis

  • factor(cyl) (number of cylinders) determines the color

  • The geometry used is a scatter plot. We can create a scatter plot by using the geom_point() function.

He has a number of similar examples showing several variations on bar, line, and scatterplot charts.
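
As a rough idea of what such variations look like, here is a sketch of a bar chart and a line chart. These are not Bharani's examples; the bar chart reuses mtcars and the line chart assumes the `economics` data set that ships with ggplot2.

```r
library(ggplot2)

# Bar chart: number of cars per cylinder count
bar_plot <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  labs(x = "Number of cylinders", y = "Count", fill = "Cylinders")

# Line chart: US unemployment over time from ggplot2's economics data
line_plot <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (thousands)")
```

Printing `bar_plot` or `line_plot` draws the chart; only the aesthetic mapping and the geom change between chart types.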
