R – Page 134 – Curated SQL

Global Maps In R

Published 2017-02-22 by Kevin Feasel

The folks at Sharp Sight Labs show how to create high-quality map visuals in R:

Maps are great for practicing data visualization. First of all, there’s a lot of data available on places like Wikipedia that you can map.

Moreover, creating maps typically requires several essential skills in combination. Specifically, you commonly need to be able to retrieve the data (e.g., scrape it), mold it into shape, perform a join, and visualize it. Because creating maps requires several skills from data manipulation and data visualization, creating them will be great practice for you.

And if that’s not enough, a good map just looks great. They’re visually compelling.

With that in mind, I want to walk you through the logic of building one step by step.

Read on for a step-by-step process.

dplyr Basics

Published 2017-02-22 by Kevin Feasel

Gerald Belton shows off the main functions and operators in dplyr:

The pipe operator

The pipe operator is one of the great features of the tidyverse. In base R, you often find yourself calling functions nested within functions nested within… you get the idea. The pipe operator %>% takes the object on the left-hand side and “pipes” it into the function on the right-hand side.

Click through for the rest of the story.

Gloom Indexes

Published 2017-02-22 by Kevin Feasel

David Smith points out an interesting use of R:

Radiohead is known for having some fairly maudlin songs, but of all of their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a “gloom index”, and created the following chart of gloominess for each of the band’s nine studio albums. (Click for the interactive version, created with the highcharter package for R, which allows you to explore individual tracks.)

Do click through for Charlie’s explanation, including where the numbers come from.

Market Basket Analysis Basics

Published 2017-02-21 by Kevin Feasel

Leila Etaati has an introduction to market basket analysis with R:

For instance, imagine we have the following transaction items from a store over the last few hours:

Customer 1: salt, pepper, blue cheese

Customer 2: blue cheese, pasta, pepper, tomato sauce

Customer 3: salt, blue cheese, pepperoni, bacon, egg

Customer 4: water, pepper, egg, salt

We want to know how often customers purchase pepper and salt together. Of the four transactions (4 customers), 2 included both salt and pepper, so the support is 2 divided by 4 (the total number of transactions), or 0.5.
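The support arithmetic above can be checked with a short sketch. This is written in Python for illustration only and is not the R code from the linked article; the `support` helper and the lowercased item names are assumptions made for this example. Items are compared as exact set members, so "pepperoni" does not count as "pepper".

```python
# Transactions from the example above; item names are lowercased
# so that "Pepper" and "pepper" are treated as the same item.
transactions = [
    {"salt", "pepper", "blue cheese"},
    {"blue cheese", "pasta", "pepper", "tomato sauce"},
    {"salt", "blue cheese", "pepperoni", "bacon", "egg"},
    {"water", "pepper", "egg", "salt"},
]


def support(itemset, baskets):
    """Return the fraction of baskets containing every item in itemset.

    Hypothetical helper for illustration, not a library function.
    """
    itemset = set(itemset)
    # A basket matches when the itemset is a subset of it.
    matches = sum(1 for basket in baskets if itemset <= basket)
    return matches / len(baskets)


salt_pepper_support = support({"salt", "pepper"}, transactions)
print(salt_pepper_support)  # 0.5 (customers 1 and 4 out of 4)
```

The same helper works for any itemset, for example `support({"blue cheese"}, transactions)` for a single item.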

Basket analysis is one way of building a recommendation engine: if you’re buying butter, cream, and eggs, do you also want to buy sugar?

R 3.4.0 Performance Improvements

Published 2017-02-17 by Kevin Feasel

David Smith discusses performance improvements upcoming in R 3.4.0:

A “just-in-time” JIT compiler will be included. While the core R packages have been byte-compiled since 2011, and package authors also have the option of byte-compiling the R code they contain, it was tricky for ordinary R users to gain the benefits of byte-compilation for their own code. In 3.4.0, loops in your R scripts and functions you write will be byte-compiled as you use them (“just-in-time”), so you can get improved performance for your R code without taking any additional actions.

Stay tuned for the release.

Twitter Sentiment Analysis Using doc2vec

Published 2017-02-16 by Kevin Feasel

Sergey Bryl uses word2vec and doc2vec to perform Twitter sentiment analysis in R:

But doc2vec is a deep learning algorithm that draws context from phrases. It’s currently one of the best ways of sentiment classification for movie reviews. You can use the following method to analyze feedback, reviews, comments, and so on. And you can expect better results than with tweet analysis, because tweets usually include lots of misspellings.

We’ll use tweets for this example because it’s pretty easy to get them via the Twitter API. We only need to create an app on https://dev.twitter.com (My apps menu) and find an API Key, API Secret, Access Token and Access Token Secret on the Keys and Access Tokens menu tab.

Click through for more details, including code samples.

Galaxy Classification With SQL Server

Published 2017-02-16 by Kevin Feasel

David Smith points out a nice Microsoft demo for classifying galaxies using SQL Server:

The SQL Server Blog has since published a step-by-step tutorial on implementing the galaxy classifier in SQL Server (and the code is also available on GitHub). This updated version of the demo uses the new MicrosoftML package in Microsoft R Server 9, and specifically the rxNeuralNet function for deep neural networks. The tutorial recommends using the Azure NC class of virtual machines, to take advantage of the GPU-accelerated capabilities of the function, and provides details on using the SQL Server interfaces to train the neural network and run predictions (classifications) on the image database. For the details, follow the link below.

If you’re going to get into SQL Server R Services at any level of seriousness, I highly recommend R Tools for Visual Studio, as it will make building those external stored procedure calls much easier.

RStudio Connect

Published 2017-02-13 by Kevin Feasel

Jen Underwood discusses RStudio Connect:

RStudio officially introduced the newest product in RStudio’s product lineup: RStudio Connect. RStudio Connect is a new publishing platform for R that allows analytics users to share Shiny applications, R Markdown reports, dashboards, plots, and more. This release adds an improved user experience for parameterized R Markdown reports, simple button-click publishing from the RStudio IDE, scheduled execution and distribution of reports, and more security policies, including hybrid data connections. Essentially, RStudio Connect eases enterprise deployment scenarios.

Between what Microsoft is doing with its old Revolution Analytics holdings and what RStudio is doing, this is a great time to be an enterprise R customer.

Scaling Up R

Published 2017-02-13 by Kevin Feasel

Ginger Grant explains how to use SQL Server R Services to take advantage of server resources instead of running from your local machine:

Microsoft’s R Server contains some specialized functions which are not part of the standard CRAN R installation. One of the ScaleR functions, RxInSqlServer, will allow code to be processed on the server from the client. To make this work, you must have R Server and R Client installed. If you are doing a test on a local machine, you will need both R Client and R Server installed on that computer.

Click through for a script which walks you through the process.

Dplyr Tutorial

Published 2017-02-09 by Kevin Feasel

Deepanshu Bhalla has a nice dplyr tutorial:

What is dplyr?

dplyr is a powerful R package to manipulate, clean and summarize unstructured data. In short, it makes data exploration and data manipulation easy and fast in R.

What’s special about dplyr?

The package “dplyr” comprises many functions that perform the most commonly used data manipulation operations, such as applying filters, selecting specific columns, sorting data, adding or deleting columns and aggregating data. Another important advantage of this package is that dplyr functions are very easy to learn, use, and recall. For example, filter() is used to filter rows.

dplyr is a core package when it comes to data cleansing in R, and the more of it you can internalize, the faster you’ll move in that language.
