R – Page 143 – Curated SQL

Budapest satRday

Published 2016-08-30 by Kevin Feasel

The first satRdays event will take place in Budapest on September 3rd:

This is a very exciting project with great interest from the R and more general data science community — in the past short 2 months (since we opened registration for the conference):

More than 160 persons signed up and paid for attendance from 17 countries so far (around 50-50% mix of academic and industry tickets, 30-70% mix of foreign and Hungarian attendees)
We received almost 40 voluntary talk proposals in a few weeks of time while the CfP was open
25 selected & awesome speakers agreed on to present at the conference

I’d like to see this take off, similar to SQL Saturdays.

Comments closed

Logging Python-Style

Published 2016-08-26 by Kevin Feasel

Jonathan Callahan wants to generate nice-looking logs in R:

Our real world scenario involves R scripts that process raw smoke monitoring data that is updated hourly. The raw data comes from various different instruments, set up by different agencies and transmitted over at least two satellites before eventually arriving on our computers.

Data can be missing, delayed or corrupted for a variety of reasons before it gets to us. And then our R scripts perform QC based on various columns available in the raw (aka “engineering level”) data.

Logging is one of the differences between toy code (even very useful toy code) and production-quality code. Read on for an easy way to do this in R.

Comments closed

R And Power BI

Published 2016-08-26 by Kevin Feasel

David Smith points out some of the work the Power BI team has done to integrate R into their product:

Power BI, Microsoft’s data visualization and reporting platform, has made great strides in the past year integrating the R language. This Computerworld article describes the recent advances with Power BI and R. In short, you can:

import data into Power BI by using an R script
cleanse and transform other data sources coming into Power BI using R functions
create custom charts in a Power BI dashboard using the R language, like these maps

Click through for more things you can do, as well as additional links and resources.

Comments closed

Data Frames

Published 2016-08-26 by Kevin Feasel

Meltem Ballan introduces data frames in R:

Our first data frame constrained of seven vectors, Customer_Id, loan_type, First_Name, Last_name, Gender, Zip_code and amount.

NOTE: R is case sensitive. That is why I have used lower and upper case for you to practice.

After we run the lines we want to see how our first data frame looks. Following command will suffice that need:

>View(my_first_df)

If you’re coming from a SQL background, data frames are tables. Well-formed (“clean”) data frames more or less follow first normal form.

Comments closed

Residuals

Published 2016-08-24 by Kevin Feasel

Simon Jackson discusses the concept of residuals:

The general approach behind each of the examples that we’ll cover below is to:

Fit a regression model to predict variable (Y).
Obtain the predicted and residual values associated with each observation on (Y).
Plot the actual and predicted values of (Y) so that they are distinguishable, but connected.
Use the residuals to make an aesthetic adjustment (e.g. red colour when residual in very high) to highlight points which are poorly predicted by the model.

The post is about 10% understanding what residuals are and 90% showing how to visualize them and spot major discrepancies.

Comments closed

Introduction To R

Published 2016-08-24 by Kevin Feasel

Allison Tharp takes a look at R:

RStudio has several ways to import data. One of the easiest ways is to import via URL. This link (https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD) gives us the salaries of all of the government employees for Montgomery County, MD in a CSV format. To import this into RStudio, copy the URL and go to Tools -> Import Dataset -> From Web URL…

R and Python are both good languages to learn for data analysis. I lean just a little bit toward R, but they’re both strong choices in this space.

Comments closed

Not Catching Them All

Published 2016-08-23 by Kevin Feasel

Hanjo Odendaal explains clustering techniques using Pokemon:

To collect the data on all the first generation pokemon, I employ Hadley Wickam’s rvest package. I find it very intuitive and can handle all of my needs in collecting and extracting the data from a pokemon wiki. I will grab all the Pokemon up until to Gen II, which constitutes 251 individuals. I did find the website structure a bit of a pain as each pokemon had very different looking web pages. But, with some manual hacking, I eventually got the data in a nice format.

This probably means a lot more to you if you grew up in front of a Game Boy, but there’s some good technique in here regardless.

Comments closed

Visualizations In Five Lines Of Code

Published 2016-08-23 by Kevin Feasel

David Smith highlights a Sharon Machlis article showing visualizations in up to five lines of R code:

I’ve reproduced Sharon’s code and charts below. I did make a couple of tweaks to the code, though. I added a call to checkpoint(“2016-08-22”) which, if you’ve saved the code to a file, will install all the necessary packages for you. (I also verified that the code runs with package versions as of today’s date, and if you’re trying out this code at a later time it will continue to do so, thanks to checkpoint.) I also modified the data download code to make it work more easily on Windows. Here are the charts and code

It’s really easy to get basic visualizations within R, and these are better than basic visualizations.

Comments closed

Resource Governor For R

Published 2016-08-19 by Kevin Feasel

Tomaz Kastrun describes using Resource Governor with Microsoft R Services:

Setting external resource pool for execution of R commands using sp_execute_external_script has proven extremely useful, especially in cases where you have other workers present, when you don’t want to overdo on data analysis and get to much resources from others (especially when running data analysis in production environment) or when you know that your data analysis will require maximum CPU and memory for a specific period. In such cases using and definingexternal resource pool is not only useful but highly recommended.

Resource governor is a feature that enables you to manage SQL Server workload and system resource consumption their limits. Limits can be configures for any workload in terms of CPU, Memory and I/O consumption. Where you have many different workloads on the same SQL Server, resource Governor helps allocate requested resources.

If you’re concerned about R soaking up all of your server’s memory, Resource Governor is a great way of limiting that risk.

Comments closed

R Or M?

Published 2016-08-17 by Kevin Feasel

Ryan Wade gives a few scenarios in which R might be a better language choice than M for Power BI integration:

When referring to what can be done in iOS, Apple often say that there is an “app” for that. Likewise, when R developers refer to what can be done in R, we often say that there is a “package” for that. For instance:

· If one needs to scrap data from the web there are packages for that (rvest, rcurl, and others)

· If one needs to make complicated transformations to their data there are packages for that (dplyr, tidyr, lubrdiate, stringr, and others)

I like the F#-ness of M, but I admit that I’m happy there’s some fairly close R integration within Power BI, as that means there’s one fewer language I need to learn right now…

Comments closed

Category: R