A graph, a collection of nodes connected by edges, is just data. Whether it’s a social network (where nodes are people and edges are friend relationships) or a decision tree (where nodes are branch criteria or values, and edges are decisions), the nature of the graph is easily represented in a data object. It might be represented as a matrix (where rows and columns are nodes, and elements mark whether an edge between them is present) or as a data frame (where each row is an edge, with columns representing the pair of connected nodes).
The trick comes in how you represent a graph visually; there are many different options, each with strengths and weaknesses when it comes to interpretation. A graph with many nodes and edges may become an unintelligible hairball without careful arrangement, and including directionality or other attributes of edges or nodes can reveal insights about the data that wouldn’t be apparent otherwise. There are many R packages for creating and displaying graphs (igraph is a popular one, and this CRAN task view lists many others), but that’s a problem in its own right: an important part of the data exploration process is trying and comparing different visualization options, and the myriad packages and interfaces make that process difficult for graph data.
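As a minimal base R sketch (the node names are invented for illustration), here is the same three-person friendship graph in both forms:

```r
# A tiny undirected friendship graph: ann-bob and bob-cat are friends.
people <- c("ann", "bob", "cat")

# Adjacency-matrix form: rows and columns are nodes, 1 marks an edge.
adj <- matrix(0, nrow = 3, ncol = 3, dimnames = list(people, people))
adj["ann", "bob"] <- adj["bob", "ann"] <- 1
adj["bob", "cat"] <- adj["cat", "bob"] <- 1

# Edge-list (data frame) form: one row per edge.
edges <- data.frame(from = c("ann", "bob"), to = c("bob", "cat"))
```

A package such as igraph can build a graph object from either form, e.g. igraph::graph_from_adjacency_matrix(adj) or igraph::graph_from_data_frame(edges).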
Click through for more information as well as a mesmerizing animated image.
There are many reasons why a DBA might not want to allow clients to access server memory, as doing so taxes the server. Turning it off is relatively simple. Go to the SQL Server Management Console and select SQL Server Launchpad for the instance of SQL Server running R Services.
In the screenshot, the instance of SQL Server running R Services is SS2016. Right-click the server, select Properties, and then click the Advanced tab. The default number of external users allowed might look familiar: twenty user IDs are created for R Services because Launchpad, by default, allocates twenty external users to connect from SQL Server to run R. If you don’t want external users to run R on a server, prevent them from connecting by not granting them permission to run R. To run R, users need db_rrerole permissions; without them, they cannot run R. On a production server, it is probably best not to grant this permission to non-system users.
Read on for more details.
For datasets, I used two from still-running Kaggle competitions. In the last part, I did image detection and prediction on the MNIST dataset and compared the performance and accuracy between approaches.
MNIST Handwritten digit database is available here.
Tomaz has all of the code available as well.
In this post, we will show you a visualization and build a predictive model of US flights with sparklyr. Flight visualization code is based on this article.
This post assumes you already have the following tables:
- Airlines data as airlines_bi_pq. It is assumed to be on S3, but you can put it into HDFS. See also the Ibis project.
- Airports data converted into Parquet format as airports_new_pq. See also the 2009 ASA Data Expo.
You should make these tables available through Apache Hive or Apache Impala (incubating) with Hue.
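A minimal sparklyr sketch of the setup the post describes — note this needs a live Spark cluster with the tables above, and both the master setting and the carrier column name are assumptions for illustration:

```r
library(sparklyr)
library(dplyr)

# Connect to Spark; "yarn-client" is a placeholder for your cluster's master.
sc <- spark_connect(master = "yarn-client")

# Reference the Hive tables as Spark DataFrames (no data is pulled into R yet).
airlines <- tbl(sc, "airlines_bi_pq")
airports <- tbl(sc, "airports_new_pq")

# dplyr verbs translate to Spark SQL, e.g. counting flights per carrier
# ("carrier" is an assumed column name):
airlines %>%
  group_by(carrier) %>%
  summarise(n = n()) %>%
  collect()

spark_disconnect(sc)
```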
There’s some setup work to get this going, but getting a handle on sparklyr looks to be a good idea if you’re in the analytics space.
We’ll cover the features in detail with the general availability release of RTVS 1.0, but in summary the new features include:
- Remote Execution: type R code in your local RTVS instance, but have the computations performed on a remote R server. You can also switch between local and remote workspaces at will.
- SQL Server Integration: work with database connections and SQL queries, and create stored procedures with embedded R code.
- Enhanced R Graphics Support: multiple floating and dockable plot windows, each with plot history.
I’ve been using RTVS more frequently lately and it’s definitely growing on me.
One of the nifty things about using R is that you can use it for many different purposes — and even to run other languages!
If you want to use Python in your knitr docs or the newish RStudio R notebook functionality, you might encounter some fiddliness getting all the moving parts running on Windows. This is a quick knitr Python Windows setup checklist to make sure you don’t miss any important steps.
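One quick sanity check worth doing up front is verifying from R that a Python interpreter is visible at all; a minimal sketch (the Windows paths in the message are assumptions):

```r
# Check whether knitr/RStudio will be able to find Python on the PATH.
python_path <- Sys.which("python")

if (nzchar(python_path)) {
  message("Found python at: ", python_path)
} else {
  message("No python on PATH; either add it (e.g. C:\\Python39) or point knitr ",
          "at it with the chunk option engine.path = 'C:/Python39/python.exe'")
}
```

In an R Markdown document, Python code then goes in a chunk declared with the python engine.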
Between knitr, Zeppelin, and Jupyter, you should be able to find a cross-compatible notebook which works for you.
Maps are great for practicing data visualization. First of all, there’s a lot of data available on places like Wikipedia that you can map.
Moreover, creating maps typically requires several essential skills in combination: you commonly need to retrieve the data (e.g., scrape it), mold it into shape, perform a join, and visualize it. Because maps draw on skills from both data manipulation and data visualization, creating them is great practice.
And if that’s not enough, a good map just looks great; maps are visually compelling.
With that in mind, I want to walk you through the logic of building one step by step.
Read on for the step-by-step process.
The pipe operator
The pipe operator is one of the great features of the tidyverse. In base R, you often find yourself calling functions nested within functions nested within… you get the idea. The pipe operator %>% takes the object on the left-hand side and “pipes” it into the function on the right-hand side.
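A tiny example of the contrast — this assumes the magrittr package (which the tidyverse loads) is installed:

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9, 16)

# Nested style: read from the inside out.
round(mean(sqrt(x)), 1)

# Piped style: the same computation, read left to right.
x %>% sqrt() %>% mean() %>% round(1)
```

Both lines return 2.5; the piped version simply states the steps in the order they happen.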
Click through for the rest of the story.
Radiohead is known for having some fairly maudlin songs, but of all of their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a “gloom index” and created the following chart of gloominess for each of the band’s nine studio albums. (Click for the interactive version, created with the highcharter package for R, which allows you to explore individual tracks.)
Do click through for Charlie’s explanation, including where the numbers come from.
For instance, imagine we have the following transaction items from a shopping store over the last few hours:
Customer 1: salt, pepper, blue cheese
Customer 2: blue cheese, pasta, pepper, tomato sauce
Customer 3: salt, blue cheese, pepperoni, bacon, egg
Customer 4: water, pepper, egg, salt
We want to know how often customers purchase pepper and salt together.
The support will be: out of our four transactions (4 customers), 2 of them include both salt and pepper, so the support is 2 divided by 4 (the total number of transactions), or 0.5.
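That arithmetic can be checked with a few lines of base R (the baskets are transcribed from the example above):

```r
# The four customer baskets from the example.
baskets <- list(
  c("salt", "pepper", "blue cheese"),
  c("blue cheese", "pasta", "pepper", "tomato sauce"),
  c("salt", "blue cheese", "pepperoni", "bacon", "egg"),
  c("water", "pepper", "egg", "salt")
)

# Support of {salt, pepper}: the share of baskets containing both items.
# Note %in% matches whole items, so "pepperoni" does not count as "pepper".
has_both <- sapply(baskets, function(b) all(c("salt", "pepper") %in% b))
support  <- sum(has_both) / length(baskets)
support  # 0.5 — two of the four baskets contain both salt and pepper
```

For real data with many items, a package such as arules computes support (along with confidence and lift) for all itemsets at once.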
Basket analysis is one way of building a recommendation engine: if you’re buying butter, cream, and eggs, do you also want to buy sugar?