Neural Nets With R And Power BI

Leila Etaati continues her series on using neural nets in Power BI:

we are going to predict the concrete strength using neural network. neural network can be used for predict a value or class, or it can be used for predicting multiple items. In this example, we are going to predict a value, that is concrete strength.

I have loaded the data in power bi first, and in “Query Editor” I am going to write some R codes. First we need to do some data transformations. As you can see in the below picture number 2,3 and 4,data is not in a same scale, we need to do some data normalization before applying any machine learning. I am going to write a code for that (Already explained the normalization in post KNN). So to write some R codes, I just click on the R transformation component (number 5).

There’s a lot going on in this demo; check it out.

dbplyr

Kevin Feasel

2017-06-29

R

Hadley Wickham announces dbplyr version 1.1.0:

Since you’ve read this far, I also wanted to touch on RStudio’s vision for databases. Many analysts have most of their data in databases, and making it as easy as possible to get data out of the database and into R makes a huge difference. Thanks to the community, R already has strong tools for talking to the popular open source databases. But support for connecting to enterprise databases and solving enterprise challenges has lagged somewhat. At RStudio we are actively working to solve these problems.

As well as dbplyr and DBI, we are working on many other pain points in the database ecosystem. You’ll hear much more about these packages in the future, but I wanted to touch on the highlights so you can see where we are heading. These pieces are not yet as integrated as they should be, but they are valuable by themselves, and we will continue to work to make a seamless database experience, that is as good as (or better than!) any other environment.

There’s some very interesting vision talk at the end, showing how Wickham and the RStudio group are dedicated to enterprise-grade R.

Running H2O In R On Azure HDInsight

Daisy Deng shows how to configure HDInsight to be able to run the H2O package in R rather than Python or Scala:

We provide a few script actions for installing rsparkling on Azure HDInsight. When creating the HDInsight cluster, you can run the following script action for header node:

https://bostoncaqs.blob.core.windows.net/scriptaction/scriptaction-head.sh

And run the following action for the worker node:

https://bostoncaqs.blob.core.windows.net/scriptaction/scriptaction-worker.sh

Please consult Customize Linux-based HDInsight clusters using Script Action for more details.

Click through for the full process.

Basics Of Neural Nets

Leila Etaati has a new series on neural nets in R:

in Neural Network, we have some hidden Nodes that do the main job ! they found the best value for the output, they are using some function that we call that functions as “Activation function” for instance in below picture, Node C is a hidden node that take the values from node A and B. as you can see the weight (the better path) related to Node B as shown in tick line that means Node B may lead to get better results so Node C get input values from Node B not Node A.

If you have time, also check out the linked YouTube videos.

CSV Import Speeds With H2O

Kevin Feasel

2017-06-26

R

WenSui Liu benchmarks three CSV loading methods in R:

The importFile() function in H2O is extremely efficient due to the parallel reading. The benchmark comparison below shows that it is comparable to the read.df() in SparkR and significantly faster than the generic read.csv().

I’d wonder if there are cases where this would vary significantly; regardless, for reading a large data file, parallel processing does tend to be faster.

Spark And H2O

Avkash Chauhan shows how to use sparklyr and rsparkling to tie Spark together with the H2O library in R:

In order to work with Spark H2O using rsparkling and sparklyr in R, you must first ensure that you have both sparklyr and rsparkling installed.

Once you’ve done that, you can check out the working script, the code for testing the Spark context, and the code for launching H2O Flow. All of this information can be found below.

It’s a short post, but it does show how to kick off a job.

Power BI Supports Interactive R Visuals

Kevin Feasel

2017-06-23

Power BI, R

David Smith reports on a great update to Power BI:

The above chart was created with the plotly package, but you can also use htmlwidgets or any other R package that creates interactive graphics. The only restriction is that the output must be HTML, which can then be embedded into the Power BI dashboard or report. You can also publish reports including these interactive charts to the online Power BI service to share with others. (In this case though, you’re restricted to those R packages supported in Power BI online.)

Power BI now provides four custom interactive R charts, available as add-ins:

I’d avoided doing too much with R visuals in Power BI because the output was so discordant—Power BI dashboards are often lively things, but the R visual would just sit there, limp and lifeless.  I’m glad to see that this has changed.

Bayesian Average

Jelte Hoekstra has a fun post applying the Bayesian average to board game ratings:

Maybe you want to explore the best boardgames but instead you find the top 100 filled with 10/10 scores. Experience many such false positives and you will lose faith in the rating system. Let’s be clear this isn’t exactly incidental either: most games have relatively few votes and suffer from this phenomenon.

The Bayesian average

Fortunately, there are ways to deal with this. BoardGameGeek’s solution is to replace the average by the Bayesian average. In Bayesian statistics we start out with a prior that represents our a priori assumptions. When evidence comes in we can update this prior, computing a so called posterior that reflects our updated belief.

Applied to boardgames this means: if we have an unrated game we might as well assume it’s average. If not, the ratings will have to convince us otherwise. This certainly removes outliers as we will see below!

This is a rather interesting article and you can easily apply it to other rating systems as well.

Static Site Generation With Hugo

Steph Locke explains how to build a simple site using Hugo:

This site uses Hugo. Hugo is a “static site generator” which means you write a bunch of markdown and it generates html. This is great for building simple sites like company leafletware or blogs.

You can get Hugo across platforms and on Windows it’s just an executable you can put in your program files. You can then work with it like git in the command line.

Read on for a step-by-step process to get started.  Steph also links to blogdown, which is an interesting R-friendly extension.

Forcing 0 Intercept Inflates R-squared In R

John Mount has an informative post on how you can trick yourself when running linear regression models in R and forcing the y intercept to be 0:

So far so good. Let’s now remove the “intercept term” by adding the “0+” from the fitting command.

m2 <- lm(y~0+x, data=d)t(broom::glance(m2))
## [,1]
## r.squared 7.524811e-01
## adj.r.squared 7.474297e-01
## sigma 3.028515e-01
## statistic 1.489647e+02
## p.value 1.935559e-30
## df 2.000000e+00
## logLik -2.143244e+01
## AIC 4.886488e+01
## BIC 5.668039e+01
## deviance 8.988464e+00
## df.residual 9.800000e+01
d$pred2 <- predict(m2, newdata = d)

Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value“)!

Read on to learn why this happens and how you can prevent this from tricking you in the future.

Categories

August 2017
MTWTFSS
« Jul  
 123456
78910111213
14151617181920
21222324252627
28293031