Running H2O In R On Azure HDInsight

Daisy Deng shows how to configure HDInsight to be able to run the H2O package in R rather than Python or Scala:

We provide a few script actions for installing rsparkling on Azure HDInsight. When creating the HDInsight cluster, you can run the following script action for the head node:

And run the following action for the worker node:

Please consult Customize Linux-based HDInsight clusters using Script Action for more details.

Click through for the full process.

Basics Of Neural Nets

Leila Etaati has a new series on neural nets in R:

In a neural network, we have some hidden nodes that do the main job: they find the best value for the output, using functions that we call “activation functions.” For instance, in the picture below, Node C is a hidden node that takes values from Nodes A and B. As you can see, the weight (the better path) related to Node B is shown as a thick line, which means Node B may lead to better results, so Node C gets its input values from Node B rather than Node A.

If you have time, also check out the linked YouTube videos.
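
To make the hidden-node idea concrete, here is a minimal Python sketch of a single hidden node applying an activation function to weighted inputs. The sigmoid activation and every input value and weight below are assumptions chosen for illustration; they are not taken from Leila's post. In this sketch the heavier weight on Node B means B's value contributes far more to Node C's output than A's does, rather than A being ignored outright.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_node(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical values: nodes A and B emit the same signal,
# but the connection from B carries a much larger weight.
a_value, b_value = 0.5, 0.5
w_a, w_b = 0.1, 2.0

c_output = hidden_node([a_value, b_value], [w_a, w_b])
only_a = hidden_node([a_value, 0.0], [w_a, w_b])
only_b = hidden_node([0.0, b_value], [w_a, w_b])

print(f"C with both inputs: {c_output:.3f}")
print(f"C with A only:      {only_a:.3f}")
print(f"C with B only:      {only_b:.3f}")
```

Comparing the three printed values shows how much of Node C's activation comes from each incoming connection.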

Envisioning Neural Nets As Org Charts

Maiia Bakhova describes the layout of a neural net as similar to a chain of command within an organization:

We can observe a lot in common with a corporate chain of command. As we see, middle managers are hidden layers, which do the bulk of the job. We have similar information flow and processing, which is analogous to forward propagation and backward propagation.
What is left now is to explain that dealing with a sigmoid function at each node is too costly, so it is mostly reserved for the CEO level.

That’s a metaphor I hadn’t heard before.

Spark And H2O

Avkash Chauhan shows how to use sparklyr and rsparkling to tie Spark together with the H2O library in R:

In order to work with Spark H2O using rsparkling and sparklyr in R, you must first ensure that you have both sparklyr and rsparkling installed.

Once you’ve done that, you can check out the working script, the code for testing the Spark context, and the code for launching H2O Flow. All of this information can be found below.

It’s a short post, but it does show how to kick off a job.

Microsoft ML For Spark

Xiaoyong Zhu announces that the Microsoft Machine Learning library is now available for Spark:

We’ve learned a lot by working with customers using SparkML, both internal and external to Microsoft. Customers have found Spark to be a powerful platform for building scalable ML models. However, they struggle with low-level APIs, for example to index strings, assemble feature vectors and coerce data into a layout expected by machine learning algorithms. Microsoft Machine Learning for Apache Spark (MMLSpark) simplifies many of these common tasks for building models in PySpark, making you more productive and letting you focus on the data science.

The library provides simplified consistent APIs for handling different types of data such as text or categoricals. Consider, for example, a DataFrame that contains strings and numeric values from the Adult Census Income dataset, where “income” is the prediction target.

It’s an open source project as well, so that barrier to entry is lowered significantly.

Riddler Nation: Game Theory In Action

Curtis Miller goes over a multi-phase distribution game with no known information:

The winning strategy of the last round, submitted by Vince Vatter, was (0, 1, 2, 16, 21, 3, 2, 1, 32, 22), with an official record of 751 wins, 175 losses, and 5 ties. Naturally, the top-performing strategies look similar. This should not be surprising; winning strategies exploit common vulnerabilities among submissions.

I’ve downloaded the submitted strategies for the second round (I already have the first round’s strategies). Let’s load them in and start analyzing them.

This is a great blog post, which looks at using evolutionary algorithms to evolve a winning strategy.
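
For readers who have not seen the game, here is a minimal Python sketch of how one match between two strategies might be scored. The rules are an assumption based on the Riddler puzzle as I understand it and are not stated in the excerpt: ten castles worth 1 through 10 points, 100 soldiers per player, each castle going to whoever sends more soldiers, and ties splitting the points. The `score_battle` function and the uniform opponent are hypothetical names and data; only the winning strategy comes from the quoted text.

```python
def score_battle(mine, theirs):
    """Return (my_points, their_points) for one match.

    Assumed rules: castle i (1-based) is worth i points, the larger
    troop count takes the castle, and a tie splits its points.
    """
    my_points = their_points = 0.0
    for castle_value, (m, t) in enumerate(zip(mine, theirs), start=1):
        if m > t:
            my_points += castle_value
        elif t > m:
            their_points += castle_value
        else:
            my_points += castle_value / 2
            their_points += castle_value / 2
    return my_points, their_points


# The winning strategy quoted above, against a hypothetical opponent
# who spreads troops evenly across all ten castles.
winner = (0, 1, 2, 16, 21, 3, 2, 1, 32, 22)
uniform = (10,) * 10

result = score_battle(winner, uniform)
print(result)
```

Under these assumed rules, the winning strategy concedes the castles where it sends only a handful of soldiers and concentrates on four castles that together are worth just over half of the 55 available points.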

Sentiment Analysis In R

Stefan Feuerriegel and Nicolas Pröllochs have a new package in CRAN:

Our package “SentimentAnalysis” performs a sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.

I’m not sure how it stacks up to external services, but it’s another option available to us.
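
As a rough illustration of what dictionary-based sentiment scoring means, here is a minimal Python sketch. It is not the SentimentAnalysis package's API, and the tiny word lists are hypothetical stand-ins rather than the QDAP or Loughran-McDonald dictionaries. The score used here, positive matches minus negative matches divided by total words, is one common convention and an assumption on my part.

```python
import re

# Hypothetical toy dictionaries, for illustration only.
POSITIVE = {"good", "great", "gain", "improve"}
NEGATIVE = {"bad", "loss", "decline", "poor"}


def sentiment_score(text: str) -> float:
    """Return (positive hits - negative hits) / word count, in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return (positive_hits - negative_hits) / len(words)


print(sentiment_score("Great quarter with a solid gain"))
print(sentiment_score("Poor results and a bad loss"))
```

A customized dictionary, as the authors describe, would replace the fixed word lists with terms selected by a statistical model against a response variable.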

Using OtterTune To Tune Databases

Dana Van Aken, Geoff Gordon, and Andy Pavlo show off OtterTune, which uses machine learning techniques to tune database management systems like MySQL and Postgres:

OtterTune, a new tool that’s being developed by students and researchers in the Carnegie Mellon Database Group, can automatically find good settings for a DBMS’s configuration knobs. The goal is to make it easier for anyone to deploy a DBMS, even those without any expertise in database administration.

OtterTune differs from other DBMS configuration tools because it leverages knowledge gained from tuning previous DBMS deployments to tune new ones. This significantly reduces the amount of time and resources needed to tune a new DBMS deployment. To do this, OtterTune maintains a repository of tuning data collected from previous tuning sessions. It uses this data to build machine learning (ML) models that capture how the DBMS responds to different configurations. OtterTune uses these models to guide experimentation for new applications, recommending settings that improve a target objective (for example, reducing latency or improving throughput).

In this post, we discuss each of the components in OtterTune’s ML pipeline, and show how they interact with each other to tune a DBMS’s configuration. Then, we evaluate OtterTune’s tuning efficacy on MySQL and Postgres by comparing the performance of its best configuration with configurations selected by database administrators (DBAs) and other automatic tuning tools.

This is potentially a very interesting technology and is not the only one of its kind—we’ve seen Microsoft enter this space as well for SQL Server index and tuning recommendations.

Genetic Algorithms

Melanie Mitchell provides an introduction to how genetic algorithms work:

Many computational problems require a computer program to be adaptive—to continue to perform well in a changing environment. This is typified by problems in robot control in which a robot has to perform a task in a variable environment, or computer interfaces that need to adapt to the idiosyncrasies of an individual user. Other problems require computers to be innovative—to construct something truly new and original, such as a new algorithm for accomplishing a computational task, or even a new scientific discovery. Finally, many computational problems require complex solutions that are difficult to program by hand. A striking example is the problem of creating artificial intelligence. Early on, AI practitioners believed that it would be straightforward to encode the rules that would confer intelligence in a program; expert systems are a good example. Nowadays, many AI researchers believe that the “rules” underlying intelligence are too complex for scientists to encode in a “top-down” fashion, and that the best route to artificial intelligence is through a “bottom-up” paradigm. In such a paradigm, human programmers encode simple rules, and complex behaviors such as intelligence emerge from these simple rules. Connectionism (i.e., the study of computer programs inspired by neural systems) is one example of this philosophy (Smolensky, 1988); evolutionary computation is another.

For fun and completely inappropriate implementations of genetic algorithms in T-SQL, William Talada and Gail Shaw have us covered.
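
For a more conventional starting point, here is a minimal Python sketch of the usual genetic algorithm loop: selection, crossover, and mutation. It is not taken from Melanie Mitchell's text. The toy problem (maximize the number of 1 bits in a string, often called OneMax), the tournament selection, the elitism step, and all parameter values are assumptions picked to keep the example short.

```python
import random


def evolve(bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Evolve bit strings toward all ones and return the fittest one found."""
    rng = random.Random(seed)
    fitness = sum  # OneMax: fitness is simply the count of 1 bits

    population = [[rng.randint(0, 1) for _ in range(bits)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        def pick():
            # Tournament selection: the fitter of two random individuals.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b

        # Elitism: carry the current best individual over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            parent1, parent2 = pick(), pick()
            cut = rng.randrange(1, bits)  # single-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children

    return max(population, key=fitness)


best = evolve()
print(sum(best), "of", len(best), "bits set")
```

Because the best individual is always carried forward, the top fitness in the population can never drop from one generation to the next.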

Machine Learning At Build 2017

Adnan Masood looks at some of the new machine learning offerings in Azure:

Language Understanding Intelligent Service (LUIS) is one of the marquee offerings in cognitive services which contains an entire suite of NLU / NLP capabilities, teaching applications to understand entities, utterances, and general commands from user input. Other language services include Bing Spell Check API which detects and corrects spelling mistakes, Web Language Model API which helps build knowledge graphs using predictive language models, Text Analytics API to perform topic modeling and sentiment analysis, as well as Translator Text API to perform automatic text translation. The Linguistic Analysis API is a new addition which parses and provides context around language concepts.

In the knowledge spectrum, there are the Recommendations API to help predict and recommend items, Knowledge Exploration Service to enable interactive search experiences over structured data via natural language inputs, Entity Linking Intelligence Service for NER / disambiguation, Academic Knowledge API (academic content in the Microsoft Academic Graph search), QnA Maker API, and the newly minted custom Decision Service, which provides a contextual decision-making API with reinforcement learning features. Search APIs include Autosuggest, news, web, image, video and customized searches.

There are some nice products available on the Azure platform and Adnan does a good job of outlining them.

