Category: Machine Learning

Machine Learning At Build 2017

Adnan Masood looks at some of the new machine learning offerings in Azure:

Language Understanding Intelligent Service (LUIS) is one of the marquee offerings in Cognitive Services, containing an entire suite of NLU / NLP capabilities that teach applications to understand entities, utterances, and general commands from user input. Other language services include the Bing Spell Check API, which detects and corrects spelling mistakes; the Web Language Model API, which helps build knowledge graphs using predictive language models; the Text Analytics API, which performs topic modeling and sentiment analysis; and the Translator Text API, which performs automatic text translation. The Linguistic Analysis API is a new addition which parses text and provides context around language concepts.

In the knowledge spectrum, there is the Recommendations API to help predict and recommend items, the Knowledge Exploration Service to enable interactive search experiences over structured data via natural language inputs, the Entity Linking Intelligence Service for NER and disambiguation, the Academic Knowledge API (academic content from the Microsoft Academic Graph), the QnA Maker API, and the newly minted Custom Decision Service, which provides a contextual decision-making API with reinforcement learning features. Search APIs include Autosuggest, News, Web, Image, Video, and Custom Search.

There are some nice products available on the Azure platform and Adnan does a good job of outlining them.

ML Algorithm Cheat Sheet

Hui Li has a quick cheat sheet on which algorithms might be useful in a particular situation:

A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is “which algorithm should I use?” The answer to the question varies depending on many factors, including:

  • The size, quality, and nature of data.
  • The available computational time.
  • The urgency of the task.
  • What you want to do with the data.

Even an experienced data scientist cannot tell which algorithm will perform the best before trying different algorithms. We are not advocating a one and done approach, but we do hope to provide some guidance on which algorithms to try first depending on some clear factors.

Hui then goes into detail on each. h/t Vincent Granville

SQL Server ML Services

SQL Server R Services is now SQL Server Machine Learning Services and supports Python.  First, Nagesh Pabbisetty and Sumit Kumar talk about Python support:

The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. We are renaming R Services to Machine Learning Services, and R and Python are two options under this feature.

The Python integration in SQL Server provides several advantages:

  • Elimination of data movement: You no longer need to move data from the database to your Python application or model. Instead, you can build Python applications in the database. This eliminates barriers of security, compliance, governance, integrity, and a host of similar issues related to moving vast amounts of data around. This new capability brings Python to the data and runs code inside secure SQL Server using the proven extensibility mechanism built into SQL Server 2016.

  • Easy deployment: Once you have the Python model ready, deploying it in production is now as easy as embedding it in a T-SQL script, and then any SQL client application can take advantage of Python-based models and intelligence by a simple stored procedure call.

  • Enterprise-grade performance and scale: You can use SQL Server’s advanced capabilities like in-memory tables and columnstore indexes with the high-performance scalable APIs in the RevoScalePy package. RevoScalePy is modeled after the RevoScaleR package in SQL Server R Services. Using these with the latest innovations in the open source Python world allows you to bring unparalleled selection, performance, and scale to your SQL Python applications.

  • Rich extensibility: You can install and run any of the latest open source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data in SQL Server. Installing a Python package in SQL Server is as simple as installing a Python package on your local machine.

  • Wide availability at no additional costs: Python integration is available in all editions of SQL Server 2017, including the Express edition.

Nagesh Pabbisetty also announces Microsoft R Server 9.1:

We took the first step with Microsoft R Server 9.0, and this follow on release includes significant innovations such as:

  • New machine learning enhancements and inclusion of pre-trained cognitive models such as sentiment analysis & image featurizers

  • SQL Server Machine Learning Services with integrated Python in Preview

  • Enterprise grade operationalization with real-time scoring and dynamic scaling of VMs

  • Deep customer & ISV partnerships to deliver the right solutions to customers

  • A panoply of sources to help you get started with ease

And Joseph Sirosh indicates that AI is where the money is:

So today it’s my pleasure to announce the first RDBMS with built-in AI: a production-quality Community Technology Preview (CTP 2.0) of SQL Server 2017. In this preview release, we are introducing in-database support for a rich library of machine learning functions, and now for the first time Python support (in addition to R). SQL Server can also leverage NVIDIA GPU-accelerated computing through the Python/R interface to power even the most intensive deep-learning jobs on images, text, and other unstructured data. Developers can implement NVIDIA GPU-accelerated analytics and very sophisticated AI directly in the database server as stored procedures and gain orders of magnitude higher throughput. In addition, developers can use all the rich features of the database management system for concurrency, high availability, encryption, security, and compliance to build and deploy robust enterprise-grade AI applications.

There’s a lot to digest here.

Using h2o.ai On HDInsight

Xiaoyong Zhu shows how to set up h2o.ai on Azure HDInsight:

H2O Flow is an interactive web-based computational user interface where you can combine code execution, text, mathematics, plots, and rich media into a single document, much like Jupyter Notebooks. With H2O Flow, you can capture, rerun, annotate, present, and share your workflow. H2O Flow allows you to use H2O interactively to import files, build models, and iteratively improve them. Based on your models, you can make predictions and add rich text to create vignettes of your work, all within Flow’s browser-based environment. In this blog post, we will focus only on its visualization part.

The H2O Flow web service lives in the Spark driver and is routed through the HDInsight gateway, so it can only be accessed while the Spark application/notebook is running.

You can click the available link in the Jupyter Notebook, or you can directly access this URL:

https://yourclustername-h2o.apps.azurehdinsight.net/flow/index.html

Setup is pretty easy.

Linear Support Vector Machines

Ananda Das explains how linear Support Vector Machines work in classifying spam messages:

Linear SVM assumes that the two classes are linearly separable, that is, a hyperplane can separate the two classes so that data points from the two classes do not get mixed up. Of course, this is not always a realistic assumption, and we will discuss later how linear SVM handles the case of non-linear separability. For readers with some experience, I pose a question: linear SVM creates a discriminant function, but so does LDA. Yet the two are different classifiers. Why? (Hint: LDA is based on Bayes’ theorem, while linear SVM is based on the concept of a margin. In the case of LDA, one has to make an assumption about the distribution of the data in each class. Newcomers can ignore the question. We will discuss this point in detail in another post.)

This is a pretty math-heavy post, so get your coffee first. h/t R-Bloggers.
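
To make the margin idea concrete, here is a minimal sketch (not the code from the post) of a soft-margin linear SVM trained by stochastic subgradient descent on the L2-regularized hinge loss. Unlike LDA, nothing here assumes a per-class distribution; a point only moves the hyperplane when it falls inside the margin. The function names, the learning rate `lr`, the regularization strength `lam`, and the two "standardized spam features" are all hypothetical choices for illustration, not real spam data.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Soft-margin linear SVM via stochastic subgradient descent.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    X is a list of feature vectors; y holds labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # The L2 term always shrinks w a little toward zero.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Inside the margin (or misclassified): the hinge loss is active,
                # so move the hyperplane toward giving this point margin >= 1.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Sign of the discriminant function w . x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical standardized features, e.g. z-scored counts of "spammy" words and links.
X = [[2, 3], [3, 2], [2.5, 2.5], [3, 3], [-2, -3], [-3, -2], [-2.5, -2], [-3, -3]]
y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = spam, -1 = not spam
w, b = train_linear_svm(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

The `margin < 1` test is where the SVM differs from a plain perceptron: correctly classified points that sit too close to the boundary still trigger an update.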

Understanding Neural Nets

David Smith links to a video which explains how neural networks do their thing:

In R, you can train a simple neural network with just a single hidden layer using the nnet package, which comes pre-installed with every R distribution. It’s a great place to start if you’re new to neural networks, but deep learning applications call for more complex neural networks. R has several packages to check out here, including MXNet, darch, deepnet, and h2o: see this post for a comparison. The tensorflow package can also be used to implement various kinds of neural networks.

R makes it pretty easy to run one, though it then becomes important to understand regularization as a part of model tuning.
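
For a feel of what a single-hidden-layer network with regularization is doing, here is a small from-scratch sketch in Python rather than R’s nnet. It trains one sigmoid hidden layer by full-batch gradient descent on cross-entropy and adds an L2 penalty on the weights, which is the weight-decay idea behind nnet’s `decay` argument. The hidden-layer size, learning rate, decay value, initialization range, and the XOR toy data are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(params, x):
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    p = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, p

def predict_proba(params, x):
    return forward(params, x)[1]

def train_one_hidden_layer(X, y, hidden=4, decay=1e-3, lr=0.5, epochs=3000, seed=1):
    """Full-batch gradient descent on mean cross-entropy + decay * (sum of squared weights).
    Biases are left unpenalized. Returns (params, objective value per epoch)."""
    rng = random.Random(seed)
    n, n_in = len(X), len(X[0])
    W1 = [[rng.uniform(-0.7, 0.7) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.7, 0.7) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gW2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, t in zip(X, y):
            h, p = forward((W1, b1, W2, b2), x)
            p = min(max(p, 1e-12), 1 - 1e-12)
            loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
            d_out = p - t  # gradient of cross-entropy w.r.t. the output pre-activation
            gb2 += d_out
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                d_h = d_out * W2[j] * h[j] * (1 - h[j])  # backprop through hidden sigmoid
                gb1[j] += d_h
                for k in range(n_in):
                    gW1[j][k] += d_h * x[k]
        penalty = decay * (sum(w * w for row in W1 for w in row) + sum(w * w for w in W2))
        history.append(loss / n + penalty)
        # Gradient step; the 2 * decay * w terms are the weight-decay (L2) gradient.
        for j in range(hidden):
            W2[j] -= lr * (gW2[j] / n + 2 * decay * W2[j])
            b1[j] -= lr * gb1[j] / n
            for k in range(n_in):
                W1[j][k] -= lr * (gW1[j][k] / n + 2 * decay * W1[j][k])
        b2 -= lr * gb2 / n
    return (W1, b1, W2, b2), history

# XOR is not linearly separable, so the hidden layer is doing real work here.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
params, history = train_one_hidden_layer(X, y)
print(round(history[0], 3), round(history[-1], 3))
print([round(predict_proba(params, x), 2) for x in X])
```

Raising `decay` pulls the weights toward zero and smooths the decision surface; tuning it (along with `hidden`) is the regularization part of model tuning mentioned above.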

TensorFlow With YARN

Wangda Tan and Vinod Kumar Vavilapalli show how to control TensorFlow jobs with YARN:

YARN has been used successfully to run all sorts of data applications. These applications can all coexist on a shared infrastructure managed through YARN’s centralized scheduling.

With TensorFlow, one can get started with deep learning without much knowledge about advanced math models and optimization algorithms.

If you have GPU-equipped hardware and you want to run TensorFlow, going through the process of setting up hardware, installing the bits, and optionally also dealing with faults, scaling the app up and down, etc. becomes cumbersome really fast. Instead, integrating TensorFlow with YARN allows us to seamlessly manage resources across machine learning / deep learning workloads and other YARN workloads like MapReduce, Spark, Hive, etc.

Read on for more details, including a demo video.

Using Prophet For Stock Price Predictions

Marcelo Perlin looks at Facebook’s Prophet to see if it works well for predicting stock price movements:

The previous histogram shows the total return from randomly generated signals in 10^4 (10,000) simulations. The vertical line is the result from using prophet. As you can see, it is a bit higher than the average of the distribution. The total return from prophet is lower than the return of the naive strategy in 27.5 percent of the simulations. This is not a bad result. But notice that we didn’t add trading or liquidity costs to the analysis, which will make the total returns worse.

The main results of this simple study are clear: prophet is bad at point forecasts for returns but does quite a bit better at directional predictions. It might be interesting to test it further, with more data, trading costs, and other forecasting setups, and see if the results hold.

This is a very interesting article, worth reading. h/t R-Bloggers
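
The evaluation idea in the quoted passage, comparing a strategy’s total return against a distribution of returns from random signals and separating point accuracy from directional accuracy, can be sketched without Prophet itself. This is a minimal Python sketch under stated assumptions: returns are log returns, signals are +1 (long) or -1 (short) for the next period, there are no trading or liquidity costs, and the returns and the "always right on direction" forecasts are synthetic placeholders, not Marcelo’s data, setup, or results.

```python
import math
import random

def total_return(signals, returns):
    """Total log return of a long/short strategy, ignoring trading and liquidity costs.
    signals[i] is +1 (long) or -1 (short) for the period with log return returns[i]."""
    return sum(s * r for s, r in zip(signals, returns))

def random_signal_benchmark(strategy_signals, returns, n_sims=10_000, seed=42):
    """Compare a strategy against n_sims strategies with coin-flip signals.
    Returns the strategy's total return, the simulated totals, and the share of
    simulations whose total return beats the strategy."""
    rng = random.Random(seed)
    strat = total_return(strategy_signals, returns)
    sims = []
    for _ in range(n_sims):
        signals = [1 if rng.random() < 0.5 else -1 for _ in returns]
        sims.append(total_return(signals, returns))
    share = sum(1 for s in sims if s > strat) / n_sims
    return strat, sims, share

def directional_accuracy(forecasts, actuals):
    """Fraction of periods where the forecast has the same sign as the realized return."""
    hits = sum(1 for f, a in zip(forecasts, actuals) if (f > 0) == (a > 0))
    return hits / len(actuals)

def rmse(forecasts, actuals):
    """Root mean squared error of point forecasts."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals))

# Synthetic daily log returns: placeholders, not real prices.
rng = random.Random(7)
returns = [rng.gauss(0, 0.01) for _ in range(100)]
# Hypothetical forecasts that always get the direction right but the size badly wrong.
forecasts = [0.05 if r > 0 else -0.05 for r in returns]
signals = [1 if f > 0 else -1 for f in forecasts]

strat, sims, share = random_signal_benchmark(signals, returns)
print("directional accuracy:", directional_accuracy(forecasts, returns))
print("RMSE vs. always-zero forecast:", rmse(forecasts, returns), rmse([0.0] * len(returns), returns))
print("share of random runs beating the strategy:", share)
```

With forecasts like these, the RMSE is worse than simply predicting zero while every directional call is right, which is the split between point and directional accuracy described above. Subtracting a per-trade cost inside `total_return` would be a natural next step.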

Handwriting Character Recognition

Tomaz Kastrun compares a few different libraries in terms of handwritten numeric character recognition:

Recently, I did a session at a local user group in Ljubljana, Slovenia, where I introduced the new algorithms that are available in the MicrosoftML package for Microsoft R Server 9.0.3.

For datasets, I used two from Kaggle competitions that are (still) currently running. In the last part, I did image detection and prediction on the MNIST dataset and compared their performance and accuracy.

MNIST Handwritten digit database is available here.

Tomaz has all of the code available as well.

Twitter Sentiment Analysis Using doc2vec

Sergey Bryl uses word2vec and doc2vec to perform Twitter sentiment analysis in R:

But doc2vec is a deep learning algorithm that draws context from phrases. It’s currently one of the best approaches to sentiment classification for movie reviews. You can use the following method to analyze feedback, reviews, comments, and so on, and you can expect better results than with tweets, because tweets usually include lots of misspellings.

We’ll use tweets for this example because it’s pretty easy to get them via Twitter API. We only need to create an app on https://dev.twitter.com (My apps menu) and find an API Key, API secret, Access Token and Access Token Secret on Keys and Access Tokens menu tab.

Click through for more details, including code samples.
