Press "Enter" to skip to content

Category: Machine Learning

Microsoft ML Server 9.3 Released

Nagesh Pabbisetty announces Microsoft Machine Learning Server 9.3:

In ML Server 9.3, we have added support for SQL compute context in ML Server and in R Client running on Linux platforms, so data scientists who work on Linux workstations can directly use in-database analytics with SQL Server compute context. Additionally, the SQLRUtils package can now be used to package the R scripts into T-SQL stored procedures and run them from R environment on Linux clients.

An interesting scenario enabled by the addition of SQL Server Compute context in ML Server running on Linux is that organizations can now provide a browser-based interface for accessing SQL Server compute context with R Studio Server and ML Server running on a Linux machine connecting to SQL Server.

Since introducing revoscalepy library in the last release of ML Server and SQL Server 2017, we have shipped several additions and improvements in the Python APIs as part of CU releases of SQL Server 2017. We have added APIs like rx_create_col_info, rx_get_var_info etc. that make it easier to get column information, esp. with large number of columns. We added rx_serialize_model for easy model serialization. We have also improved performance when working with string data in different scenarios.

This also gets you up to R 3.4.3. H/T David Smith

Comments closed

The Whys Of Azure ML Workbench

Ginger Grant explains why Azure Machine Learning Workbench exists:

Microsoft is looking for Azure Machine Learning Workbench for more than a tool to use for Machine Learning analysis. It is part of a system to manage and monitor the deployment of machine learning solutions with Azure Machine Learning Model Management. The management aspects are part of the application installation.  To install the Azure Machine Learning Workbench, the application download is available only by creating an account in Microsoft’s Azure environment, where a Machine Learning Model Management resource will be created as part of the install. Within this resource, you will be directed to create a virtual environment in Azure where you will be deploying and managing Machine Learning models.

This migration into management of machine learning components is part of a pattern first seen on the on-premises version of data science functionality.  First Microsoft helped companies manage the deployment of R code with SQL Server 2016 which includes the ability to move R code into SQL Server.  Providing this capability decreased the time it took to implement a data science solution by providing a means for the code can be deployed easily without the need for the R code to be re-written or included in another application. SQL Server 2017 expanded on this idea by allowing Python code to be deployed into SQL Server as well.  With the cloud service Model Management, Microsoft is hoping to centralize the implementation so that all Machine Learning services created can be managed in one place.

Read on for more.

Comments closed

Markov Chains In Python

Sandipan Dey shows off various uses of Markov chains as well as how to create one in Python:

Perspective. In the 1948 landmark paper A Mathematical Theory of Communication, Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today’s Information Age. In this paper, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They also have many scientific computing applications including the genemark algorithm for gene prediction, the Metropolis algorithm for measuring thermodynamical properties, and Google’s PageRank algorithm for Web search. For this assignment, we consider a whimsical variant: generating stylized pseudo-random text.

Markov chains are a venerable statistical technique and formed the basis of a lot of text processing (especially text generation) due to the algorithm’s relatively low computational requirements.

Comments closed

Anomaly Detection With Python

Robert Sheldon continues his SQL Server Machine Learning Series:

As important as these concepts are to working Python and MLS, the purpose in covering them was meant only to provide you with a foundation for doing what’s really important in MLS, that is, using Python (or the R language) to analyze data and present the results in a meaningful way. In this article, we start digging into the analytics side of Python by stepping through a script that identifies anomalies in a data set, which can occur as a result of fraud, demographic irregularities, network or system intrusion, or any number of other reasons.

The article uses a single example to demonstrate how to generate training and test data, create a support vector machine (SVM) data model based on the training data, score the test data using the SVM model, and create a scatter plot that shows the scoring results.

Click through to see the scenario that Robert has laid out as an example.

Comments closed

Non-English Natural Language Processing

The folks at BNOSAC have announced a new natural language processing toolkit for R:

BNOSAC is happy to announce the release of the udpipe R package (https://bnosac.github.io/udpipe/en) which is a Natural Language Processing toolkit that provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’, ‘morphological feature tagging’ and ‘dependency parsing’ of raw text. Next to text parsing, the package also allows you to train annotation models based on data of ‘treebanks’ in ‘CoNLL-U’ format as provided at http://universaldependencies.org/format.html.

The package provides direct access to language models trained on more than 50 languages.

Click through to check it out.

Comments closed

Time Series Forecasting With DeepAR

Tim Januschowski, et al, introduce DeepAR on AWS:

Today we are launching Amazon SageMaker DeepAR as the latest built-in algorithm for Amazon SageMaker. DeepAR is a supervised learning algorithm for time series forecasting that uses recurrent neural networks (RNN) to produce both point and probabilistic forecasts. We’re excited to give developers access to this scalable, highly accurate forecasting algorithm that drives mission-critical decisions within Amazon. Just as with other Amazon SageMaker built-in algorithms, the DeepAR algorithm can be used without the need to set up and maintain infrastructure for training and inference.

Click through for a product demonstration.

Comments closed

More On Machine Learning Services

Ginger Grant continues her Machine Learning Services series with a couple more posts.  First up is on memory allocation:

Enabling Machine Learning Services on SQL Server which I discussed in a previous blog post, requires you to enable external scripts.  Machine Learning Services are run as external processes to SQLPAL. This means that when you are running Python or R code you are running it outside of the managed processes of SQL Server and SQLPAL.  This design means that the resources used to run Machine Learning Services will run outside of the resources allocated for SQL Server.  If you are planning on using Machine Learning Services you will want to review the server memory options which you may have set for SQL Server.  If you have set the max server memory For example, if your server has 16 GB of RAM memory, and you have allocated  8 GB to SQL Server and you estimate that the operating system will use an additional 4 GB, that means that machine learning services will have 4 GB remaining which it can use.

By design, Machine Learning Services will not starve out all of the memory for SQL Server because it doesn’t use it.  This means DBAs to not have to worry about SQL Server processes not running because some R program is using all the memory as it does not use the memory SQL Server has allocated.  You do have to worry about the amount of memory allocated to Machine Learning Services as by default, using our previous example where there was 4 GB which Machine Learning Services can use, it will only use 20% of the available memory or  819 KB of memory.  That  is not a lot of memory.  Most likely if you are doing a lot of Machine Learning Services work you will want to use more memory which means you will want to change the default memory allocation for external services.

Ginger also talks about the Launchpad service:

When calling external processes, internally SQL Server uses User IDs to call the Launchpad service, which is installed as part of Machine Learning Services and must be running for SQL Server to be able to execute code written in R or Python.  The number of users is set by default.  To change the number of users, open  up SQL Server Configuration Manager by typing SQLServerManager14.msc at the run prompt. For some unknowable reason Microsoft decided to hide this application which was previously available by looking at the installed programs on the server.  Now for some reason they think everyone should memorize this obscure command. Once you have the SQL Server Configuration Manager open, right click on the SQL Server Launchpad service and select the properties which will show the window, as shown below.  You will notice I am running an instance called SQLServer2017 which is listed in parenthesis in the window name.

Both are worth reading.

Comments closed

Python Data Frames In ML Services

Robert Sheldon continues his SQL Server Machine Learning Services series by looking at Python data frames:

This article focuses on using data frames in Python. It is the second article in a series about MLS and Python. The first article introduced you briefly to data frames. This article continues that discussion, describing how to work with data frame objects and the data within those objects.

Data frames and the functions they support are available to MLS and Python through the pandas library. The library is available as a Python module that provides tools for analyzing and manipulating data, including the ability to generate data frame objects and work with data frame data. The pandas library is included by default in MLS, so the functions and data structures available to pandas are ready to use, without having to manually install pandas in the MLS library.

There’s quite a bit to this article, making it an interesting read.

Comments closed

Image Counts For Neural Network Training

Pete Warden shares his rule of thumb for how many images you need to train a neural network:

In the early days I would reply with the technically most correct, but also useless answer of “it depends”, but over the last couple of years I’ve realized that just having a very approximate rule of thumb is useful, so here it is for posterity:

You need 1,000 representative images for each class.

Like all models, this rule is wrong but sometimes useful. In the rest of this post I’ll cover where it came from, why it’s wrong, and what it’s still good for.

Read on to learn where the number 1000 came from and get some good hints, like flipping and rescaling images.

Comments closed

Capsule Neural Networks

Saurabh Kulshrestha covers the topic of capsule neural networks:

This is the problem with Convolutional Neural Networks as well. CNN is good at detecting features, but will wrongly activate the neuron for face detection. This is because it is less effective at exploring the spatial relationships among features.

A simple CNN model can extract the features for nose, eyes and mouth correctly but will wrongly activate the neuron for the face detection. Without realizing the mis-match in spatial orientation and size, the activation for the face detection will be too high.

Read on to see how capsule networks can help solve issues with convolutional neural networks.

Comments closed