Press "Enter" to skip to content

Category: Machine Learning

ML Services and Resource Governor

I have a post on two gotchas you might run into around Resource Governor throttling SQL Server Machine Learning Services:

By default, SQL Server will grant 20% of available memory to any R or Python scripts running. The purpose of this limit is to prevent you from hurting server performance with expensive external scripts (like, say, training large neural networks on a SQL Server).

Here’s the kicker: this affects you even if you don’t have Resource Governor enabled. If you see out-of-memory exceptions in Python or error messages about memory allocation in R, I’d recommend bumping this max memory percent up above 20, and I have scripts to help you with the job. Of course, making this change assumes that your server isn’t stressed to the breaking point; if it is, you might simply want to offload that work somewhere else.

Click through for the other issue.

Leave a Comment

Stock Price Predictions with LSTM Models

Thenuja Shanthacumaran walks us through training a Long Short-Term Memory neural network model for predicting stock prices:

LSTM could not process a single data point. it needs a sequence of data for processing and able to store historical information. LSTM is an appropriate algorithm to make prediction and process based-on time-series data. It’s better to work on the regression problem.

The stock market has enormously historical data that varies with trade date, which is time-series data, but the LSTM model predicts future price of stock within a short-time period with higher accuracy when the dataset has a huge amount of data.

Click through for the process and a demo.

Leave a Comment

Installing TensorFlow and Keras for R on SQL Server 2019 ML Services

I have a post on using TensorFlow and Keras in R on SQL Server 2019 Machine Learning Services:

What I’m doing is building a new virtual environment named r-reticulate, which is what the reticulate package in R desires. Inside that virtual environment, I’m installing the latest versions of tensorflow-probabilitytensorflow , and keras. I had DLL loading problems with TensorFlow 2.1 on Windows, so if you run into those, the proper solution is to ensure that you have the appropriate Visual C++ redistributables installed on your server.

Then, I switched back to the base virtual environment and installed the same packages. My thinking here is that I’ll probably need them for other stuff as well (and don’t tell anybody, but I’m not very good with Python environments).

Please continue not to tell anybody that I’m not very good with Python environments. I tend to dump things in the base environment, forget which one I’m in, and all kinds of other bad practices. I think I’m secretly undermining myself in Python, but I don’t have enough proof yet.

Comments closed

Avoiding Overfitting and Underfitting in Neural Networks

Manas Narkar provides some advice on optimizing neural network models:

Adding Dropout

Dropout is considered as one of the most effective regularization methods. Dropout is basically randomly zero-ing or dropping out features from your layer during the training process, or introducing some noise in the samples. The key thing to note is that this is only applied at training time. At test time, no values are dropped out. Instead, they are scaled. The typical dropout rate is between 0.2 to 0.5.

Click through for a demo on dropout, as well as coverage of several other techniques.

Comments closed

Genetic Algorithms in Python

Abhinav Choudhary walks us through building a genetic algorithm library in Python:

Here are quick steps for how the genetic algorithm works:

1. Initial Population– Initialize the population randomly based on the data.
2. Fitness function– Find the fitness value of the each of the chromosomes(a chromosome is a set of parameters which define a proposed solution to the problem that the genetic algorithm is trying to solve)
3. Selection– Select the best fitted chromosomes as parents to pass the genes for the next generation and create a new population
4. Cross-over– Create new set of chromosome by combining the parents and add them to new population set
5. Mutation– Perfrom mutation which alters one or more gene values in a chromosome in the new population set generated. Mutation helps in getting more diverse oppourtinity.Obtained population will be used in the next generation

I’m a sucker for genetic algorithms (and even more so its cousin, genetic programming). And there are still good use cases for genetic algorithms, especially in creating scoring functions for neural networks.

Comments closed

Comparing Recurrent and Convolutional Neural Networks

Sameer Nigam explains the differences between convolutional and recurrent neural networks:

CNN and RNN are amongst most important algorithm of Neural Network family, also they differ in their network process and solving problems.

So talking about their differences:

CNN are used to solve classification and regression problems and RNN are used to solve sequence information.

Read on to see what Sameer means by this.

Comments closed

Neural Networks and a Reproducibility Problem

William Vorhies looks at a recent paper on attempts at reproducing results from various types of neural networks:

Your trusted lead on recommenders rushes up with a new paper in hand.  Just back from the RecSys conference where the paper was presented he shows you the results.  It appears your top-N recommender could be made several percentage points better using the new technique.  The downside is that it would require you to adopt one of the new DNN collaborative filtering models which would be much more compute intensive and mean a deep reskilling dive for some of your team.

Would you be surprised to find out that the results in that paper are not reproducible?  Or more, that the baseline techniques to which it was compared to show improvements were not properly optimized.  And, if they had been, the much simpler techniques would be shown to be superior.

In a recent paper “Are We Really Making Much Progress”, researchers Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach raise a major red flag.  Houston, we have a reproducibility problem.

Having worked through some of these papers for a different algorithm, I’m not that surprised. Sometimes it seems like improvements are limited solely to the data set and scenario the authors came up with, though that may just be the cynic in me.

This article is a good reason for looking at several types of models during the research phase, and even trying to keep several models up to date. It’s also a reminder that if you’re looking at papers and hot algorithms, make sure they include a way to get the data used in testing (and source code, if you can).

Comments closed

Distributed Model Training with Cloudera ML

Zuling Kang and Anand Patil show us how to train models across several nodes using Cloudera Machine Learning:

Deep learning models are generally trained using the stochastic gradient descendent (SGD) algorithm. For each iteration of SGD, we will sample a mini-batch from the training set, feed it into the training model, calculate the gradient of the loss function of the observed values and the real values, and update the model parameters (or weights). As it is well known that the SGD iterations have to be executed sequentially, it is not possible to speed up the training process by parallelizing iterations. However, as processing one single iteration for a number of commonly used models like CIFAR10 or IMAGENET takes a long time, even using the most sophisticated GPU, we can still try to parallelize the feedforward computation as well as the gradient calculation within each iteration to speed up the model training process.

In practice, we will split the mini-batch of the training data into several parts, like 4, 8, 16, etc. (in this article, we will use the term sub-batch to refer to these split parts), and each training worker takes one sub-batch. Then the training workers do feedforward, gradient computation, and model updating using the sub-batches, respectively, just as in the monolithic training mode. After these steps, a process called model average is invoked, averaging the model parameters of all the workers participating in the training, so as to make the model parameters exactly the same when a new training iteration begins. Then the new round of the training iteration starts again from the data sampling and splitting step.

Read on for the high-level explanation, followed by some Python code working in TensorFlow.

Comments closed

Text Mining and Sentiment Analysis in R

Sanil Mhatre walks us through a sentiment analysis scenario in R:

Sentiments can be classified as positive, neutral or negative. They can also be represented on a numeric scale, to better express the degree of positive or negative strength of the sentiment contained in a body of text.

This example uses the Syuzhet package for generating sentiment scores, which has four sentiment dictionaries and offers a method for accessing the sentiment extraction tool developed in the NLP group at Stanford. The get_sentiment function accepts two arguments: a character vector (of sentences or words) and a method. The selected method determines which of the four available sentiment extraction methods will be used. The four methods are syuzhet (this is the default), bingafinn and nrc. Each method uses a different scale and hence returns slightly different results. Please note the outcome of nrc method is more than just a numeric score, requires additional interpretations and is out of scope for this article. The descriptions of the get_sentiment function has been sourced from : https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html?

Comments closed

Using Cognitive Services in Power BI without a Premium Subscription

Marc Lelijveld and Kathrin Borchert show how we can take advantage of Cognitive Services and Power BI without having to pay for Power BI Premium:

Recently, I was presenting my session about AI Capabilities for Power BI to make AI Accessible for Everyone for the Virtual Power BI Days Hamburg. A great event organized by Kathrin Borchert. Part of my session was about the Artificial Intelligence capabilities offered as part of Power BI Premium. A day later, Kathrin came up with a great idea how you can leverage these AI capabilities without the need for Power BI Premium.

I was directly enthusiastic about that idea since I thought about this in the past as well. Back then, there were some blockers which are sorted now. I asked Kathrin if she was open for co-authoring this blog and she immediately agreed.

Click through for the technique. Basically, it’s a trade-off between simplicity and cost.

Comments closed