Press "Enter" to skip to content

Category: Machine Learning

The %tensorboard Magic Command in Databricks Notebooks

Jerry Liang and Hossein Falaki introduce a new magic command in Databricks Runtime 7.2:

In 2017, we released the  dbutils.tensorboard.start() API to manage and use TensorBoard inside Databricks python notebooks. This API only permits one active TensorBoard process on a cluster at any given time – which hinders multi-tenant use-cases. Early last year, TensorBoard released its own API for notebooks via the %tensorboard python magic command. This API not only starts TensorBoard processes but also exposes the TensorBoard’s command line arguments in the notebook environment. In addition, it embeds the TensorBoard UI inside notebooks, whereas the dbutils.tensorboard.start API prints a link to open TensorBoard in a new tab.

Read on to see how you can use it.

Comments closed

Neural Network Model Deployment with ONNX

Terry McCann gives us a primer on ONNX:

Let me introduce you to ONNX. ONNX or the Open Neural Network eXchange is a runtime which can take a model, that you have trained in PyTorch or Tensorflow and encapsulate it in an ONNX format which is executed on something running the ONNX runtime. This new model can be trained in Python and deployed on an ML.net application, with no need for integration coding.

We have spent a huge amount of time creating different Docker containers for different types of models, the Tensorflow container or the PyTorch container or a container running in our model in Spark, the list goes on. That way of working is kind of becoming defunct. ONNX really breaks that down into a simple standard runtime that you can start working with and you can deploy your model into multiple different environments and ensure that is runs on your database, on your website, on your mobile device and also at the edge.

Terry has a video as well. I like the fact that ONNX exists and also that it’s available in Azure SQL Edge (in part because I want it available on-premises as well).

Comments closed

ML Services and Resource Governor

I have a post on two gotchas you might run into around Resource Governor throttling SQL Server Machine Learning Services:

By default, SQL Server will grant 20% of available memory to any R or Python scripts running. The purpose of this limit is to prevent you from hurting server performance with expensive external scripts (like, say, training large neural networks on a SQL Server).

Here’s the kicker: this affects you even if you don’t have Resource Governor enabled. If you see out-of-memory exceptions in Python or error messages about memory allocation in R, I’d recommend bumping this max memory percent up above 20, and I have scripts to help you with the job. Of course, making this change assumes that your server isn’t stressed to the breaking point; if it is, you might simply want to offload that work somewhere else.

Click through for the other issue.

Comments closed

Stock Price Predictions with LSTM Models

Thenuja Shanthacumaran walks us through training a Long Short-Term Memory neural network model for predicting stock prices:

LSTM could not process a single data point. it needs a sequence of data for processing and able to store historical information. LSTM is an appropriate algorithm to make prediction and process based-on time-series data. It’s better to work on the regression problem.

The stock market has enormously historical data that varies with trade date, which is time-series data, but the LSTM model predicts future price of stock within a short-time period with higher accuracy when the dataset has a huge amount of data.

Click through for the process and a demo.

Comments closed

Installing TensorFlow and Keras for R on SQL Server 2019 ML Services

I have a post on using TensorFlow and Keras in R on SQL Server 2019 Machine Learning Services:

What I’m doing is building a new virtual environment named r-reticulate, which is what the reticulate package in R desires. Inside that virtual environment, I’m installing the latest versions of tensorflow-probabilitytensorflow , and keras. I had DLL loading problems with TensorFlow 2.1 on Windows, so if you run into those, the proper solution is to ensure that you have the appropriate Visual C++ redistributables installed on your server.

Then, I switched back to the base virtual environment and installed the same packages. My thinking here is that I’ll probably need them for other stuff as well (and don’t tell anybody, but I’m not very good with Python environments).

Please continue not to tell anybody that I’m not very good with Python environments. I tend to dump things in the base environment, forget which one I’m in, and all kinds of other bad practices. I think I’m secretly undermining myself in Python, but I don’t have enough proof yet.

Comments closed

Avoiding Overfitting and Underfitting in Neural Networks

Manas Narkar provides some advice on optimizing neural network models:

Adding Dropout

Dropout is considered as one of the most effective regularization methods. Dropout is basically randomly zero-ing or dropping out features from your layer during the training process, or introducing some noise in the samples. The key thing to note is that this is only applied at training time. At test time, no values are dropped out. Instead, they are scaled. The typical dropout rate is between 0.2 to 0.5.

Click through for a demo on dropout, as well as coverage of several other techniques.

Comments closed

Genetic Algorithms in Python

Abhinav Choudhary walks us through building a genetic algorithm library in Python:

Here are quick steps for how the genetic algorithm works:

1. Initial Population– Initialize the population randomly based on the data.
2. Fitness function– Find the fitness value of the each of the chromosomes(a chromosome is a set of parameters which define a proposed solution to the problem that the genetic algorithm is trying to solve)
3. Selection– Select the best fitted chromosomes as parents to pass the genes for the next generation and create a new population
4. Cross-over– Create new set of chromosome by combining the parents and add them to new population set
5. Mutation– Perfrom mutation which alters one or more gene values in a chromosome in the new population set generated. Mutation helps in getting more diverse oppourtinity.Obtained population will be used in the next generation

I’m a sucker for genetic algorithms (and even more so its cousin, genetic programming). And there are still good use cases for genetic algorithms, especially in creating scoring functions for neural networks.

Comments closed

Comparing Recurrent and Convolutional Neural Networks

Sameer Nigam explains the differences between convolutional and recurrent neural networks:

CNN and RNN are amongst most important algorithm of Neural Network family, also they differ in their network process and solving problems.

So talking about their differences:

CNN are used to solve classification and regression problems and RNN are used to solve sequence information.

Read on to see what Sameer means by this.

Comments closed

Neural Networks and a Reproducibility Problem

William Vorhies looks at a recent paper on attempts at reproducing results from various types of neural networks:

Your trusted lead on recommenders rushes up with a new paper in hand.  Just back from the RecSys conference where the paper was presented he shows you the results.  It appears your top-N recommender could be made several percentage points better using the new technique.  The downside is that it would require you to adopt one of the new DNN collaborative filtering models which would be much more compute intensive and mean a deep reskilling dive for some of your team.

Would you be surprised to find out that the results in that paper are not reproducible?  Or more, that the baseline techniques to which it was compared to show improvements were not properly optimized.  And, if they had been, the much simpler techniques would be shown to be superior.

In a recent paper “Are We Really Making Much Progress”, researchers Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach raise a major red flag.  Houston, we have a reproducibility problem.

Having worked through some of these papers for a different algorithm, I’m not that surprised. Sometimes it seems like improvements are limited solely to the data set and scenario the authors came up with, though that may just be the cynic in me.

This article is a good reason for looking at several types of models during the research phase, and even trying to keep several models up to date. It’s also a reminder that if you’re looking at papers and hot algorithms, make sure they include a way to get the data used in testing (and source code, if you can).

Comments closed

Distributed Model Training with Cloudera ML

Zuling Kang and Anand Patil show us how to train models across several nodes using Cloudera Machine Learning:

Deep learning models are generally trained using the stochastic gradient descendent (SGD) algorithm. For each iteration of SGD, we will sample a mini-batch from the training set, feed it into the training model, calculate the gradient of the loss function of the observed values and the real values, and update the model parameters (or weights). As it is well known that the SGD iterations have to be executed sequentially, it is not possible to speed up the training process by parallelizing iterations. However, as processing one single iteration for a number of commonly used models like CIFAR10 or IMAGENET takes a long time, even using the most sophisticated GPU, we can still try to parallelize the feedforward computation as well as the gradient calculation within each iteration to speed up the model training process.

In practice, we will split the mini-batch of the training data into several parts, like 4, 8, 16, etc. (in this article, we will use the term sub-batch to refer to these split parts), and each training worker takes one sub-batch. Then the training workers do feedforward, gradient computation, and model updating using the sub-batches, respectively, just as in the monolithic training mode. After these steps, a process called model average is invoked, averaging the model parameters of all the workers participating in the training, so as to make the model parameters exactly the same when a new training iteration begins. Then the new round of the training iteration starts again from the data sampling and splitting step.

Read on for the high-level explanation, followed by some Python code working in TensorFlow.

Comments closed