Press "Enter" to skip to content

Category: Machine Learning

MLOps with Azure Databricks and MLflow

Oliver Koernig walks us through some of the basics of MLOps using MLflow and Azure Databricks:

Most organizations today have a defined process to promote code (e.g. Java or Python) from development to QA/Test and production.  Many are using Continuous Integration and/or Continuous Delivery (CI/CD) processes and oftentimes are using tools such as Azure DevOps or Jenkins to help with that process. Databricks has provided many resources to detail how the Databricks Unified Analytics Platform can be integrated with these tools (see Azure DevOps IntegrationJenkins Integration). In addition, there is a Databricks Labs project – CI/CD Templates – as well as a related blog post that provides automated templates for GitHub Actions and Azure DevOps, which makes the integration much easier and faster.

When it comes to machine learning, though, most organizations do not have the same kind of disciplined process in place.

Read on for a demonstration of the process.

Comments closed

Projecting Defensive Back Trajectories with Sagemaker

Lin Lee Cheong, et al, relay some interesting research:

NFL’s Next Gen Stats (NGS) powered by AWS accurately captures player and ball data in real time for every play and every NFL game—over 300 million data points per season—through the extensive use of sensors in players’ pads and the ball. With this rich set of tracking data, NGS uses AWS machine learning (ML) technology to uncover deeper insights and develop a better understanding of various aspects and trends of the game. To date, NGS metrics have focused on helping fans better appreciate and understand the offense and defense in gameplay through the application of advanced analytics, particularly in the passing game. Thanks to tracking data, it’s possible to quantify the difficulty of passes, model expected yards after catch, and determine the value of various play outcomes. A logical next step with this analytical information is to evaluate quarterback decision-making, such as whether the quarterback has considered all eligible receivers and evaluated tradeoffs accurately.

To effectively model quarterback decision-making, we considered a few key metrics—mainly the probability of different events occurring on a pass, and the value of said events. A pass can result in three outcomes: completion, incompletion, or interception. NGS has already created models that provide probabilities of these outcomes, but these events rely on information that’s available at only two points during the play: when the ball is thrown (termed as pass-forward), and when the ball arrives to a receiver (pass-arrived). Because of this, creating accurate probabilities requires modeling the trajectory of players between those two points in time.

For these probabilities, the quarterback’s decision is heavily influenced by the quality of defensive coverage on various receivers, because a receiver with a closely covered defender has a lower likelihood of pass completion compared to a receiver who is wide open due to blown coverage. Furthermore, defenders are inherently reactive to how the play progresses. Defenses move in completely different ways depending on which receiver is targeted on the pass. This means that a trajectory model for defenders has to similarly be reactive to the specified targeted receiver in a believable manner.

Click through for details on the study.

Comments closed

Image Classification with Keras and TensorFlow 2 in R

Shirin Glander takes us through the task of image classification using TensorFlow version 2.2.0:

Recently, I have been getting a few comments on my old article on image classification with Keras, saying that they are getting errors with the code. And I have also gotten a few questions about how to use a Keras model to predict on new images (of different size). Instead of replying to them all individually, I decided to write this updated version using recent Keras and TensorFlow versions (all package versions and system information can be found at the bottom of this article, as usual).

Click through for the R code.

Comments closed

Implementing an LSTM Model with Python

Mrinal Walia takes us through the concept of Long Short Term Memory:

A simple Recurrent Neural Network has a very simple structure, that forms a chain of repeating modules of a neural network, with just a single activation function such as tanh layer, similarly LSTM too have a chain-like structure with repeating modules just like RNN but instead of a single Neural network layer in RNN, LSTM has four layers which are interacting in a very different way each performing its unique function in the network.

Read on for a good amount of theory followed by an example using Keras.

Comments closed

Generating Predictions with SQL Server ML Services

Jeffin Mathew walks us through SQL Server Machine Learning Services:

The purpose of this blog is to explore the process of running ML predictions on SQL server using Python. We are going to train and test the data to predict information about bike sharing for a specific year. We are going to be using the provided 2011 data and predict what 2012 will result in. The 2012 data already exists inside the dataset, so we will be able to compare the predicted to the actual amount.

For certain use cases—especially when the data already exists in SQL Server, and especially especially when you can use native scoring—Machine Learning Services does a great job.

Comments closed

MLOPS in R with GitHub Actions

David Smith explains MLOPS and GitHub actions in a talk:

In the talk, I demonstrate the process in action (the demo starts at the 14:30 mark in the video below). I used Visual Studio Code to edit the app.R file in repository, and then pushed the changes to GitHub. That immediately triggered the action to deploy the updated file via SSH to the Shiny Server, running in a remote VM. Similarly, changes to the data file or to the R script files implementing the logistic regression model would trigger the model to be retrained in the cluster, and re-deploy the endpoint to deliver new predictions from the updated model.

Click through for a quick summary, link to the repo, and embedded video of the talk.

Comments closed

The %tensorboard Magic Command in Databricks Notebooks

Jerry Liang and Hossein Falaki introduce a new magic command in Databricks Runtime 7.2:

In 2017, we released the  dbutils.tensorboard.start() API to manage and use TensorBoard inside Databricks python notebooks. This API only permits one active TensorBoard process on a cluster at any given time – which hinders multi-tenant use-cases. Early last year, TensorBoard released its own API for notebooks via the %tensorboard python magic command. This API not only starts TensorBoard processes but also exposes the TensorBoard’s command line arguments in the notebook environment. In addition, it embeds the TensorBoard UI inside notebooks, whereas the dbutils.tensorboard.start API prints a link to open TensorBoard in a new tab.

Read on to see how you can use it.

Comments closed

Neural Network Model Deployment with ONNX

Terry McCann gives us a primer on ONNX:

Let me introduce you to ONNX. ONNX or the Open Neural Network eXchange is a runtime which can take a model, that you have trained in PyTorch or Tensorflow and encapsulate it in an ONNX format which is executed on something running the ONNX runtime. This new model can be trained in Python and deployed on an ML.net application, with no need for integration coding.

We have spent a huge amount of time creating different Docker containers for different types of models, the Tensorflow container or the PyTorch container or a container running in our model in Spark, the list goes on. That way of working is kind of becoming defunct. ONNX really breaks that down into a simple standard runtime that you can start working with and you can deploy your model into multiple different environments and ensure that is runs on your database, on your website, on your mobile device and also at the edge.

Terry has a video as well. I like the fact that ONNX exists and also that it’s available in Azure SQL Edge (in part because I want it available on-premises as well).

Comments closed

ML Services and Resource Governor

I have a post on two gotchas you might run into around Resource Governor throttling SQL Server Machine Learning Services:

By default, SQL Server will grant 20% of available memory to any R or Python scripts running. The purpose of this limit is to prevent you from hurting server performance with expensive external scripts (like, say, training large neural networks on a SQL Server).

Here’s the kicker: this affects you even if you don’t have Resource Governor enabled. If you see out-of-memory exceptions in Python or error messages about memory allocation in R, I’d recommend bumping this max memory percent up above 20, and I have scripts to help you with the job. Of course, making this change assumes that your server isn’t stressed to the breaking point; if it is, you might simply want to offload that work somewhere else.

Click through for the other issue.

Comments closed

Stock Price Predictions with LSTM Models

Thenuja Shanthacumaran walks us through training a Long Short-Term Memory neural network model for predicting stock prices:

LSTM could not process a single data point. it needs a sequence of data for processing and able to store historical information. LSTM is an appropriate algorithm to make prediction and process based-on time-series data. It’s better to work on the regression problem.

The stock market has enormously historical data that varies with trade date, which is time-series data, but the LSTM model predicts future price of stock within a short-time period with higher accuracy when the dataset has a huge amount of data.

Click through for the process and a demo.

Comments closed