Press "Enter" to skip to content

Category: Machine Learning

Using the Open Source R or Python Runtime with Machine Learning Services

Niels Berglund walks us through using the open source extensibility framework to install R or Python:

When Java became a supported language in SQL Server 2019, Microsoft mentioned that communication between ExternalHost and the language extension should be based on an API, regardless of the external language. The API is the Extensibility Framework API for SQL Server. Having an API ensures simplicity and ease of use for the extension developer.

From the paragraph above, one can assume that Microsoft would like to see 3rd party development of language extensions. That assumption turned out to be accurate as, mentioned above, Microsoft open-sourced the Java language extension, together with the include files for the extension API, in September 2020! This means that anyone interested can now create a language extension for their own favorite language!

However, open sourcing the Java extension was not the only thing Microsoft did. They also created and open-sourced language extensions for R and Python!

Click through for more detail and a walkthrough on installation of Python.

Comments closed

TF-IDF in .NET for Spark, Updated

Ed Elliott has been busy:

Apache Spark has had a machine learning API for quite some time and this has been partially implemented in .NET for Apache Spark.

In this post we will look at how we can use the Apache Spark ML API from .NET. This is the second version of this post, the first version was written before version 1 of .NET for Apache Spark and there was a vital piece of the implementation missing which meant although we could build the model in .NET, we couldn’t actually use it. The necessary functionality is now available and so I am updating the post. To see the previous version go to: https://the.agilesql.club/2020/07/tf-idf-in-.net-for-apache-spark-using-spark-ml/

Read on for more information, as well as a call to action.

Comments closed

MLOps with Azure Databricks and MLflow

Oliver Koernig walks us through some of the basics of MLOps using MLflow and Azure Databricks:

Most organizations today have a defined process to promote code (e.g. Java or Python) from development to QA/Test and production.  Many are using Continuous Integration and/or Continuous Delivery (CI/CD) processes and oftentimes are using tools such as Azure DevOps or Jenkins to help with that process. Databricks has provided many resources to detail how the Databricks Unified Analytics Platform can be integrated with these tools (see Azure DevOps IntegrationJenkins Integration). In addition, there is a Databricks Labs project – CI/CD Templates – as well as a related blog post that provides automated templates for GitHub Actions and Azure DevOps, which makes the integration much easier and faster.

When it comes to machine learning, though, most organizations do not have the same kind of disciplined process in place.

Read on for a demonstration of the process.

Comments closed

Projecting Defensive Back Trajectories with Sagemaker

Lin Lee Cheong, et al, relay some interesting research:

NFL’s Next Gen Stats (NGS) powered by AWS accurately captures player and ball data in real time for every play and every NFL game—over 300 million data points per season—through the extensive use of sensors in players’ pads and the ball. With this rich set of tracking data, NGS uses AWS machine learning (ML) technology to uncover deeper insights and develop a better understanding of various aspects and trends of the game. To date, NGS metrics have focused on helping fans better appreciate and understand the offense and defense in gameplay through the application of advanced analytics, particularly in the passing game. Thanks to tracking data, it’s possible to quantify the difficulty of passes, model expected yards after catch, and determine the value of various play outcomes. A logical next step with this analytical information is to evaluate quarterback decision-making, such as whether the quarterback has considered all eligible receivers and evaluated tradeoffs accurately.

To effectively model quarterback decision-making, we considered a few key metrics—mainly the probability of different events occurring on a pass, and the value of said events. A pass can result in three outcomes: completion, incompletion, or interception. NGS has already created models that provide probabilities of these outcomes, but these events rely on information that’s available at only two points during the play: when the ball is thrown (termed as pass-forward), and when the ball arrives to a receiver (pass-arrived). Because of this, creating accurate probabilities requires modeling the trajectory of players between those two points in time.

For these probabilities, the quarterback’s decision is heavily influenced by the quality of defensive coverage on various receivers, because a receiver with a closely covered defender has a lower likelihood of pass completion compared to a receiver who is wide open due to blown coverage. Furthermore, defenders are inherently reactive to how the play progresses. Defenses move in completely different ways depending on which receiver is targeted on the pass. This means that a trajectory model for defenders has to similarly be reactive to the specified targeted receiver in a believable manner.

Click through for details on the study.

Comments closed

Image Classification with Keras and TensorFlow 2 in R

Shirin Glander takes us through the task of image classification using TensorFlow version 2.2.0:

Recently, I have been getting a few comments on my old article on image classification with Keras, saying that they are getting errors with the code. And I have also gotten a few questions about how to use a Keras model to predict on new images (of different size). Instead of replying to them all individually, I decided to write this updated version using recent Keras and TensorFlow versions (all package versions and system information can be found at the bottom of this article, as usual).

Click through for the R code.

Comments closed

Implementing an LSTM Model with Python

Mrinal Walia takes us through the concept of Long Short Term Memory:

A simple Recurrent Neural Network has a very simple structure, that forms a chain of repeating modules of a neural network, with just a single activation function such as tanh layer, similarly LSTM too have a chain-like structure with repeating modules just like RNN but instead of a single Neural network layer in RNN, LSTM has four layers which are interacting in a very different way each performing its unique function in the network.

Read on for a good amount of theory followed by an example using Keras.

Comments closed

Generating Predictions with SQL Server ML Services

Jeffin Mathew walks us through SQL Server Machine Learning Services:

The purpose of this blog is to explore the process of running ML predictions on SQL server using Python. We are going to train and test the data to predict information about bike sharing for a specific year. We are going to be using the provided 2011 data and predict what 2012 will result in. The 2012 data already exists inside the dataset, so we will be able to compare the predicted to the actual amount.

For certain use cases—especially when the data already exists in SQL Server, and especially especially when you can use native scoring—Machine Learning Services does a great job.

Comments closed

MLOPS in R with GitHub Actions

David Smith explains MLOPS and GitHub actions in a talk:

In the talk, I demonstrate the process in action (the demo starts at the 14:30 mark in the video below). I used Visual Studio Code to edit the app.R file in repository, and then pushed the changes to GitHub. That immediately triggered the action to deploy the updated file via SSH to the Shiny Server, running in a remote VM. Similarly, changes to the data file or to the R script files implementing the logistic regression model would trigger the model to be retrained in the cluster, and re-deploy the endpoint to deliver new predictions from the updated model.

Click through for a quick summary, link to the repo, and embedded video of the talk.

Comments closed