
Category: Machine Learning

Choosing an ML Algorithm

Hui Li developed a flow for determining appropriate machine learning algorithms:

Since the cheat sheet is designed for beginner data scientists and analysts, we will make some simplified assumptions when talking about the algorithms.

The algorithms recommended here result from compiled feedback and tips from several data scientists and machine learning experts and developers. There are several issues on which we have not reached an agreement and for these issues we try to highlight the commonality and reconcile the difference.

Additional algorithms will be added later as our library grows to encompass a more complete set of available methods.

Read the whole thing.


Writing a Python Language Extension for ML Services

Niels Berglund shows how you can bring your own Python 3.9 runtime to SQL Server Machine Learning Services:

When I wrote we’d look at it in a future post I thought to myself; “how hard can it be?”. I had read the steps of how to build a Python language extension for Windows here, and it didn’t seem that hard: some Boost, CMake, compile, and Bob’s your uncle! Well, it turned out it was somewhat more complicated than what I anticipated. So, if you are interested – read on!

I was going to say that the steps seem a bit complicated but not overly terrible, though Niels’s conclusion leaves me wondering.


Cross-Validation in Azure ML Studio

Dinesh Asanka takes us through the cross-validation component in Azure ML Studio:

Let us look at implementing Cross-Validation in Azure Machine Learning. Let us use the sample Adventure Works database that we used for all the articles.

Then Cross Validate Model is dragged and dropped to the experiment. The Cross Validate model has two inputs and two outputs. Two inputs are data input and the relation to the Machine Learning technique. Let us use the Two-Class Decision Jungle as the Machine Learning Technique. Then the first output is connected to the Evaluate Model as shown in the following figure:

Click through for the process.
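For readers who want to see the idea outside the designer, here is a minimal k-fold cross-validation sketch in plain Python. It is not the Azure ML Cross Validate Model component: the fold count, the toy data, and the threshold learner (a stand-in for the Two-Class Decision Jungle) are all assumptions made for illustration.

```python
from statistics import mean


def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(rows, labels, k, fit, predict):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train_x = [rows[i] for i in range(len(rows)) if i not in held]
        train_y = [labels[i] for i in range(len(rows)) if i not in held]
        model = fit(train_x, train_y)
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Hypothetical toy learner: a threshold halfway between the two class means.
def fit_threshold(xs, ys):
    mean0 = mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Made-up one-feature data set with alternating class labels.
rows = [1.0, 9.0, 2.0, 8.0, 1.5, 9.5, 2.5, 8.5, 1.2, 9.2]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_validate(rows, labels, k=5, fit=fit_threshold, predict=predict_threshold)
print("accuracy per fold:", scores)
print("mean accuracy:", mean(scores))
```

Reporting the per-fold scores alongside the mean is the point of the exercise: a large spread between folds suggests the model is sensitive to which rows it was trained on.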


Using the Open Source R or Python Runtime with Machine Learning Services

Niels Berglund walks us through using the open source extensibility framework to install R or Python:

When Java became a supported language in SQL Server 2019, Microsoft mentioned that communication between ExternalHost and the language extension should be based on an API, regardless of the external language. The API is the Extensibility Framework API for SQL Server. Having an API ensures simplicity and ease of use for the extension developer.

From the paragraph above, one can assume that Microsoft would like to see 3rd party development of language extensions. That assumption turned out to be accurate as, mentioned above, Microsoft open-sourced the Java language extension, together with the include files for the extension API, in September 2020! This means that anyone interested can now create a language extension for their own favorite language!

However, open sourcing the Java extension was not the only thing Microsoft did. They also created and open-sourced language extensions for R and Python!

Click through for more detail and a walkthrough on installation of Python.


TF-IDF in .NET for Spark, Updated

Ed Elliott has been busy:

Apache Spark has had a machine learning API for quite some time and this has been partially implemented in .NET for Apache Spark.

In this post we will look at how we can use the Apache Spark ML API from .NET. This is the second version of this post, the first version was written before version 1 of .NET for Apache Spark and there was a vital piece of the implementation missing which meant although we could build the model in .NET, we couldn’t actually use it. The necessary functionality is now available and so I am updating the post. To see the previous version go to: https://the.agilesql.club/2020/07/tf-idf-in-.net-for-apache-spark-using-spark-ml/

Read on for more information, as well as a call to action.
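As a refresher on what TF-IDF actually computes, here is a small sketch in plain Python rather than .NET for Apache Spark. It assumes whitespace tokenization and raw term counts, and it uses the smoothed inverse document frequency log((N + 1) / (df + 1)) that Spark ML's IDF documents; the sample documents are made up, and Spark's hashing step is left out.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Term frequency is the raw count within the document; inverse document
    frequency is log((N + 1) / (df + 1)), where N is the number of documents
    and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Count each term once per document it appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}

    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Hypothetical sample documents.
docs = [
    "spark ml in dotnet",
    "spark streaming in dotnet",
    "tf idf with spark ml",
]

weights = tf_idf(docs)
for doc, doc_weights in zip(docs, weights):
    print(doc, "->", doc_weights)
```

A term that appears in every document, such as "spark" here, gets a weight of zero, which is what makes TF-IDF useful for picking out the words that distinguish one document from the rest.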


MLOps with Azure Databricks and MLflow

Oliver Koernig walks us through some of the basics of MLOps using MLflow and Azure Databricks:

Most organizations today have a defined process to promote code (e.g. Java or Python) from development to QA/Test and production. Many are using Continuous Integration and/or Continuous Delivery (CI/CD) processes and oftentimes are using tools such as Azure DevOps or Jenkins to help with that process. Databricks has provided many resources to detail how the Databricks Unified Analytics Platform can be integrated with these tools (see Azure DevOps Integration, Jenkins Integration). In addition, there is a Databricks Labs project – CI/CD Templates – as well as a related blog post that provides automated templates for GitHub Actions and Azure DevOps, which makes the integration much easier and faster.

When it comes to machine learning, though, most organizations do not have the same kind of disciplined process in place.

Read on for a demonstration of the process.


Projecting Defensive Back Trajectories with Sagemaker

Lin Lee Cheong, et al., relay some interesting research:

NFL’s Next Gen Stats (NGS) powered by AWS accurately captures player and ball data in real time for every play and every NFL game—over 300 million data points per season—through the extensive use of sensors in players’ pads and the ball. With this rich set of tracking data, NGS uses AWS machine learning (ML) technology to uncover deeper insights and develop a better understanding of various aspects and trends of the game. To date, NGS metrics have focused on helping fans better appreciate and understand the offense and defense in gameplay through the application of advanced analytics, particularly in the passing game. Thanks to tracking data, it’s possible to quantify the difficulty of passes, model expected yards after catch, and determine the value of various play outcomes. A logical next step with this analytical information is to evaluate quarterback decision-making, such as whether the quarterback has considered all eligible receivers and evaluated tradeoffs accurately.

To effectively model quarterback decision-making, we considered a few key metrics—mainly the probability of different events occurring on a pass, and the value of said events. A pass can result in three outcomes: completion, incompletion, or interception. NGS has already created models that provide probabilities of these outcomes, but these events rely on information that’s available at only two points during the play: when the ball is thrown (termed as pass-forward), and when the ball arrives to a receiver (pass-arrived). Because of this, creating accurate probabilities requires modeling the trajectory of players between those two points in time.

For these probabilities, the quarterback’s decision is heavily influenced by the quality of defensive coverage on various receivers, because a receiver with a closely covered defender has a lower likelihood of pass completion compared to a receiver who is wide open due to blown coverage. Furthermore, defenders are inherently reactive to how the play progresses. Defenses move in completely different ways depending on which receiver is targeted on the pass. This means that a trajectory model for defenders has to similarly be reactive to the specified targeted receiver in a believable manner.

Click through for details on the study.
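The excerpt talks about combining the probability of each pass outcome with the value of that outcome. One simple way to combine them, assumed here and not taken from the article, is a probability-weighted expected value per receiver. Every number, the value scale, and the receiver names below are hypothetical.

```python
OUTCOMES = ("completion", "incompletion", "interception")


def expected_pass_value(probabilities, values):
    """Probability-weighted value over the three pass outcomes."""
    total = sum(probabilities[outcome] for outcome in OUTCOMES)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(probabilities[outcome] * values[outcome] for outcome in OUTCOMES)


# Hypothetical value of each outcome, in arbitrary units.
values = {"completion": 1.5, "incompletion": -0.5, "interception": -4.0}

# Hypothetical outcome probabilities for two eligible receivers.
receivers = {
    "open_receiver": {"completion": 0.75, "incompletion": 0.20, "interception": 0.05},
    "covered_receiver": {"completion": 0.40, "incompletion": 0.45, "interception": 0.15},
}

expected = {
    name: expected_pass_value(probs, values) for name, probs in receivers.items()
}
best_target = max(expected, key=expected.get)

print(expected)
print("best target:", best_target)
```

In this toy setup, tighter coverage lowers the completion probability and raises the interception probability, which is why the open receiver comes out ahead.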


Image Classification with Keras and TensorFlow 2 in R

Shirin Glander takes us through the task of image classification using TensorFlow version 2.2.0:

Recently, I have been getting a few comments on my old article on image classification with Keras, saying that they are getting errors with the code. And I have also gotten a few questions about how to use a Keras model to predict on new images (of different size). Instead of replying to them all individually, I decided to write this updated version using recent Keras and TensorFlow versions (all package versions and system information can be found at the bottom of this article, as usual).

Click through for the R code.
