Category: Machine Learning

Visualizing SHAP Values in R with shapviz

Published 2022-06-14 by Kevin Feasel

SHAP (SHapley Additive exPlanations, Lundberg and Lee, 2017) is an ingenious way to study black box models. SHAP values decompose – as fair as possible – predictions into additive feature contributions.
When it comes to SHAP, the Python implementation is the de-facto standard. It not only offers many SHAP algorithms, but also provides beautiful plots. In R, the situation is a bit more confusing. Different packages contain implementations of SHAP algorithms

Read on to see how shapviz works, how to install it, and the types of visuals you can create from it.

Comments closed

The Value of MLOps

Published 2022-06-07 by Kevin Feasel

Tori Tompkins explains what MLOps is and why it’s valuable:

A ML project will typically begin in an ‘Explore Phase’ where a data scientist or team of data scientists will explore the data they currently have and experiment with models, algorithms, parameters and features. MLOps at this stage is responsible for supplying Data Scientists with environment they need to achieve this. One way this can be done is by leveraging Feature Store.
A feature store is a tool for storing commonly used features. As data scientists create new features then can log these into feature stores such as Feast and Databricks Feature Store, they can reuse these features across teams and projects. This will benefit teams in multiple ways by reducing compute times for both training and inference, provide consistency in common features and reducing effort for create complex logic.

Read on for information about all six phases.

Comments closed

ML Algorithms a Poor Fit for Predictive Caches

Published 2022-06-02 by Kevin Feasel

Pete Warden describes an interesting phenomenon:

I’ve been working on a new research paper, and a friend gave me the feedback that he was confused by the statement “memory accesses can be accurately predicted at the compilation stage” for machine learning workloads, and that this made them a poor fit for conventional processor architectures with predictive caches. I realized that this was received wisdom among the ML engineers I know, but I wasn’t aware of any papers that discuss this point. I put out a request for help on Twitter, but while there were a lot of interesting resources in the answers, I still couldn’t find any papers that focused on what feels like an important property for machine learning systems. With that in mind, I wanted to at least describe the issue as best as I can in this blog post, so there’s a trail of breadcrumbs for anyone else interested in how system designs might need to change to accommodate ML.

Read on for the explanation. My reading here is that this is a downside to having general-purpose compute: you run the risk of sub-optimal performance in certain circumstances, like training models using certain types of ML algorithms.

Comments closed

Consuming an Azure ML AutoML Model in Excel

Published 2022-06-01 by Kevin Feasel

Lewis Prince needs to do some heavy lifting in Excel:

It has come back to my turn to write a blog post, and if you remember my previous one concerned why you should use Azure based AutoMl and subsequently how to do so. If you followed that then you will be left with a model of which you’ve scored and know the performance of, but no way of how to then deploy and use your model. I will outline the steps needed to do this (which involves a major shortcut as we are using an AutoMl model), and then show you the required VBA needed to consume this in Microsoft Excel.

Read on to see how you can do this. Back in the really old Azure ML days, you could download an Excel workbook which would have things set up and you could feed in a bunch of input data and get predictions.

Comments closed

Model Deployment Options in Azure

Published 2022-05-30 by Kevin Feasel

Tori Tompkins enumerates ways to deploy machine learning models in Azure:

There are so many options to deploy models in Azure that is can get quite overwhelming. In this blog, we break down all the available options and consider the pros and cons of each tooling option.

Even with those, there are other approaches as well, like hosting Spark-based models in Azure Synapse Analytics, using SQL Server Machine Learning Services on an Azure SQL Managed Instance or VM running SQL Server, etc.

Comments closed

Protecting ML Models and IP

Published 2022-05-18 by Kevin Feasel

Pete Warden has some advice:

Over the last decade I’ve helped hundreds of product teams ship ML-based products, inside and outside of Google, and one of the most frequent questions I got was “How do I protect my models?”. This usually came from executives, and digging deeper it became clear they were most worried about competitors gaining an advantage from what we released. This worry is completely understandable, because modern machine learning has become essential for many applications so quickly that best practices haven’t had time to settle and spread. The answers are complex and depend to some extent on your exact threat models, but if you want a summary of the advice I usually give it boils down to:
– Treat your training data like you do your traditional source code.
-Treat your model files like compiled executables.

Read on to see why Pete came to this as the appropriate answer, as well as what I have to consider a sly mention of duck boat tours.

Comments closed

Low-Code Churn Prediction with Synapse Analytics

Published 2022-05-18 by Kevin Feasel

Gavita Regunath shows off a capability in Azure Synapse Analytics:

We will build a machine learning solution to predict churn using Azure Synapse Analytics and Azure Machine Learning.
Azure Synapse Analytics is Microsoft’s limitless analytics platform that combines enterprise data warehousing and big data analytics. In simple terms, it is a one-stop-shop that allows you to ingest, prepare, and manage data that can then be used for machine learning and business intelligence, all from a single place. It provides a unified platform and encourages collaboration between data and machine learning professionals.
This article will show you how to build an end-to-end solution to train a machine learning model from Azure Synapse analytics using AutoML functionality within Azure Machine Learning. Using the T-SQL Predict statement, we can then use the trained machine model to make predictions against the churn dataset stored in the SQL Pool table. One of the key benefits of working from within Azure Synapse is that all the necessary steps required to train and make predictions with the trained model can be done from a single platform, Azure Synapse.

Click through for the three-step process and a demonstration.

Comments closed

Trying out AutoML in R

Published 2022-05-04 by Kevin Feasel

JLaw calls a timeout:

In this fourth (and hopefully final) entry in my “Icing the Kicker” series of posts, I’m going to jump back to the first post where I used tidymodels to predict whether or not a kick attempt would be iced. However, this time I see if using the h2o AutoML feature and the SuperLearner package can improve the predictive performance of my initial model.

The results are just about what I would have expected: they provide a good floor but a human with knowledge of the data and skill with techniques can still beat out-of-the-box AutoML processes. Still, knowing what that floor is can help a lot: run some AutoML tool for a few minutes/hours/days and you have an easy way of letting the business side know the expected model quality. If AutoML already exceeds expectations, you’re golden. If AutoML is close to expectations (on either end, just above or just below), you as a skilled human should be able to improve things a bit more, especially once you have a chance to analyze what the AutoML processes did. If AutoML is way below business expectations of quality, perhaps this isn’t the best project to spend time on. H/T R-Bloggers.

Comments closed

Quantifying Model Uncertainty with Tensorflow Probability

Published 2022-04-29 by Kevin Feasel

Vini Jaiswal reviews the Tensorflow Probability library:

In this blog, we look at the topic of uncertainty quantification for machine learning and deep learning. By no means is this a new subject, but the introduction of tools such as Tensorflow Probability and Pyro have made it easy to perform probabilistic modeling to streamline uncertainty calculations. Consider the scenario in which we predict the value of an asset like a house, based on a number of features, to drive purchasing decisions. Wouldn’t it be beneficial to know how certain we are of these predicted prices? Tensorflow Probability allows you to use the familiar Tensorflow syntax and methodology but adds the ability to work with distributions. In this introductory post, we leave the priors and the Bayesian treatment behind and opt for a simpler probabilistic treatment to illustrate the basic principles. We use the likelihood principle to illustrate how an uncertainty measure can be obtained along with predicted values by applying them to a deep learning regression problem.

Read on for an interesting explanation and tutorial.

Comments closed

Custom Model Evaluation Metrics with MLflow

Published 2022-04-22 by Kevin Feasel

Mark Zhang shows off a new bit of functionality in MLflow:

According to an internal customer survey, 75% of respondents say they frequently or always use specialized, business-focused metrics in addition to basic ones like accuracy and loss. Data scientists often utilize these custom metrics as they are more descriptive of business objectives (e.g. conversion rate), and contain additional heuristics not captured by the model prediction itself.
In this blog, we introduce an easy and convenient way of evaluating MLflow models on user-defined custom metrics. With this functionality, a data scientist can easily incorporate this logic at the model evaluation stage and quickly determine the best-performing model without further downstream analysis

Click through to see how to use built-in metrics but also how to create your own.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31