Press "Enter" to skip to content

Category: Machine Learning

Low-Code Churn Prediction with Synapse Analytics

Gavita Regunath shows off a capability in Azure Synapse Analytics:

We will build a machine learning solution to predict churn using Azure Synapse Analytics and Azure Machine Learning.

Azure Synapse Analytics is Microsoft’s limitless analytics platform that combines enterprise data warehousing and big data analytics. In simple terms, it is a one-stop-shop that allows you to ingest, prepare, and manage data that can then be used for machine learning and business intelligence, all from a single place. It provides a unified platform and encourages collaboration between data and machine learning professionals.

This article will show you how to build an end-to-end solution to train a machine learning model from Azure Synapse analytics using AutoML functionality within Azure Machine Learning. Using the T-SQL Predict statement, we can then use the trained machine model to make predictions against the churn dataset stored in the SQL Pool table. One of the key benefits of working from within Azure Synapse is that all the necessary steps required to train and make predictions with the trained model can be done from a single platform, Azure Synapse.

Click through for the three-step process and a demonstration.

Comments closed

Trying out AutoML in R

JLaw calls a timeout:

In this fourth (and hopefully final) entry in my “Icing the Kicker” series of posts, I’m going to jump back to the first post where I used tidymodels to predict whether or not a kick attempt would be iced. However, this time I see if using the h2o AutoML feature and the SuperLearner package can improve the predictive performance of my initial model.

The results are just about what I would have expected: they provide a good floor but a human with knowledge of the data and skill with techniques can still beat out-of-the-box AutoML processes. Still, knowing what that floor is can help a lot: run some AutoML tool for a few minutes/hours/days and you have an easy way of letting the business side know the expected model quality. If AutoML already exceeds expectations, you’re golden. If AutoML is close to expectations (on either end, just above or just below), you as a skilled human should be able to improve things a bit more, especially once you have a chance to analyze what the AutoML processes did. If AutoML is way below business expectations of quality, perhaps this isn’t the best project to spend time on. H/T R-Bloggers.

Comments closed

Quantifying Model Uncertainty with Tensorflow Probability

Vini Jaiswal reviews the Tensorflow Probability library:

In this blog, we look at the topic of uncertainty quantification for machine learning and deep learning. By no means is this a new subject, but the introduction of tools such as Tensorflow Probability and Pyro have made it easy to perform probabilistic modeling to streamline uncertainty calculations. Consider the scenario in which we predict the value of an asset like a house, based on a number of features, to drive purchasing decisions. Wouldn’t it be beneficial to know how certain we are of these predicted prices? Tensorflow Probability allows you to use the familiar Tensorflow syntax and methodology but adds the ability to work with distributions. In this introductory post, we leave the priors and the Bayesian treatment behind and opt for a simpler probabilistic treatment to illustrate the basic principles. We use the likelihood principle to illustrate how an uncertainty measure can be obtained along with predicted values by applying them to a deep learning regression problem.

Read on for an interesting explanation and tutorial.

Comments closed

Custom Model Evaluation Metrics with MLflow

Mark Zhang shows off a new bit of functionality in MLflow:

According to an internal customer survey, 75% of respondents say they frequently or always use specialized, business-focused metrics in addition to basic ones like accuracy and loss. Data scientists often utilize these custom metrics as they are more descriptive of business objectives (e.g. conversion rate), and contain additional heuristics not captured by the model prediction itself.

In this blog, we introduce an easy and convenient way of evaluating MLflow models on user-defined custom metrics. With this functionality, a data scientist can easily incorporate this logic at the model evaluation stage and quickly determine the best-performing model without further downstream analysis

Click through to see how to use built-in metrics but also how to create your own.

Comments closed

Iteratively Tuning Graph Neural Networks

Luis Bermudez takes us through the process of tuning one flavor of neural network:

We made our own implementations of OGB leaderboard entries for two popular GNN frameworks: GraphSAGE and a Relational Graph Convolutional Network (RGCN). We then designed and executed an iterative experimentation approach for hyperparameter tuning where we seek a quality model that takes minimal time to train. We define quality by running an unconstrained performance tuning loop, and use the results to set thresholds in a constrained tuning loop that optimizes for training efficiency.

Read on to see how they did it.

Comments closed

Topic Modeling with Python

Sanil Mhatre takes us through topic modeling:

Topic modeling is a powerful Natural Language Processing technique for finding relationships among data in text documents. It falls under the category of unsupervised learning and works by representing a text document as a collection of topics (set of keywords) that best represent the prevalent contents of that document. This article will focus on a probabilistic modeling approach called Latent Dirichlet Allocation (LDA), by walking readers through topic modeling using the team health demo dataset. Demonstrations will use Python and a Jupyter notebook running on Anaconda. Please follow instructions from the “Initial setup” section of the previous article to install Anaconda and set up a Jupyter notebook.

The second article of this series, Text Mining and Sentiment Analysis: Power BI Visualizations, introduced readers to the Word Cloud, a common technique to represent the frequency of keywords in a body of text. Word Cloud is an image composed of keywords found within a body of text, where the size of each word indicates its frequency in that body of text. This technique is limited in its ability to discover underlying topics and themes in the text, because it only relies on the frequency of keywords to determine their popularity. Topic modeling overcomes these limitations and uncovers deeper insights from text data using statistical modeling for discovering the topics (collection of words) that occur in text documents.

Read on for an informative article with plenty of code.

Comments closed

An Overview of Automatic Text Summarization

Kevin Jacobs looks at the state of the art with respect to automatic text summarization:

Automatic text summarization comes in two flavours: extractive summarization and abstractive summarization. Extractive summarization models take exact phrases from the reference documents and use them as a summary. One of the very first research papers on (extractive) text summarization is the work of Luhn [1]. TextRank [2] (based on the concepts used by the PageRank algorithm) is another widely used extractive summarization model.

In the era of deep learning, abstractive summarization became a reality. With abstractive summarization, a model generates a text instead of using literal phrases of the reference documents. One of the more recent works on abstractive summarization is PEGASUS [3] (a demo is available at HuggingFace).

Click through for a couple contemporary examples as well as a few pain points you can experience when using the current set of libraries and algorithms.

Comments closed

A Conceptual Discussion of Active Learning

Kevin Jacobs teaches us to learn:

Active Learning is a method in which data is annotated in s smart way. With data annotation, you would normally get to see a randomly selected item which you need to label. This however can lead to a lot of repetition of similar items which you have to label. This is a waste of time. A better way would be to use Active Learning. For Active Learning, a batch of random items is selected first. Then, a lightweight classifier is used for evaluating the previously annotated data.

Basically, run your prediction mechanism, find the things about which the mechanism is least certain, and figure those out. Doing this reduces ambiguity and quickly leads to a better model.

Comments closed

Building a Recommender in Spark

Avinash Sooriyarachchi makes a recommendation:

There has been an exponential increase in the volume and variety of data at our disposal to build recommenders and notable advances in compute and algorithms to utilize in the process. Particularly, the means to store, process and learn from image data has dramatically increased in the past several years. This allows retailers to go beyond simple collaborative filtering algorithms and utilize more complex methods, such as image classification and deep convolutional neural networks, that can take into account the visual similarity of items as an input for making recommendations. This is especially important given online shopping is a largely visual experience and many consumer goods are judged on aesthetics.

In this article, we’ll change the script and show the end-to-end process for training and deploying an image-based similarity model that can serve as the foundation for a recommender system. Furthermore, we’ll show how the underlying distributed compute available in Databricks can help scale the training process and how foundational components of the Lakehouse, Delta Lake and MLflow, can make this process simple and reproducible.

Click through for the process.

Comments closed

Scoring Azure ML Models in Azure Synapse Analytics

Alex Aleksandrov shows off the PREDICT operator:

We can use Synapse for many activities. We can use it not only for ingesting, querying, storing and visualising data, but for developing machine learning models as well. Of course, one can say that doing data science is another functionality of this platform and this is definitely true. However, in this article, I would like to show you that instead of using Python, one can use T-SQL for doing predictions.

Click through to see how.

Comments closed