Press "Enter" to skip to content

Category: Machine Learning

A Survey of Predictive Analytics Techniques

Akmal Chaudhri tries a bunch of things:

In this short article, we’ll explore loan approvals using a variety of tools and techniques. We’ll begin by analyzing loan data and applying Logistic Regression to predict loan outcomes. Building on this, we’ll integrate BERT for Natural Language Processing to enhance prediction accuracy. To interpret the predictions, we’ll use SHAP and LIME explanation frameworks, providing insights into feature importance and model behavior. Finally, we’ll explore the potential of Natural Language Processing through LangChain to automate loan predictions, using the power of conversational AI.

Click through for the notebook, as well as an overview of what the notebook includes. I don’t particularly like word clouds as the “solution” in the BERT example, though without real data to perform any sort of NLP, there’s not much you can meaningfully do.

Comments closed

Referencing a Microsoft Fabric ML Model from another Workspace

Sandeep Pawar crosses workspaces:

I have written a couple of blogs about working with ML models in Microsoft Fabric. Creating experiments and logging and scoring models in Fabric is very easy, thanks to the built-in MLflow integration. However, the Fabric Data Science experience has one limitation. There are no model endpoints yet, and you cannot load a model from another workspace because the model URI, unlike in Databricks, does not reference a workspace. If you use MLFlowTransformer as shown in this blog, only the model from the workspace where the notebook is hosted is loaded. However, there is a workaround.

Read on for that workaround, as well as the core limitation associated with it.

Comments closed

Speech to Text with Streamlit and Azure AI

I have a new video:

In this video, I show how we can integrate with the Azure AI Services Speech service, using two different methods to capture speech from the microphone via our Streamlit application and submit that to Azure OpenAI.

Check out the video and final set of code. There’s an intermediate set of code for detecting a single utterance. But I think the final product works out pretty well.

Comments closed

Tips for Hyperparameter Tuning

Bala Priya C shares some tips and techniques:

If you’re familiar with machine learning, you know that the training process allows the model to learn the optimal values for the parameters—or model coefficients—that characterize it. But machine learning models also have a set of hyperparameters whose values you should specify when training the model. So how do you find the optimal values for these hyperparameters?

You can use hyperparameter tuning to find the best values for the hyperparameters. By systematically adjusting hyperparameters, you can optimize your models to achieve the best possible results.

This tutorial provides practical tips for effective hyperparameter tuning—starting from building a baseline model to using advanced techniques like Bayesian optimization. Whether you’re new to hyperparameter tuning or looking to refine your approach, these tips will help you build better machine learning models. Let’s get started.

Read on for those techniques. Incidentally, one of my “Old man yells at clouds” takes is that I dislike the existence of hyperparameters and consider them a modeling failure, essentially telling the implementer to do part of the researcher’s work. Knowing that they are necessary to work with for so many algorithms, there’s nothing to do but learn how to work with them effectively, but there’s a feel of outsourcing the hard work to users that I don’t like about the process. For that reason, I have extra respect for algorithms that neither need nor offer hyperparameters.

Comments closed

AutoML in Python with TPOT

Abid Ali Awan gives us a primer on TPOT:

AutoML is a tool designed for both technical and non-technical experts. It simplifies the process of training machine learning models. All you have to do is provide it with the dataset, and in return, it will provide you with the best-performing model for your use case. You don’t have to code for long hours or experiment with various techniques; it will do everything on its own for you.

In this tutorial, we will learn about AutoML and TPOT, a Python AutoML tool for building machine learning pipelines. We will also learn to build a machine learning classifier, save the model, and use it for model inference.

Click through to see an example of how to use the library.

Comments closed

An Intro to Vetiver in R

Colin Gillespie introduces an R package for MLOps:

Most R users are familiar with the classic workflow popularised by R for Data Science. Data scientists begin by importing and cleaning the data, then iteratively transform, model, and visualise it. Visualisation drives the modeling process, which in turn prompts new visualisations, and periodically, they summarise their work and report results.

Click through for a demonstration of how to create and deploy a model using vetiver.

Comments closed

Gradient Boosting for Classification

I have a new video:

In this video, I take a look at an alternative to bootstrap aggregation & random forest: boosting. We cover a brief history of boosting and see how it works in action with XGBoost and LightGBM.

This is probably the video with the single largest number of links in my show notes. It’s also one of the shortest in the series; it’s funny how things work out sometimes.

Comments closed

Monitoring ML Models in production

Thomas Sobolik and Leopold Boudard talk model drift:

Regardless of how much effort teams put into developing, training, and evaluating ML models before they deploy, their functionality inevitably degrades over time due to several factors. Unlike with conventional applications, even subtle trends in the production environment a model operates in can radically alter its behavior. This is especially true of more advanced models that use deep learning and other non-deterministic techniques. It’s not enough to track the health and throughput of your deployed ML service alone. In order to maintain the accuracy and effectiveness of your model, you need to continuously evaluate its performance and identify regressions so that you can retrain, fine-tune, and redeploy at an optimal cadence.

In this post, we’ll discuss key metrics and strategies for monitoring the functional performance of your ML models in production […]

Click through for the article. There’s a Datadog pitch at the end, but the info is useful regardless of which tool you’re using for monitoring.

Comments closed

Feature Engineering with Azure ML and Microsoft Fabric

Siliang Jiao, et al, talk architecture:

Feature engineering is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. The extracted features are used for training the models that can predict values for relevant business scenarios. A feature engineering system provides the tools, processes, and techniques used to perform feature engineering consistently and efficiently. 

This article elaborates on how to build a feature engineering system based on Azure Machine Learning managed feature store and Microsoft Fabric. 

Click through to see how the pieces fit together.

Comments closed