Press "Enter" to skip to content

Category: Machine Learning

An Intro to Vetiver in R

Colin Gillespie introduces an R package for MLOps:

Most R users are familiar with the classic workflow popularised by R for Data Science. Data scientists begin by importing and cleaning the data, then iteratively transform, model, and visualise it. Visualisation drives the modeling process, which in turn prompts new visualisations, and periodically, they summarise their work and report results.

Click through for a demonstration of how to create and deploy a model using vetiver.

Comments closed

Gradient Boosting for Classification

I have a new video:

In this video, I take a look at an alternative to bootstrap aggregation & random forest: boosting. We cover a brief history of boosting and see how it works in action with XGBoost and LightGBM.

This is probably the video with the single largest number of links in my show notes. It’s also one of the shortest in the series; it’s funny how things work out sometimes.

Comments closed

Monitoring ML Models in production

Thomas Sobolik and Leopold Boudard talk model drift:

Regardless of how much effort teams put into developing, training, and evaluating ML models before they deploy, their functionality inevitably degrades over time due to several factors. Unlike with conventional applications, even subtle trends in the production environment a model operates in can radically alter its behavior. This is especially true of more advanced models that use deep learning and other non-deterministic techniques. It’s not enough to track the health and throughput of your deployed ML service alone. In order to maintain the accuracy and effectiveness of your model, you need to continuously evaluate its performance and identify regressions so that you can retrain, fine-tune, and redeploy at an optimal cadence.

In this post, we’ll discuss key metrics and strategies for monitoring the functional performance of your ML models in production […]

Click through for the article. There’s a Datadog pitch at the end, but the info is useful regardless of which tool you’re using for monitoring.

Comments closed

Feature Engineering with Azure ML and Microsoft Fabric

Siliang Jiao, et al, talk architecture:

Feature engineering is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. The extracted features are used for training the models that can predict values for relevant business scenarios. A feature engineering system provides the tools, processes, and techniques used to perform feature engineering consistently and efficiently. 

This article elaborates on how to build a feature engineering system based on Azure Machine Learning managed feature store and Microsoft Fabric. 

Click through to see how the pieces fit together.

Comments closed

Plotting Training and Testing Results with tidyAML

Steven Sanderson builds a plot:

In the realm of machine learning, visualizing model predictions is essential for understanding the performance and behavior of our algorithms. When it comes to regression tasks, plotting predictions alongside actual values provides valuable insights into how well our model is capturing the underlying patterns in the data. With the plot_regression_predictions() function in tidyAML, this process becomes seamless and informative.

Read on to see how the function works and the kind of result you can expect from it.

Comments closed

tidyAML Updates

Steven Sanderson has been busy. First up, a post on tidyAML updates:

One of the standout features in this release is the addition of extract_regression_residuals(). This function empowers users to delve deeper into regression models, providing a valuable tool for analyzing and understanding residuals. Whether you’re fine-tuning your models or gaining insights into data patterns, this enhancement adds a crucial layer to your analytical arsenal.

Then, Steven goes into detail on .drap_na:

In the newest release of tidyAML there has been an addition of a new parameter to the functions fast_classification() and fast_regression(). The parameter is .drop_na and it is a logical value that defaults to TRUE. This parameter is used to determine if the function should drop rows with missing values from the output if a model cannot be built for some reason. Let’s take a look at the function and it’s arguments.

After that, we get to see an updated function:

In response to user feedback, we’ve enhanced the internal_make_wflw_predictions() function to provide a comprehensive set of predictions. Now, when you make a call to this function, it includes:

  1. The Actual Data: This is the real-world data that your model aims to predict. Having access to this information helps you assess how well your model is performing on unseen instances.
  2. Training Predictions: Predictions made on the training dataset. This is essential for understanding how well your model generalizes to the data it was trained on.
  3. Testing Predictions: Predictions made on the testing dataset. This is crucial for evaluating the model’s performance on data it hasn’t seen during the training phase.

You can also check out the package’s GitHub repository and see more.

Comments closed

Exporting to CSV in Azure ML Designer

Tom LaRock saves a file:

The most popular feature in any application is an easy-to-find button saying “Export to CSV.” If this button is not visibly available, a simple right-click of your mouse should present such an option. You really should not be forced to spend any additional time on this Earth looking for a way to export your data to a CSV file.

Well, in Azure ML Studio, exporting to a CSV file should be simple, but is not, unless you already know what you are doing and where to look. I was reminded of this recently, and decided to write a quick post in case a person new to ML Studio was wondering how to export data to a CSV file.

Click through for one false start and then the correct answer.

Comments closed

Sample Data in Azure ML Designer

Tom LaRock shows us where the hidden data is:

Recently I was working inside of Azure ML Studio and wanted to browse the sample datasets provided. Except I could not find them. I *knew* they existed, having used them previously, but could not remember if that was in the original ML Studio (classic) or not.

After some trial and error, I found them and decided to write this post in case anyone else is wondering where to find the sample datasets. You’re welcome, future Tom!

Click through to see where those sample datasets are. And yeah, they don’t get updated that frequently. And that’s probably a good thing, as it means when you run the demo two years after someone created it, you’ll still get predictable results.

Comments closed