
Category: Machine Learning

Azure ML Deployments and Endpoints

I continue a series on low-code machine learning with Azure ML:

The first thing we need to do is create an inference pipeline. Inference pipelines differ from training pipelines in that they won’t use the training dataset, but they will accept user input and provide a scored response. There are two types of inference pipeline: real-time and batch. Real-time inference pipelines are intended for small-set work. We’ll host a service on some compute resource in Azure, and people will make REST API calls to that service, sending in a request with a few items to score. We then send back classification results.
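To make the real-time pattern concrete, here is a minimal Python sketch of what a caller might do: wrap a few rows in JSON and POST them to the scoring URL. The payload shape (`Inputs`, `WebServiceInput0`, `GlobalParameters`), the bearer-key header, and the function names are assumptions for illustration, not something the post specifies; the endpoint's own consumption details are the authority on the real schema.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, api_key, rows):
    """Build (but do not send) a POST request carrying a few rows to score."""
    # Assumed payload shape; confirm the real schema against your endpoint.
    payload = {"Inputs": {"WebServiceInput0": rows}, "GlobalParameters": {}}
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed key-based authentication via a bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        scoring_url, data=body, headers=headers, method="POST"
    )


def score(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made. A caller would pass the result of `build_scoring_request(...)` to `score(...)` once the URL and key point at a live endpoint.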

By contrast, a batch pipeline is what you’d use if you have a nightly job with tens of millions of items to score. In that case, the typical pattern is to have a service listening for changes in a storage account and, some time after people drop new files into the proper folder, the batch inference process will pick up these files, score the results, and write those results out to a destination location.
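The batch pattern described here (notice new files, score them, write results elsewhere) can be sketched in plain Python. This is only an illustration under loud assumptions: local folders stand in for the storage account, `score_row` is a hypothetical rule standing in for a trained model, and the `amount` column is invented for the example.

```python
import csv
from pathlib import Path


def score_row(row):
    """Hypothetical stand-in for a trained model: label rows by amount."""
    return "high" if float(row["amount"]) > 100 else "low"


def process_new_files(inbox, outbox, seen):
    """Score each CSV in `inbox` not already in `seen`; write to `outbox`.

    Returns the names of the files processed on this pass.
    """
    inbox, outbox = Path(inbox), Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(inbox.glob("*.csv")):
        if path.name in seen:
            continue  # already scored on an earlier pass
        target = outbox / path.name
        with path.open(newline="") as src, target.open("w", newline="") as dst:
            reader = csv.DictReader(src)
            fieldnames = list(reader.fieldnames or []) + ["label"]
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            for row in reader:
                row["label"] = score_row(row)
                writer.writerow(row)
        seen.add(path.name)
        processed.append(path.name)
    return processed
```

Calling `process_new_files` on a schedule with the same `seen` set approximates the "some time after people drop new files" behavior: each pass only touches files that arrived since the last one.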

This post is all about real-time inference pipelines. The next post will be all about batch pipelines.


MLOps on Databricks

Piotr Majer and Michael Shtelma complete a series on MLOps on Databricks:

This is the second part of a two-part series of blog posts that show an end-to-end MLOps framework on Databricks, which is based on notebooks. In the first post, we presented a complete CI/CD framework on Databricks with notebooks. The approach is based on the Azure DevOps ecosystem for the Continuous Integration (CI) part and Repos API for the Continuous Delivery (CD). This post extends the presented CI/CD framework with machine learning, providing a complete MLOps solution.

Check it out.


Training a Model in the Azure ML Designer

I continue a series on low-code machine learning in Azure ML:

Machine learning is a lot like an action film from the 1980s: we see early on that there’s a problem, we train in a cool montage with upbeat rock music, and then we come back to the problem and defeat it with car chases and bazookas and quippy one-liners. Well, maybe that simile got away from me a little bit, but I think I’ll stick with it.

What we’ll do in this post is cover the process of training a simple model using the Azure ML designer. I won’t deviate too far from the “classic” Azure ML script, which involves using the Designer to train a model and then deploy an endpoint for consumption. And away we go!

Sometimes, when a model is running, I say to it, “I have to remind you Sully, this is my weak arm!”


Trying Automated ML in Azure ML

I continue a series on low-code machine learning with Azure ML:

Automated Machine Learning (AutoML) provides two distinct benefits. The first benefit is the one that AutoML providers tend to tout: you don’t need (much) machine learning experience to use them. According to the marketing, AutoML does all of the work and you sit back and enjoy the fruits of its labor.

I am nowhere near sold on this use case for AutoML. Yes, you can get answers in a few clicks, but to get good answers, you need a lot more knowledge of data processing and statistics than they let on. Feeding in garbage data will get you mediocre results.

Click through for the second benefit, which I think applies much better, as well as a step-by-step demonstration of how AutoML works.


Data and Compute in Azure ML

I continue a series on low-code machine learning with Azure ML:

Once you have a datastore, you’re going to want to create at least one dataset. Datasets are versioned collections of data in some datastore. The Azure ML model is quite file-centric, and this concept makes the most sense with something like a data lake, where we have different extracts of data over different timeframes. Perhaps we get an extract of customer behavior up to the year 2018, and then the next year we get customer behavior up to 2019, and so on. The idea here is that you can use the latest training data for your models, but if you want to see how current models would have stacked up against older data, the opportunity is there.
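The idea of a dataset as a versioned collection can be illustrated with a tiny in-memory registry. This is a conceptual sketch only, not the Azure ML SDK: the `DatasetRegistry` class, its methods, and the file paths are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRegistry:
    """Hypothetical registry: each dataset name maps to an ordered list of extracts."""

    _versions: dict = field(default_factory=dict)

    def register(self, name, path):
        """Record a new extract for `name` and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(path)
        return len(versions)

    def get(self, name, version=None):
        """Return the path for a given version, or the latest when version is None."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


registry = DatasetRegistry()
registry.register("customer-behavior", "extracts/through-2018.csv")
registry.register("customer-behavior", "extracts/through-2019.csv")

latest = registry.get("customer-behavior")              # newest extract for training
older = registry.get("customer-behavior", version=1)    # earlier extract for comparison
```

Defaulting to the latest version while still allowing an explicit older one mirrors the workflow in the excerpt: train on the newest data, but keep the option of checking a current model against an earlier extract.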

Once you have data and compute, the world is your oyster. Or something like that.


Apache Flink ML 2.0.0

Dong Lin and Yun Gao make an announcement:

The Apache Flink community is excited to announce the release of Flink ML 2.0.0! Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near-) real-time latency.

This release involves a major refactor of the earlier Flink ML library and introduces major features that extend the Flink ML API and the iteration runtime, such as supporting stages with multi-input multi-output, graph-based stage composition, and a new stream-batch unified iteration library. Moreover, we added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML with state-of-the-art performance.

Congratulations to everybody who contributed to the project; it’s a big milestone.


Enhancing Color Photographs via Generative Adversarial Networks

Neil Saunders re-colorizes photographs:

When I’m not at the computer writing R code, I can often be found at the computer processing photographs. Or at the computer browsing Twitter, which is how I came across Stuart Humphryes, a digital artist who enhances autochromes. Autochromes are early colour photographs, generated using a process patented by the Lumière brothers in 1903. You can find and download many examples of them online. Stuart uses a variety of software tools to clean, enhance and balance the colours, resulting in bright vivid images that often have a contemporary feel, whilst at the same time retaining the somewhat “dreamy” quality of the original.

Having read that one of his tools uses neural networks, I was keen to discover how easy it is to achieve something similar using freely-available software found online. The answer is “quite easy” – although achieving results as good as Stuart’s is somewhat more difficult. Here’s how I went about it.

Click through for the process and some really nice-looking post-production photographs.


The Continuing Relevance of Feature Engineering

Pete Warden points out something which is obvious and still needs to be said:

One of the most exciting aspects of deep learning’s emergence in computer vision a few years ago was that it didn’t appear to require any feature engineering, unlike previous techniques like histograms-of-gradients or Haar cascades. As neural networks ate up other fields like NLP and speech, the hope was that feature engineering would become unnecessary for those domains too. At first I fully bought into this idea, and saw any remaining manually-engineered feature pipelines as legacy code that would soon be subsumed by more advanced models.

Over the last few years of working with product teams to deploy models in production, I’ve realized I was wrong. I’m not the first person to raise this idea, but I have some thoughts I haven’t seen widely discussed on exactly why feature engineering isn’t going away anytime soon. One of them is that even the original vision case actually does rely on a *lot* of feature engineering; we just haven’t been paying attention.

Read the whole thing.
