The first thing we need to do is create an inference pipeline. Inference pipelines differ from training pipelines in that they won’t use the training dataset, but they will accept user input and provide a scored response. There are two types of inference pipeline: real-time and batch. Real-time inference pipelines are intended for small-set work. We’ll host a service on some compute resource in Azure and people will make REST API calls to our service, sending in a request with a few items to score and we send back classification results.
By contrast, a batch pipeline is what you’d use if you have a nightly job with tens of millions of items to score. In that case, the typical pattern is to have a service listening for changes in a storage account and, some time after people drop new files into the proper folder, the batch inference process will pick up these files, score the results, and write those results out to a destination location.
This post is all about inference pipelines. The next post will be all about batch pipelines.