Category: Machine Learning

Understanding Support Vector Machines

Luis Valencia takes us through the algorithm for support vector machines:

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. Compared to newer algorithms like neural networks, SVMs have two main advantages: higher speed and better performance with a limited number of samples (in the thousands).

Pepperidge Farm remembers when we used genetic algorithms to solve problems because support vector machines were too slow.
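For a feel of what training a two-group SVM involves, here is a minimal sketch in plain Python. It is not code from Luis's post: it fits a linear SVM by full-batch sub-gradient descent on the regularized hinge loss, and the function names, the toy data, and the hyperparameters (`lam`, `lr`, `epochs`) are all my own assumptions. For real work you would reach for a library implementation such as scikit-learn's `SVC`.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularization term (lam / 2) * ||w||^2.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: two linearly separable groups labeled +1 and -1.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (-1, -1), (-1, -2), (-2, -1), (-2, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print("weights:", w, "bias:", b)
print("prediction for (3, 3):", predict(w, b, (3, 3)))
print("prediction for (-3, -1):", predict(w, b, (-3, -1)))
```

The only points that contribute to the gradient are those on or inside the margin, which is where the "support vector" name comes from.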

Comments closed

Detecting Hard-to-Classify Data

Kaushal Mukherjee takes us through a new Python package:

The article explains the algorithm behind the recently introduced Python package named PyHard, based on the concept of Instance Space Analysis. It helps in assessing the quality of a dataset and identifying which instances are hard or easy to classify. With the help of this algorithm we can separate out noisy instances. It also provides an interactive visualization tool for a deep dive into the instance space.

Click through for the details. I’m going to wait for PyHard 2: PyHarder. Or maybe PyHardWithAVengeance. But it’ll all go downhill by the time we get to PyHard 5.
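One widely used instance hardness measure is k-Disagreeing Neighbors (kDN): the share of an instance's k nearest neighbors that carry a different label. The sketch below is a plain-Python illustration of that single measure on made-up data. It is not PyHard's implementation or API, and the function name and the data are assumptions for illustration only.

```python
import math


def k_disagreeing_neighbors(points, labels, k=3):
    """For each instance, return the fraction of its k nearest neighbors
    (Euclidean distance) whose label differs from its own."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        others = [
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        disagreeing = sum(1 for _, other in nearest if other != label)
        scores.append(disagreeing / k)
    return scores


# Hypothetical data: two tight clusters plus one suspicious instance that
# sits inside the class-0 cluster but is labeled 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = k_disagreeing_neighbors(points, labels, k=3)
hardest = max(range(len(points)), key=lambda i: scores[i])
print("kDN scores:", scores)
print("hardest instance:", points[hardest])
```

A score near 1 flags an instance whose neighborhood disagrees with its label, which makes it a candidate for being noisy or mislabeled.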


TensorFlow Fundamentals

Tanishka Garg starts a series on TensorFlow:

TensorFlow is an open-source end-to-end machine learning library. It is for preprocessing data, modeling data, and serving models (getting them into the hands of others).

It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.

Read on for basic setup instructions and a primer on tensors.


Defect Detection with AWS Lookout and Sagemaker

Matthew Rhodes, et al., take us through an interesting case study:

According to a recent study, defective products cost industries over $2 billion from 2012 to 2017. Defect detection within manufacturing is an important business use case, especially in high-value product industries like the automotive industry. This allows for early diagnosis of anomalies to improve production line efficacy and product quality, and saves capital costs. Although advanced anomaly detection systems employ sensors as well as Internet of Things (IoT) devices to collect multimodal data to improve performance, computer vision continues to be a common approach. Detecting anomalies in automotive parts and components using computer vision can be done using normal images, and even X-ray-based images for structural damage. Recent advances in deep learning and computer vision have allowed scientists and manufacturers to develop enhanced anomaly detection systems, including surface defect detection on automotive body panels and dent detection in vehicles.

Read on for case notes.


Ensemble Classification in Azure Machine Learning

Dinesh Asanka reminds me not to use the designer for tough Azure ML problems:

Let us see how we can extend the standard classification to Ensemble Classifiers in Azure Machine Learning. Before we discuss the details of this configuration, you can view or download the experiment from Ensemble Classification.

The following figure shows the complex layout of the Ensemble Classifiers in Azure Machine Learning.

Dinesh is not kidding about that complexity. This is definitely a use case for the Azure ML SDK.
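Stripped of the designer layout, the simplest ensemble idea is hard voting: each classifier predicts a label and the most common label wins. Here is a minimal plain-Python sketch of that idea. The three prediction lists are invented stand-ins for already-trained models and have nothing to do with Dinesh's experiment or the Azure ML SDK.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per row by hard voting.

    `predictions` holds one list of labels per model, all the same length.
    """
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions)]


# Hypothetical outputs of three binary classifiers on the same four rows.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of binary classifiers there are no ties to worry about. With an even number, or with more than two classes, you would need a tie-breaking rule.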


Environments in Azure ML

Luis Valencia explains what environments are in Azure ML:

An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments, including in data preparation, training, and deployment to a web service. An Environment is managed and versioned in an Azure Machine Learning Workspace. You can update an existing environment and retrieve a version to reuse. Environments are exclusive to the workspace they are created in and can’t be used across different workspaces.

In basic terms for a developer, it’s a Docker image with all the dependencies (conda/pip packages) needed to run your experiment.

A friendly word of advice from some bad experiences: stick with the curated environments as much as you can. Those are easy and rarely fail. Building your own environments from Conda files is a possibility, but it’s an, err, probabilistic exercise as to whether your compute target will actually work or not.


Image Classification with ML.NET

Ivan Matec shows how to use ML.NET’s image classification with an example of vital importance:

One of the best scenes from Silicon Valley is Jian Yang demoing his “Hotdog, not hotdog” application. In this article, we will build our own “Hotdog, not hotdog” solution using ML.NET. After all, who would not want to determine if that dish is, or is not a hot dog? Just take a picture, upload it to the web or desktop application, and get results with almost 90% certainty in a second.

Although some may say this is not a very useful application, it is a fun way to explore another machine learning concept through ML.NET. I covered installing and getting started with ML.NET in Visual Studio in my previous article, so refer to it if you missed it.

Click through for the implementation, which is quite straightforward.


Databricks Autologging

Corey Zumar and Kasey Uhlenhuth announce a new product:

Machine learning teams require the ability to reproduce and explain their results–whether for regulatory, debugging or other purposes. This means every production model must have a record of its lineage and performance characteristics. While some ML practitioners diligently version their source code, hyperparameters and performance metrics, others find it cumbersome or distracting from their rapid prototyping. As a result, data teams encounter three primary challenges when recording this information: (1) standardizing machine learning artifacts tracked across ML teams, (2) ensuring reproducibility and auditability across a diverse set of ML problems and (3) maintaining readable code across many logging calls.

Read on to see how Databricks Autologging can address these challenges.
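To illustrate the third challenge, keeping training code readable without scattering logging calls through it, here is a plain-Python sketch of the general idea of automatic logging. It is emphatically not the Databricks Autologging or MLflow API: the `autolog` decorator, the `RUN_LOG` list, and the toy `train` function are all hypothetical names invented for this example.

```python
import functools
import inspect

RUN_LOG = []  # hypothetical in-memory stand-in for a tracking server


def autolog(train_fn):
    """Record the parameters and returned metrics of every call to train_fn."""
    @functools.wraps(train_fn)
    def wrapper(*args, **kwargs):
        # Resolve positional, keyword, and default arguments into one mapping.
        bound = inspect.signature(train_fn).bind(*args, **kwargs)
        bound.apply_defaults()
        metrics = train_fn(*args, **kwargs)
        RUN_LOG.append({
            "function": train_fn.__name__,
            "params": dict(bound.arguments),
            "metrics": metrics,
        })
        return metrics
    return wrapper


@autolog
def train(learning_rate=0.1, epochs=5):
    """Toy 'training' loop: the loss shrinks by learning_rate each epoch."""
    loss = 1.0
    for _ in range(epochs):
        loss *= 1 - learning_rate
    return {"final_loss": loss}


train(learning_rate=0.5, epochs=3)
print(RUN_LOG[-1])
```

The training function itself contains no logging calls. The record of parameters and metrics is produced as a side effect of calling it, which is the property the announcement is after.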


From API Call to ML Services Prediction

Tomaz Kastrun continues a series:

From the previous two blog posts:

Creating REST API for reading data from Microsoft SQL Server in web browser

Writing Data to Microsoft SQL Server from web browser using REST API and node.js

We have looked into the installation process of Node.js and the setup of Microsoft SQL Server, and made a couple of examples of reading data from the database through a REST API and inserting data back into the database.

In this post, we will be looking at R predictions using API calls against a sample dataset.

Click through to see it in action.


Shrinking Convolutional Neural Networks for TinyML

Pete Warden writes up a tip:

A colleague recently asked for more details on an approach I recommended, but which she hadn’t seen any documentation for. I realized that it was something I’d learned from talking to model builders at Google, and I wasn’t sure there was anything written up, so in the spirit of leaving a trail of breadcrumbs for anyone coming after, I thought I should put it into a quick blog post.

The summary is that if you have MaxPool or AveragePool after a convolutional layer in a network, and you’re targeting a resource-constrained system like a microcontroller, you should try removing them entirely and replacing them with a stride in the convolution instead. This has two main benefits, but to explain it’s easiest to diagram out the network before and after.

Click through for the full explanation.
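The excerpt stops before naming the two benefits, so I won't guess at Pete's wording. What I can do is work out the arithmetic for one layer. The sketch below compares a stride-1 convolution followed by a 2x2 pool against the same convolution with stride 2 and no pool. The layer dimensions (32x32 input, 8 to 16 channels, 3x3 kernel, padding 1) and the helper names are my own assumptions, not figures from the post.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(out_size, in_channels, out_channels, kernel):
    """Multiply-accumulate operations for one square convolution layer."""
    return out_size * out_size * out_channels * in_channels * kernel * kernel


# Assumed layer: 32x32 input, 8 -> 16 channels, 3x3 kernel, padding 1.
SIZE, IN_CH, OUT_CH, KERNEL, PAD = 32, 8, 16, 3, 1

# Before: stride-1 convolution followed by a 2x2 pool with stride 2.
conv_out = conv_output_size(SIZE, KERNEL, stride=1, padding=PAD)
pooled_out = conv_output_size(conv_out, kernel=2, stride=2, padding=0)
before_macs = conv_macs(conv_out, IN_CH, OUT_CH, KERNEL)
before_activations = conv_out * conv_out * OUT_CH  # values the pool must read

# After: the same convolution with stride 2 and no pooling layer.
strided_out = conv_output_size(SIZE, KERNEL, stride=2, padding=PAD)
after_macs = conv_macs(strided_out, IN_CH, OUT_CH, KERNEL)
after_activations = strided_out * strided_out * OUT_CH

print("output size:", pooled_out, "vs", strided_out)
print("conv MACs:", before_macs, "vs", after_macs)
print("conv output values:", before_activations, "vs", after_activations)
```

Under these assumptions both variants hand a 16x16 feature map to the next layer, but the strided version does a quarter of the multiply-accumulates and never materializes the full-resolution convolution output. The two networks are not numerically equivalent, so the model needs retraining and its accuracy re-checking after the swap.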
