Press "Enter" to skip to content

Category: Machine Learning

GPU-Accelerated Analysis on Databricks using PyTorch + Huggingface

Srijith Rajamohan walks us through an example of sentiment analysis using the PyTorch and Huggingface libraries on Databricks:

Sentiment analysis is commonly used to analyze the sentiment present within a body of text, which could range from a review to an email or a tweet. Deep learning-based techniques are one of the most popular ways to perform such an analysis. However, these techniques tend to be very computationally intensive and often require the use of GPUs, depending on the architecture and the embeddings used. Huggingface (https://huggingface.co) has put together a framework with the transformers package that makes accessing these embeddings seamless and reproducible. In this work, I illustrate how to perform scalable sentiment analysis by using the Huggingface package within PyTorch and leveraging the ML runtimes and infrastructure on Databricks.

Click through for a description of the process, as well as a link to a notebook you can walk through yourself.
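
To give a flavor of the Huggingface side of this, here is a minimal sketch using the transformers sentiment-analysis pipeline on a GPU when one is available. The model name is the pipeline default rather than necessarily the one from the article, and the Databricks-specific setup is not shown.

```python
# Minimal sentiment analysis sketch with Huggingface transformers + PyTorch.
# The model name below is the pipeline default, not necessarily the article's choice.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
)

texts = [
    "The product arrived on time and works great.",
    "Support never answered my emails.",
]
for result in classifier(texts):
    print(result)  # e.g. {'label': 'POSITIVE', 'score': 0.99}
```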

Comments closed

Understanding Support Vector Machines

Luis Valencia takes us through the algorithm for support vector machines:

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. Compared to newer algorithms like neural networks, they have two main advantages: higher speed and better performance with a limited number of samples (in the thousands).

Pepperidge Farms remembers when we used genetic algorithms to solve problems because support vector machines were too slow.
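
If you want to try a two-group classifier on your own data first, a minimal scikit-learn sketch (illustrative, not taken from Luis's article) looks like this:

```python
# Minimal two-class SVM sketch with scikit-learn (illustrative, not from the article).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Feature scaling matters a great deal for SVMs; the RBF kernel is the usual default.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```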

Comments closed

Detecting Hard-to-Classify Data

Kaushal Mukherjee takes us through a new Python package:

The article explains the algorithm behind the recently introduced Python package named PyHard, based on the concept of Instance Space Analysis. It helps in assessing the quality of a dataset and identifying which instances are hard or easy to classify. With the help of this algorithm we can separate out noisy instances. It also provides an interactive visualization tool to deep dive into the instance space.

Click through for the details. I’m going to wait for PyHard 2: PyHarder. Or maybe PyHardWithAVengeance. But it’ll all go downhill by the time we get to PyHard 5.
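
For the flavor of what "hard to classify" means, one classic instance-hardness measure is k-Disagreeing Neighbors: the share of an instance's nearest neighbors that carry a different label. The sketch below computes that measure directly; it illustrates the underlying idea only and is not the PyHard API.

```python
# Illustrative k-Disagreeing Neighbors (kDN) hardness measure.
# This demonstrates the instance-hardness concept; it is NOT the PyHard API.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import NearestNeighbors

X, y = load_breast_cancer(return_X_y=True)

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)      # +1 because each point returns itself
_, idx = nn.kneighbors(X)
neighbor_labels = y[idx[:, 1:]]                      # drop the point itself
kdn = (neighbor_labels != y[:, None]).mean(axis=1)   # fraction of disagreeing neighbors

hardest = np.argsort(kdn)[::-1][:10]
print("Hardest instances:", hardest, kdn[hardest])
```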

Comments closed

TensorFlow Fundamentals

Tanishka Garg starts a series on TensorFlow:

TensorFlow is an open-source end-to-end machine learning library. It is for preprocessing data, modeling data, and serving models (getting them into the hands of others).

It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications.

Read on for basic setup instructions and a primer on tensors.
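
If you'd like a taste of the tensor primer before clicking through, a short sketch with standard TensorFlow calls (not Tanishka's exact code) covers the basics:

```python
# A few tensor basics in TensorFlow (illustrative, not the article's exact code).
import tensorflow as tf

scalar = tf.constant(7)                   # rank 0
vector = tf.constant([1.0, 2.0, 3.0])     # rank 1
matrix = tf.constant([[1, 2], [3, 4]])    # rank 2

print(scalar.ndim, vector.shape, matrix.dtype)

# Element-wise math and matrix multiplication
print(vector * 2)
print(tf.matmul(matrix, matrix))

# Random tensors are the usual starting point for model weights
weights = tf.random.normal(shape=(3, 2), seed=42)
print(weights)
```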

Comments closed

Defect Detection with AWS Lookout and Sagemaker

Matthew Rhodes, et al, take us through an interesting case study:

According to a recent study, defective products cost industries over $2 billion from 2012–2017. Defect detection within manufacturing is an important business use case, especially in high-value product industries like the automotive industry. This allows for early diagnosis of anomalies to improve production line efficacy and product quality, and saves capital costs. Although advanced anomaly detection systems employ sensors as well as Internet of Things (IoT) devices to collect multimodal data to improve performance, computer vision continues to be a common approach. Detecting anomalies in automotive parts and components using computer vision can be done using normal images, and even X-ray-based images for structural damage. Recent advances in deep learning and computer vision have allowed scientists and manufacturers to develop enhanced anomaly detection systems, including surface defect detection on automotive body panels and dent detection in vehicles.

Read on for case notes.

Comments closed

Ensemble Classification in Azure Machine Learning

Dinesh Asanka reminds me not to use the designer for tough Azure ML problems:

Let us see how we can extend the standard classification to Ensemble Classifiers in Azure Machine Learning. Before we discuss the details of this configuration, you can view or download the experiment from Ensemble Classification.

The following figure shows the complex layout of the Ensemble Classifiers in Azure Machine Learning.

Dinesh is not kidding about that complexity. This is definitely a use case for the Azure ML SDK.
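
For readers who want the ensemble idea without the designer canvas, a soft-voting ensemble in scikit-learn (an illustration of the concept, not Dinesh's Azure ML experiment) can be as compact as this:

```python
# A simple soft-voting ensemble in scikit-learn -- conceptually similar to
# stitching several classifiers together, minus the Azure ML designer canvas.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities
)
ensemble.fit(X_train, y_train)
print("Test accuracy:", ensemble.score(X_test, y_test))
```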

Comments closed

Environments in Azure ML

Luis Valencia explains what environments are in Azure ML:

An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments, including in data preparation, training, and deployment to a web service. An Environment is managed and versioned in an Azure Machine Learning Workspace. You can update an existing environment and retrieve a version to reuse. Environments are exclusive to the workspace they are created in and can’t be used across different workspaces.

In basic terms for a developer, it's a Docker image with all the needed dependencies (conda/pip packages) to run your experiment.

A friendly word of advice from some bad experiences: stick with the curated environments as much as you can. Those are easy and rarely fail. Building your own environments from Conda files is a possibility, but it’s an, err, probabilistic exercise as to whether your compute target will actually work or not.
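
To make that concrete, here is a small sketch with the v1 Azure ML SDK (azureml-core). The workspace config, curated environment name, and conda file path are placeholders for illustration.

```python
# Working with Azure ML environments via the v1 SDK (azureml-core).
# The workspace config, curated environment name, and conda file path are placeholders.
from azureml.core import Environment, Workspace

ws = Workspace.from_config()  # assumes a config.json for your workspace

# Option 1: reuse a curated environment (the low-friction path)
curated = Environment.get(workspace=ws, name="AzureML-Minimal")

# Option 2: build your own from a conda specification file
custom = Environment.from_conda_specification(
    name="my-experiment-env",
    file_path="environment.yml",  # hypothetical conda file
)
custom.register(workspace=ws)     # versioned in the workspace for reuse
```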

Comments closed

Image Classification with ML.NET

Ivan Matec shows how to use ML.NET’s image classification with an example of vital importance:

One of the best scenes from Silicon Valley is Jian Yang demoing his “Hotdog, not hotdog” application. In this article, we will build our own “Hotdog, not hotdog” solution using ML.NET. After all, who would not want to determine if that dish is, or is not, a hot dog? Just take a picture, upload it to the web or desktop application, and get results with almost 90% certainty in a second.

Although some may say this is not a very useful application, it is a fun way to explore another machine learning concept through ML.NET. I covered installing and getting started with ML.NET in Visual Studio in my previous article, so refer to it if you missed it.

Click through for the implementation, which is quite straightforward.

Comments closed

Databricks Autologging

Corey Zumar and Kasey Uhlenhuth announce a new product:

Machine learning teams require the ability to reproduce and explain their results, whether for regulatory, debugging or other purposes. This means every production model must have a record of its lineage and performance characteristics. While some ML practitioners diligently version their source code, hyperparameters and performance metrics, others find it cumbersome or distracting from their rapid prototyping. As a result, data teams encounter three primary challenges when recording this information: (1) standardizing machine learning artifacts tracked across ML teams, (2) ensuring reproducibility and auditability across a diverse set of ML problems and (3) maintaining readable code across many logging calls.

Read on to see how Databricks Autologging can satisfy these issues.
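
Databricks Autologging builds on MLflow autologging; a minimal sketch of the open-source equivalent, with mlflow.autolog() called explicitly rather than enabled by the platform, looks like this:

```python
# Open-source MLflow autologging -- Databricks Autologging enables this
# automatically on supported cluster runtimes; here it is turned on explicitly.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.autolog()  # logs parameters, metrics, and the model artifact

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, max_depth=6, random_state=0)
    model.fit(X_train, y_train)
    print("R^2 on test data:", model.score(X_test, y_test))
```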

Comments closed

From API Call to ML Services Prediction

Tomaz Kastrun continues a series:

From the previous two blog posts:

Creating REST API for reading data from Microsoft SQL Server in web browser

Writing Data to Microsoft SQL Server from web browser using REST API and node.js

We have looked into the installation process of Node.js, the setup of Microsoft SQL Server, and made a couple of examples of reading data from the database through the REST API and inserting data back into the database.

In this post, we will be looking at R predictions using API calls against a sample dataset.

Click through to see it in action.
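
Tomaz's series uses node.js and R; just to show the shape of the client side of such a call, a hypothetical request against a local prediction endpoint might look like this in Python (the URL, port, and payload fields are made up for illustration):

```python
# Hypothetical client-side call to a local prediction endpoint.
# The URL, port, and payload fields are illustrative only -- see the linked
# series for the actual node.js / R implementation.
import requests

payload = {"feature1": 3.2, "feature2": "A"}  # made-up input features
response = requests.post("http://localhost:3000/predict", json=payload, timeout=10)
response.raise_for_status()
print(response.json())
```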

Comments closed