Press "Enter" to skip to content

Category: Machine Learning

Feathr in the Linux Foundation

Hangfei Lin and Jinghui Mo make an announcement:

We’re excited to announce today that Feathr is joining LF AI & Data, the Linux Foundation’s umbrella foundation supporting open source innovation in artificial intelligence (AI) and data. Feathr is a feature store that simplifies machine learning (ML) feature serving and improves developer productivity.

“We’re excited to welcome Feathr to LF AI & Data and for it to be part of our technical project portfolio (41 projects and growing) with a community of over 17K developers,” said Dr. Ibrahim Haddad, Executive Director of LF AI & Data. “We aim to support Feathr to expand its user base, grow its community of developers, become a leader within its own category, and enable collaboration and integration opportunities with other projects. We look forward to the project’s continued growth and success as part of LF AI & Data.”

Alex Woodie has more background:

Feathr was originally developed at LinkedIn to help manage and serve features used in its machine learning applications. Instead of manually working with features as part of an individual data pipeline, Feathr automates and standardizes the interaction with the data type, which is used in both the training and inference stages of machine learning.

Comments closed

Choosing between Neural Network Types

Jason Brownlee takes us through three common classes of neural network and explains when each is useful:

In this post, you will discover the suggested use for the three main classes of artificial neural networks.

After reading this post, you will know:

– Which types of neural networks to focus on when working on a predictive modeling problem.

– When to use, not use, and possible try using an MLP, CNN, and RNN on a project.

– To consider the use of hybrid models and to have a clear idea of your project goals before selecting a model.

Read the whole thing.

Comments closed

Interpreting Kernel SHAP

Michael Mayer digs into Kernel SHAP:

In their 2017 paper on SHAP, Scott Lundberg and Su-In Lee presented Kernel SHAP, an algorithm to calculate SHAP values for any model with numeric predictions. Compared to Monte-Carlo sampling (e.g. implemented in R package “fastshap”), Kernel SHAP is much more efficient.

I had one problem with Kernel SHAP: I never really understood how it works!

Needless to say, Michael knows Kernel SHAP a lot better now, considering there’s now a kernelshap package for us.

Comments closed

An Intro to Key Word Analysis

Lewis Prince continues a series on natural language processing:

Here we are with part 2 of this blog series on web scraping and natural language processing (NLP). In the first part I discussed what web scraping was, why it’s done and how it can be done. In this part I will give you details on what NLP is at a high level, and then go into detail of an application of NLP called key word analysis (KWA).

Read on for a high-level overview of the topic and how to do it in Cognitive Services. But not the topic model—that’d be a different post.

Comments closed

What’s New in SynapseML

Nellie Gustafsson and Mark Hamilton share an update:

SynapseML is a massively scalable (feel free to spin up hundreds of machines!) machine learning library built on Apache Spark. SynapseML makes it easy to train production-ready models to solve problems from simple classification and regression to anomaly detection, translation, image analysis, speech to text, and just about any ML challenge you are facing.  Under the hood, SynapseML integrates a wide array of ML technologies such as LightGBM, Vowpal Wabbit, ONNX, and the Cognitive Services into a single easy to use API compatible with MLFlow. We know, we know, everyone hates when developers invent new APIs, but you can rest easy because SynapseML integrates cleanly into existing Spark ML APIs so you can embed models directly into existing pipelines. We strive to make SynapseML available to developers wherever they work, and the library is available in a variety of languages like Python, Scala, Java, R. As of this release SynapseML is also usable from .NET, C#, F#.

Saving the best language for last, I see. Click through for the list of updates.

Comments closed

Text Clustering with Python

Luke Menzies takes us through the gensim library:

An interesting branch of machine learning is Natural Language Processing (NLP). As the name suggests, it involves training machines to detect patterns in language using algorithms. It is quite often the case that NLP is referred to as text analytics. It is actually more impressive than that. It examines vectorised patterns which not only looks at the positioning of elements but what it means in context to neighbouring elements within the vector. In a nutshell, this technique can be extended beyond text to patterns of linguistics in general and even contextual patterns. Nevertheless, its primary use in the machine learning world is to analyse text.

This article will focus on an interesting application of NLP which involves the clustering of text. Clustering is a popular unsupervised machine learning technique used for segmentation or grouping of data. It is a very powerful tool that is used across a variety of industries. However, it is rare you hear of applying clustering to text. This can be achieved using NLP functions, combined with clustering algorithms that can handle non-Euclidian distances.

Read on for an overview of the process and an example of combining DBSCAN with word2vec to cluster phrases.

Comments closed

The Seedy Underbelly of Machine Learning Fitting

John Mount is not impressed with a fair amount of machine learning:

For this to actually happen we need the actual system to be in our concept space, a lot of training data, and an abundance of caution.

In practice what we see more and more is the training procedure in fact attacks the evaluation procedure. It doesn’t just improve the quality of the fit artifact, but through mere optimization accidentally exploits weaknesses in the measurement system itself. When this happens, fitting does the following.

In ML training, we often accidentally “teach to the test” by comparing models via test data, which over time selects for models which are better fits for the test data. As John notes, this can come two separate ways and if you don’t define your optimization strategy correctly, you can accidentally train models which optimize on non-realistic things. A classic example is the neural network which could pick out malignant tumors from non-malignant tumors not because of any property of the tumor itself but rather because the malignant tumor images all had rulers in them and the non-malignant images did not. Read the whole thing for a second pitfall you can hit when training models.

Comments closed

PHI De-Identification in Databricks with NLP

Amir Kermany, et al, share a set of notebooks:

John Snow Labs, the leader in Healthcare natural language processing (NLP), and Databricks are working together to help organizations process and analyze their text data at scale with a series of Solution Accelerator notebook templates for common NLP use cases. You can learn more about our partnership in our previous blog, Applying Natural Language Processing to Health Text at Scale.

To help organizations automate the removal of sensitive patient information, we built a joint Solution Accelerator for PHI removal that builds on top of the Databricks Lakehouse for Healthcare and Life Sciences. John Snow Labs provides two commercial extensions on top of the open-source Spark NLP library — both of which are useful for de-identification and anonymization tasks — that are used in this Accelerator:

This is a really interesting scenario.

Comments closed

Saving and Loading a Keras Model

Jason Brownlee made it to a savepoint in time:

Given that deep learning models can take hours, days and even weeks to train, it is important to know how to save and load them from disk.

In this post, you will discover how you can save your Keras models to file and load them up again to make predictions.

After reading this tutorial you will know:

– How to save model weights and model architecture in separate files.

– How to save model architecture in both YAML and JSON format.

– How to save model weights and architecture into a single file for later use.

Read on for an updated step-by-step tutorial.

Comments closed