Press "Enter" to skip to content

Category: Machine Learning

Fraud Detection with Flink

Alexander Fedulov gives us a case study of using Apache Flink for fraud detection:

In this blog post, we have discussed the motivation behind supporting dynamic, runtime changes to a Flink application by looking at a sample use case – a Fraud Detection engine. We have described the overall architecture and interactions between its components as well as provided references for building and running a demo Fraud Detection application in a dockerized setup. We then showed the details of implementing a dynamic data partitioning pattern as the first underlying building block to enable flexible runtime configurations.

To remain focused on describing the core mechanics of the pattern, we kept the complexity of the DSL and the underlying rules engine to a minimum. Going forward, it is easy to imagine adding extensions such as allowing more sophisticated rule definitions, including filtering of certain events, logical rules chaining, and other more advanced functionality.

It's an interesting read, and you can grab the code as well.
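
To get a feel for the dynamic partitioning pattern before diving in, here is a rough, Flink-free Python sketch of the fan-out step: each active rule declares its own grouping fields, and every incoming event is re-keyed once per rule. The rule shape and field names are purely illustrative, not the post's actual DSL; in the real application this corresponds to broadcasting the rules and keying the event stream by the derived key.

```python
# Illustrative sketch of the dynamic partitioning idea (plain Python, not Flink's API).
# Each rule carries its own grouping fields; events are fanned out once per active rule,
# so downstream state can be scoped per (rule, key) pair.
active_rules = [
    {"id": 1, "grouping_fields": ["payer_id"]},
    {"id": 2, "grouping_fields": ["payer_id", "beneficiary_id"]},
]

def fan_out(event, rules):
    for rule in rules:
        key = tuple(event[field] for field in rule["grouping_fields"])
        yield (rule["id"], key), event

event = {"payer_id": 42, "beneficiary_id": 7, "amount": 120.0}
for keyed_event in fan_out(event, active_rules):
    print(keyed_event)
```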

Comments closed

Combining SAS + R for ML

Sophia Rowland has a demo of working with SAS Cloud Analytic Service from R:

The Scripting Wrapper for Analytics Transfer, also known as SWAT, is a package that allows R users to access the power of the SAS Cloud Analytic Service (CAS) from a familiar R interface. The SWAT package is available to SAS Visual Analytics (VA), SAS Visual Statistics (VS), and SAS Visual Data Mining and Machine Learning (VDMML) users. To begin working with SWAT, download and install the package from the SAS Software GitHub page.

Read on for the demo.

Comments closed

Time Series Anomaly Detection with Power BI

Leila Etaati takes us through time series anomaly detection with Cognitive Services and Power Query:

I am excited about this blog post; it is based on the new service in Cognitive Services named "Anomaly Detection," which is now in preview.
I recorded a video about how it works in Cognitive Services: https://youtu.be/7ZOtZDbn6gM.

However, I am going to talk about how to use it in Power BI. In this post, a brief introduction to anomaly detection will be presented first, and then how it can be used inside Power BI will be discussed.

It sounds like there are still some rough edges, but they already have the makings of an interesting service.
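
Leila's post works through the Power Query side; if you would rather poke at the underlying service directly, a rough Python sketch of calling the preview REST API might look like the following. The endpoint path, header, and payload shape are assumptions based on the preview documentation and may have changed, so treat this as a starting point rather than a recipe.

```python
import requests
from datetime import datetime, timedelta

# Assumed preview endpoint and key - substitute your own Cognitive Services resource.
endpoint = ("https://westus2.api.cognitive.microsoft.com"
            "/anomalydetector/v1.0/timeseries/entire/detect")
headers = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}

# The service expects a reasonably long, evenly spaced series; one spike is planted here.
start = datetime(2019, 11, 1)
values = [32, 31, 33, 30, 32, 31, 95, 32, 30, 31, 33, 32]
payload = {
    "granularity": "daily",
    "series": [
        {"timestamp": (start + timedelta(days=i)).strftime("%Y-%m-%dT%H:%M:%SZ"), "value": v}
        for i, v in enumerate(values)
    ],
}

response = requests.post(endpoint, headers=headers, json=payload)
print(response.json())  # the reply flags which points the service considers anomalous
```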

Comments closed

The A* Search Algorithm

Akash Kumar takes us through the A* algorithm and how it works for search:

Path finding has been one of the oldest and most popular applications in computer programming. You can find the optimal path from a source to a destination by adding costs which could represent time, money, etc. A* is one of the most popular algorithms for all the right reasons. In this article, let’s find out just why.

Click through for an explanation of what the algorithm does and pseudocode to implement it.
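
If you want something to tinker with alongside the article, here is a minimal Python version of A*; the graph is supplied as callables, and the heuristic must be admissible (never overestimating the remaining cost) for the returned path to be optimal.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, cost, heuristic):
    """neighbors(n) yields adjacent nodes; cost(a, b) is the edge cost;
    heuristic(n) estimates the remaining cost from n to the goal."""
    tie = count()                                    # tiebreaker so nodes are never compared
    open_set = [(heuristic(start), next(tie), start)]
    g_score = {start: 0}                             # cheapest known cost from start
    came_from = {}

    while open_set:
        _, _, current = heapq.heappop(open_set)
        if current == goal:                          # rebuild the path by walking predecessors
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        for nxt in neighbors(current):
            tentative = g_score[current] + cost(current, nxt)
            if tentative < g_score.get(nxt, float("inf")):
                came_from[nxt] = current
                g_score[nxt] = tentative
                heapq.heappush(open_set, (tentative + heuristic(nxt), next(tie), nxt))
    return None                                      # no path exists
```

On a grid, for example, neighbors would return the adjacent cells, cost would return 1 for each move, and heuristic would be the Manhattan distance to the goal.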

Comments closed

Automated ML Pipelines with SAS

Sophia Rowland shows off SAS’s auto-ML action:

The dsAutoMl action does it all. It will explore your data, generate features, select features, create models, and autotune the hyper-parameters of those models. This action includes the four policies we have seen in my first two blogs: explorationPolicy, screenPolicy, transformationPolicy, and selectionPolicy. Please review my previous blogs if you need a refresher on the data exploration and cleaning process or feature generation and selection process. The dsAutoMl action builds on our prior discussions through model generation and autotuning. A data scientist can choose to build several models such as decision trees, random forests, gradient boosting models, and neural networks. In addition, the data scientist can control which objective function to optimize for and the number of K-folds to use. The output of the dsAutoMl action includes information about the features generated, information on the model pipelines generated, and an analytic store file for generating the features with new data.

This is an area where several companies are investing a lot of money, trying to simplify the process of training models.

Comments closed

Bot Framework 101 Notes

Annie Xu has some notes from an introductory course on the Microsoft Bot framework:

Not long ago, I got a chance to take a Bot 101 lesson from my teammate Wayne Smith. It was a great class because it helped me, as a new learner, understand a lot of the key concepts of the Microsoft bot framework. Because it was an internal meeting and there is no public video released, I wrote some notes below to share with you.

Click through for Annie’s notes and a bunch of links to additional resources.

Comments closed

Preventing Overfitting in ML Models

Tom Jordan gives us four techniques to reduce the likelihood of overfitting in our models:

Dropout
This technique is exclusively used within the training of neural networks, so isn’t applicable to all machine learning models, however can be used in the production of extremely effective neural network models. During the start of each step in the training process, each sub unit of the model, the neuron, has a probability of being included in that step or not. If it doesn’t make the cut, it is effectively deleted from the network for that step, and then reintroduced on the next step.

There are some good techniques here.
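
To make the dropout description above concrete, here is a minimal NumPy sketch of the usual "inverted dropout" formulation; the layer shape and keep probability are just for illustration.

```python
import numpy as np

def dropout(activations, keep_prob=0.8, training=True):
    """Randomly zero neurons during training and rescale the survivors
    so the expected activation matches the no-dropout network."""
    if not training or keep_prob >= 1.0:
        return activations                           # inference uses the full network
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob            # inverted dropout scaling

# A fresh mask is drawn on every training step, so each step trains a different sub-network.
hidden = np.random.randn(4, 16)                      # pretend activations from a hidden layer
print(dropout(hidden, keep_prob=0.8))
```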

Comments closed

Dealing with NULLs in Java with SQL Server 2019

Niels Berglund covers changes in SQL Server Machine Learning Services around Java code execution:

In the null values post mentioned above, I mentioned that there are differences between SQL Server and Java in how they handle null. So, when we call into Java from SQL Server, we may want to treat null values the same way as we do in SQL Server.

I wrote about this in the SQL Server 2019 Extensibility Framework & Java – Null Values post mentioned above. However, that post was written before SQL Server 2019 CTP 2.5. In CTP 2.5 Microsoft introduced the Java SDK, and certain things changed. Amongst the things that changed is the way we handle nulls when we receive datasets from SQL Server in our Java code.

Read on to learn how it works today.

Comments closed

MLflow on Databricks Community Edition

Jules Damji and Siddharth Murching have an interesting announcement:

Today, we are excited to extend Databricks Community Edition with hosted MLflow for free, as part of our ongoing commitment to help developers learn about the machine learning lifecycle. With the Community Edition, you can try tutorials that demonstrate how to track results and experiments as you build machine learning models—a crucial stage in the machine learning model’s development lifecycle.

MLflow is an open-source platform for the machine learning lifecycle with four components: MLflow Tracking, MLflow Projects, MLflow Models, and MLflow Registry. MLflow is now included in Databricks Community Edition, meaning that you can utilize its Tracking and Model APIs within a notebook or from your laptop just as easily as you would with managed MLflow in Databricks Enterprise Edition.

I like showing off Databricks Community Edition, and I’m glad to see them extend it a bit.
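
For a sense of what the Tracking API looks like from a notebook, a minimal run might look like this sketch; the parameter and metric names are made up for illustration.

```python
import mlflow

# In a Databricks notebook, runs are tracked against the workspace automatically;
# on a laptop, point mlflow at a tracking server first or accept the local ./mlruns default.
with mlflow.start_run(run_name="demo"):
    mlflow.log_param("alpha", 0.5)                      # a hyperparameter for this run
    for epoch, rmse in enumerate([0.92, 0.85, 0.78]):
        mlflow.log_metric("rmse", rmse, step=epoch)     # metric history across epochs
```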

Comments closed

Topic Modeling

Federico Pascual has an article on topic modeling and topic classification:

Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents. It’s known as ‘unsupervised’ machine learning because it doesn’t require a predefined list of tags or training data that’s been previously classified by humans.

Since topic modeling doesn’t require training, it’s a quick and easy way to start analyzing your data. However, you can’t guarantee you’ll receive accurate results, which is why many businesses opt to invest time training a topic classification model.

The article is long but worth the read, with examples in Python and additional notes for R.
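
As a quick taste of what the Python side of topic modeling can look like, here is a minimal LDA example with scikit-learn; the toy corpus and topic count are placeholders, and a real corpus needs far more documents to produce useful topics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the server crashed after the memory upgrade",
    "restart the database server to apply the patch",
    "our quarterly revenue grew despite higher costs",
    "marketing spend drove revenue growth this quarter",
]

# Bag-of-words counts feed the unsupervised LDA model - no labeled training data required.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Show the highest-weight terms per topic to eyeball what each topic represents.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top_terms = [terms[j] for j in weights.argsort()[-4:][::-1]]
    print(f"Topic {i}: {', '.join(top_terms)}")
```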

Comments closed