Category: Machine Learning

Text Mining and Sentiment Analysis in R

Sanil Mhatre walks us through a sentiment analysis scenario in R:

Sentiments can be classified as positive, neutral or negative. They can also be represented on a numeric scale, to better express the degree of positive or negative strength of the sentiment contained in a body of text.

This example uses the Syuzhet package for generating sentiment scores, which has four sentiment dictionaries and offers a method for accessing the sentiment extraction tool developed in the NLP group at Stanford. The get_sentiment function accepts two arguments: a character vector (of sentences or words) and a method. The selected method determines which of the four available sentiment extraction methods will be used. The four methods are syuzhet (the default), bing, afinn and nrc. Each method uses a different scale and hence returns slightly different results. Please note that the output of the nrc method is more than just a numeric score; it requires additional interpretation and is out of scope for this article. The descriptions of the get_sentiment function have been sourced from: https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html?
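
To make dictionary-based scoring concrete, here is a minimal Python sketch in the spirit of the lexicon methods described above (it is not Syuzhet, which is an R package): each known word contributes its dictionary value and the sentence score is the sum of those values. Both lexicons are tiny, hypothetical examples on two different scales, which is why the same sentences get different numbers depending on the "method" chosen, and label() shows how a numeric score can be collapsed back into positive, neutral or negative.

```python
import re

# Two tiny, HYPOTHETICAL lexicons on different scales. They only illustrate
# the mechanics; they are not the real syuzhet/bing/afinn dictionaries.
LEXICON_BINARY = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
LEXICON_GRADED = {"good": 2, "great": 3, "bad": -2, "terrible": -3}


def sentence_score(sentence, lexicon):
    """Sum the lexicon value of every word; words not in the lexicon count as 0."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(word, 0) for word in words)


def get_scores(sentences, lexicon):
    """Score each sentence with the chosen lexicon (the 'method')."""
    return [sentence_score(s, lexicon) for s in sentences]


def label(score):
    """Collapse a numeric score into positive / neutral / negative."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = ["The food was great", "The service was bad", "It was a meal"]
    for name, lexicon in [("binary", LEXICON_BINARY), ("graded", LEXICON_GRADED)]:
        scores = get_scores(reviews, lexicon)
        print(name, scores, [label(s) for s in scores])
```

Both lexicons agree on the labels for these three sentences, but only the graded one records that "great" (3) is stronger than "bad" (-2) is negative, which is the extra information a numeric scale buys you.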

Using Cognitive Services in Power BI without a Premium Subscription

Marc Lelijveld and Kathrin Borchert show how we can take advantage of Cognitive Services and Power BI without having to pay for Power BI Premium:

Recently, I was presenting my session about AI Capabilities for Power BI to make AI Accessible for Everyone at the Virtual Power BI Days Hamburg, a great event organized by Kathrin Borchert. Part of my session was about the Artificial Intelligence capabilities offered as part of Power BI Premium. A day later, Kathrin came up with a great idea for how you can leverage these AI capabilities without the need for Power BI Premium.

I was instantly enthusiastic about that idea, since I had thought about this in the past as well. Back then, there were some blockers, which have since been sorted out. I asked Kathrin if she was open to co-authoring this blog post and she immediately agreed.

Click through for the technique. Basically, it’s a trade-off between simplicity and cost.

Java Extension for SQL Server Now Open Source

Nellie Gustafsson announces some exciting news:

Today, we’re thrilled to announce that we are open sourcing the Java language extension for SQL Server on GitHub.

This extension is the first example of using an evolved programming language extensibility architecture which allows integration with a new type of language extensions. This new architecture gives customers the freedom to bring their own runtime and execute programs using that runtime in SQL Server while leveraging the existing security and governance that the SQL Server programming language extensibility architecture provides.

This opens up the possibility for additional languages. .NET languages (C# and F#) would be a natural fit and languages like Go might have enough dedicated support to give this a try.

Distributed Model Training with Dask and SciKit-Learn

Matthieu Lamairesse shows us how we can use Dask to perform distributed ML model training:

Dask is an open-source parallel computing framework written natively in Python (initially released in 2014). It has a significant following and support, largely due to its good integration with the popular Python ML ecosystem triumvirate of NumPy, Pandas and Scikit-learn.

Why Dask over other distributed machine learning frameworks? 

In the context of this article, it’s about Dask’s tight integration with Scikit-learn’s Joblib parallel computing library, which allows us to distribute Scikit-learn code with (almost) no code change, making it a very interesting framework to accelerate ML training.
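
The "(almost) no code change" point comes from Joblib's pluggable backends: Scikit-learn hands independent tasks (such as the individual fits in a grid search) to Joblib, and Joblib decides where they run. In the article's setting that usually means creating a dask.distributed Client and wrapping the Scikit-learn fit in Joblib's parallel_backend("dask") context manager. The sketch below is not Dask; it is a standard-library analogy, assuming a hypothetical fit_and_score stand-in for fitting one model, that shows the same design: the search code takes an executor and never changes when the executor does.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fit_and_score(params):
    """HYPOTHETICAL stand-in for fitting one model on one hyperparameter setting.
    A real version would fit an estimator and return a cross-validated score;
    this toy score simply peaks at alpha=0.1, depth=3."""
    alpha, depth = params
    score = 1.0 - (alpha - 0.1) ** 2 - (depth - 3) ** 2
    return params, score


class SerialExecutor:
    """Runs tasks one at a time in the current thread, exposing the same
    map() method as the concurrent.futures executors."""

    def map(self, fn, iterable):
        return map(fn, iterable)


def grid_search(param_grid, executor):
    """Evaluate every grid cell as an independent task on the given executor and
    return the best (params, score) pair. The search logic is identical whichever
    executor is passed in."""
    candidates = list(product(*param_grid.values()))
    results = list(executor.map(fit_and_score, candidates))
    return max(results, key=lambda result: result[1])


if __name__ == "__main__":
    grid = {"alpha": [0.01, 0.1, 1.0], "depth": [2, 3, 4]}
    print(grid_search(grid, SerialExecutor()))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(grid_search(grid, pool))
```

Swapping the executor changes where the fits run, not what the search does; that only pays off because each grid cell is independent of the others, which is also what makes grid searches a natural first target for distributing with Dask.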

Click through for an interesting article and an example of using this on Cloudera’s ML platform.

Using Pre-Trained Sentiment Models with Power BI

Ryan Wade shows us how to use a pre-built sentiment analysis model with Power BI:

As of this writing, there are two pre-trained models available: one for sentiment analysis and another for image classification. This example focuses on sentiment analysis.

Both of these installations are freely available for the on-prem version of SQL Server 2017 and later. For more information on how to install these on your instance, refer to this article for SQL Server Machine Learning Services and this article for pre-trained models.

Click through for step-by-step instructions.

The Hype Cycle for Artificial Intelligence

William Vorhies takes a look at Gartner’s hype cycle for AI (among other things):

Supposing you’re a business leader and supposing you’re trying to make an intelligent decision about prioritizing your AI adoption plans. It’s likely that, like many of us, the first thing you’d reach for would be one of Gartner’s many hype cycle or magic quadrant analyses.

What you might not know is that you now need an expert just to guide you through the expert literature. There has been such a proliferation of hype cycles and magic quadrants that you could easily be looking in the wrong place.

The hype cycle is definitely opinion-based, but I think it’s a useful look at the relative maturity of different segments of an industry or technology cluster. Do read the whole thing, though, as these things aren’t perfect.

Security Changes in ML Services

Dennes Torres goes over some of the security changes with Machine Learning Services in SQL Server 2019:

I have a confession to make. Why, in my last article about shortest_path in SQL Server 2019, did I use Gephi to illustrate the relationships, instead of using a script in R for the same purpose and demonstrating Machine Learning Services as well?

The initial plan was to use an R script; however, the R script which works perfectly in SQL Server 2017 doesn’t work in SQL Server 2019.

The change is a positive one from the standpoint of security, but it also makes life more difficult. I found this particularly tricky when installing TensorFlow and Keras in R via ML Services.

Explaining Black Box Models with LIME

Holger von Jouanne-Diedrich takes us through the intuition of LIME:

There is a new hot area of research to make black-box models interpretable, called Explainable Artificial Intelligence (XAI). If you want to gain some intuition on one such approach (called LIME), read on!

Before we dive right into it, it is important to point out when and why you would need interpretability of an AI. While it might be a desirable goal in itself, it is not necessary in many fields, at least not for users of an AI: with text translation or character and speech recognition, for example, it is not that important why they do what they do, but simply that they work.

In other areas, like medical applications (determining whether tissue is malignant), financial applications (granting a loan to a customer) or applications in the criminal-justice system (gauging the risk of recidivism) it is of the utmost importance (and sometimes even required by law) to know why the machine arrived at its conclusions.

One approach to making AI models explainable is called LIME, for Local Interpretable Model-Agnostic Explanations. There is already a lot in this name!
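
The name spells out the recipe: perturb the input around the one prediction you want to explain (local), query the model only through its predictions (model-agnostic), and fit a simple weighted linear model to those answers (interpretable). Below is a simplified, standard-library sketch of that recipe for numeric features. It is not the lime package, which adds feature discretization, sampling based on the training data, feature selection and more; the Gaussian perturbation, the exponential proximity kernel and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.
    Fine for the tiny systems used here; a real project would use numpy."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lime_explain(black_box, x0, n_samples=500, sigma=0.1, kernel_width=0.25, seed=0):
    """Simplified LIME for numeric features: returns one local weight per feature."""
    rng = random.Random(seed)
    d = len(x0)
    # Accumulate the weighted least-squares normal equations (X^T W X) beta = X^T W y,
    # where each row of X is [1, z_1, ..., z_d] (the leading 1 is the intercept).
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, sigma) for xi in x0]   # local perturbation around x0
        y = black_box(z)                                 # model-agnostic: predictions only
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        weight = math.exp(-dist2 / kernel_width ** 2)    # closer samples count more
        row = [1.0] + z
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[1:]  # drop the intercept; keep the per-feature weights


if __name__ == "__main__":
    print(lime_explain(lambda z: z[0] ** 2 + z[1], [1.0, 0.0]))
```

For the non-linear black box z[0]**2 + z[1] explained at (1, 0), the returned weights should come out close to the local slopes (2, 1): the surrogate is only faithful near the point being explained, which is exactly the "local" in LIME.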

LIME is not trivial to use and it can be very slow, but it is a great way to visualize models.

Concepts in Support Vector Machines

Abhijit Telang takes us through the calculations involved in Support Vector Machines and then gives us an example in R:

So, let’s take that out and we are back to old, classical vector algebra. It’s like a person with a bunch of sticks trying to figure out which one to lay where in a 2-D plane to separate one class of objects from another, provided the class definitions are already known.

The problem is which particular shape and length must be chosen to show maximum contrast between classes.

We need to arrive at a function definition in such a way that the value the function takes changes drastically (e.g. from a large positive value to a large negative value).
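
For a linear SVM the function in question is f(x) = w . x + b: its sign decides the class, and training chooses w and b so that the closest points of each class sit at |f(x)| >= 1, which makes the gap between the classes (2/||w||) as wide as possible. The sketch below illustrates that idea in plain Python by minimizing the regularized hinge loss with full-batch subgradient descent. This is a swapped-in, deliberately simple training method (not the author's R example and not a production solver such as libsvm), and the learning rate, regularization strength and toy data are assumptions.

```python
import math


def decision_value(w, b, x):
    """f(x) = w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * f(x))) by subgradient descent.
    labels must be +1 or -1."""
    n = len(points)
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * decision_value(w, b, x) < 1:  # inside the margin or misclassified
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def margin_width(w):
    """Distance between the hyperplanes f(x) = +1 and f(x) = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


if __name__ == "__main__":
    points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
    labels = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(points, labels)
    print(w, b, margin_width(w))
    print([predict(w, b, p) for p in points])
```

The regularizer keeps pulling ||w|| down (widening the margin) while the hinge term pushes back only for points that sit inside the margin, so the final line is set by the few closest points, the support vectors.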

SVM is often great for two-class classification problems, and different variants also work well for multi-class problems.
