Press "Enter" to skip to content

Category: Data Science

The Hype Cycle for Artificial Intelligence

William Vorhies takes a look at Gartner’s hype cycle for AI (among other things):

Supposing you’re a business leader and supposing you’re trying to make an intelligent decision about prioritizing your AI adoption plans. It’s likely that, like many of us, the first thing you’d reach for would be one of Gartner’s many hype cycle or magic quadrant analyses.

What you might not know is that you now need an expert just to guide you through the expert literature. There has been such a proliferation of hype cycles and magic quadrants that you could easily be looking in the wrong place.

The hype cycle is definitely opinion-based, but I think it’s a useful look at the relative maturity of different segments of an industry or technology cluster. Do read the whole thing, though, as these analyses aren’t perfect.


Converting Odds to Probabilities with R

Jonas Christoffer Lindstrom has a new package:

Now you might think that converting decimal odds to probabilities should be easy: you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice this simple formula will give you improper probabilities. They will not sum to 1, as they should, but will be slightly larger. This gives the bookmakers an edge, and the probabilities (which aren’t real probabilities) cannot be considered fair, so different methods for correcting this exist.

Read on to learn more about the problem and a few solutions. H/T R-Bloggers.
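The simplest fix for those improper probabilities is proportional rescaling: divide each inverse by the sum of the inverses. Here’s a minimal R sketch of that idea with made-up odds; it illustrates the technique rather than the package’s own API, which offers several more sophisticated methods:

```r
# Hypothetical decimal odds for a three-way market (home / draw / away)
odds <- c(2.10, 3.40, 3.80)

# Naive inversion: these implied "probabilities" sum to more than 1
raw <- 1 / odds
sum(raw)    # about 1.033; the excess is the bookmaker's margin

# Simplest correction: rescale proportionally so they sum to 1
probs <- raw / sum(raw)
sum(probs)  # exactly 1
```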


Multi-Armed Bandits

Alex Slivkins has a new book:

If you’ve ever been in a casino, you may have found yourself asking one very pertinent question: On which slot machine am I going to hit the jackpot? Standing in front of a bank of identical-looking machines, you have only instinct to go on. It isn’t until you start putting your money into these one-armed bandits, as they’re also known, that you get a sense of which are hot and which are not, and when you find one that’s paying out regularly, you might stick with it in hopes of winning big. Though seemingly specific to the Las Vegas Strip, this scenario boils down to an exploration-exploitation tradeoff: make a decision based on what you already know and miss out on a potentially bigger reward or spend time and resources continuing to gather information.

Read on for some info about the book. Near the end, Alex gives a link to where you can buy it, as well as where you can get a PDF copy for free.
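To make the exploration-exploitation tradeoff concrete, here’s a toy epsilon-greedy sketch in R. It’s my own illustration with made-up payout rates, not an algorithm lifted from the book: with probability epsilon we explore a random arm; otherwise we exploit the best-looking arm so far.

```r
set.seed(42)
true_means <- c(0.2, 0.5, 0.7)   # hypothetical payout rate of each slot machine
n_arms <- length(true_means)
pulls <- rep(0, n_arms)          # how often we've tried each arm
est <- rep(0, n_arms)            # running estimate of each arm's payout
epsilon <- 0.1                   # fraction of the time we explore

for (t in 1:1000) {
  arm <- if (runif(1) < epsilon) sample(n_arms, 1) else which.max(est)
  reward <- rbinom(1, 1, true_means[arm])
  pulls[arm] <- pulls[arm] + 1
  est[arm] <- est[arm] + (reward - est[arm]) / pulls[arm]  # incremental mean
}
pulls  # most pulls should concentrate on the best arm
```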


Security Changes in ML Services

Dennes Torres goes over some of the security changes with Machine Learning Services in SQL Server 2019:

I have a confession to make. Why, in my last article about shortest_path in SQL Server 2019, did I use Gephi to illustrate the relationships instead of using an R script for the same purpose and demonstrating Machine Learning Services as well?

The initial plan was to use an R script; however, the R script which works perfectly in SQL Server 2017 doesn’t work in SQL Server 2019.

The change is a positive one from the standpoint of security, but it also makes life more difficult. I found this particularly tricky when installing TensorFlow and Keras in R via ML Services.


Fun with Regressions and the Zero Line

I have a post covering some important things to keep in mind when reviewing a regression:

The Line is NOT the Data

One of the worst things we can do as data analysts is to interpret a regression line as the most important thing on a visual. The important thing here is the per-state set of data points, but our eyes are drawn to the line. The line mentally replaces the data, but in doing so, we lose the noise. And boy, is there a lot of noise.

This was my first point, but I think it’s the most important one to keep in mind: just because we draw a line and there’s a best fit doesn’t mean that fit is actually any good. And if the fit isn’t any good, the line is…optimistic with regard to how informative it is.
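As a quick demonstration, here’s a small R sketch with simulated data: the regression line exists and is the best fit, but the R-squared tells you it explains almost nothing.

```r
set.seed(2024)
x <- rnorm(100)
y <- 0.1 * x + rnorm(100)   # weak signal buried in a lot of noise

fit <- lm(y ~ x)
summary(fit)$r.squared      # tiny: the "best" fit is still a poor fit

plot(x, y)                  # the points are the data...
abline(fit, col = "red")    # ...the line is just a summary of them
```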


An Overview of Generative Adversarial Networks

Mohammad Waseem takes us through an overview of Generative Adversarial Networks:

Generative models are models that use an unsupervised learning approach. In a generative model, there are samples in the data, i.e., input variables X, but it lacks the output variable Y. We use only the input variables to train the generative model, and it learns patterns from those inputs in order to generate new output based on the training data alone.

In supervised learning, we are more focused on creating predictive models from the input variables; this type of modeling is known as discriminative modeling. In a classification problem, the model has to discriminate as to which class the example belongs. On the other hand, unsupervised models are used to create or generate new examples from the input distribution.

To define generative models in layman’s terms: generative models are able to generate new examples that are not only similar to the other examples but are indistinguishable from them as well.

Click through for the overview.
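Setting GANs themselves aside, the generative-versus-discriminative split in the quote shows up even with very simple models. Here’s a toy R sketch of my own (not from the article): the generative model learns the input distribution and can sample new examples from it, while the discriminative model needs labels and only learns to map inputs to classes.

```r
set.seed(1)
x <- rnorm(500, mean = 10, sd = 2)   # unlabeled input variables X

# Generative: model the input distribution itself, then draw new examples
mu <- mean(x); sigma <- sd(x)
rnorm(5, mu, sigma)                  # five synthetic examples "like" the data

# Discriminative, for contrast: requires labels Y and only learns P(Y | X)
y <- rbinom(500, 1, plogis(x - 10))
fit <- glm(y ~ x, family = binomial)
predict(fit, data.frame(x = 12), type = "response")
```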


Monitoring for Distribution Changes

Nina Zumel explains how we can track if something has changed by monitoring its distribution:

A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model has changed? This client, like many others who have faced the same problem, simply checked whether the mean and standard deviation of the data had changed more than some amount, where the threshold value they checked against was selected in a more or less ad hoc manner. But they were curious whether there was some other, perhaps more principled, way to check for a change in distribution.

The answer is, of course, that there is. Click through to see a few of the techniques.
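As a small taste, one standard technique is the two-sample Kolmogorov-Smirnov test, which compares entire distributions rather than just a mean and standard deviation. This sketch uses simulated scores and may or may not be among the methods Nina covers:

```r
set.seed(7)
reference <- rnorm(1000, mean = 0, sd = 1)    # scores from the original data or model
incoming  <- rnorm(1000, mean = 0.3, sd = 1)  # a new batch with a shifted mean

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
# new batch is distributed differently from the reference
ks.test(reference, incoming)
```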


Benford’s Law in Power BI

Imke Feldmann shows how you can build up a Benford distribution in DAX:

The green columns show how often each number should appear as the first digit in numbers that follow the Benford distribution. In black, you’ll see the actual distribution of first digits within my table. Lastly, the red line shows the absolute percentage deviations between the actual and Benford values.

In this example, there is a relatively high occurrence of numbers starting with 4 and 5, so this could be a sign of fraudulent manipulation.

In the example, eyeballing it, things look pretty good. It’s interesting to see just how many things fit a Benford distribution, including populations, budgets (when you have enough line items), expenses, etc. Not everything does, however; high and low temperatures tend not to, in either Fahrenheit or Celsius.
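Imke builds the distribution in DAX; as a quick cross-check, here’s what the expected Benford frequencies look like in R, along with a first-digit tally of a made-up column of amounts:

```r
d <- 1:9
benford <- log10(1 + 1 / d)   # expected first-digit frequencies
round(benford, 3)             # 0.301, 0.176, 0.125, ...

# Tally observed first digits (assumes positive whole-number amounts)
amounts <- c(1234, 532, 48, 91, 7050, 163, 29, 440, 865, 120)
first_digit <- as.integer(substr(as.character(amounts), 1, 1))
observed <- table(factor(first_digit, levels = 1:9)) / length(amounts)
observed  # compare against benford, digit by digit
```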
