Press "Enter" to skip to content

Category: Data Science

Multi-Armed Bandits

Alex Slivkins has a new book:

If you’ve ever been in a casino, you may have found yourself asking one very pertinent question: On which slot machine am I going to hit the jackpot? Standing in front of a bank of identical-looking machines, you have only instinct to go on. It isn’t until you start putting your money into these one-armed bandits, as they’re also known, that you get a sense of which are hot and which are not, and when you find one that’s paying out regularly, you might stick with it in hopes of winning big. Though seemingly specific to the Las Vegas Strip, this scenario boils down to an exploration-exploitation tradeoff: make a decision based on what you already know and miss out on a potentially bigger reward or spend time and resources continuing to gather information.

Read on for some info about the book. Near the end, Alex gives a link to where you can buy it, as well as where you can get a PDF copy for free.
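The exploration-exploitation tradeoff is easy to see in code. Here is a minimal epsilon-greedy sketch in Python (my own toy example, not from the book; the payout probabilities and the 10% exploration rate are made up for illustration):

    import random

    def epsilon_greedy(payout_probs, epsilon=0.1, pulls=10000):
        # Toy epsilon-greedy bandit: explore a random arm with probability epsilon,
        # otherwise exploit the arm with the best observed average payout.
        counts = [0] * len(payout_probs)
        totals = [0.0] * len(payout_probs)
        total_reward = 0.0
        for _ in range(pulls):
            if random.random() < epsilon or 0 in counts:
                arm = random.randrange(len(payout_probs))        # explore
            else:
                arm = max(range(len(payout_probs)),
                          key=lambda a: totals[a] / counts[a])   # exploit
            payout = 1.0 if random.random() < payout_probs[arm] else 0.0
            counts[arm] += 1
            totals[arm] += payout
            total_reward += payout
        return counts, total_reward

    # Three "slot machines" with hidden payout probabilities
    print(epsilon_greedy([0.05, 0.10, 0.20]))

Most pulls end up going to the best arm, but a slice of the budget is always spent checking whether the other machines might be better.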


Security Changes in ML Services

Dennes Torres goes over some of the security changes with Machine Learning Services in SQL Server 2019:

I have a confession to make. Why, in my last article about shortest_path in SQL Server 2019, did I use Gephi to illustrate the relationships instead of using an R script for the same purpose and demonstrating Machine Learning Services as well?

The initial plan was to use an R script; however, the R script which works perfectly in SQL Server 2017 doesn’t work in SQL Server 2019.

The change is a positive one from the standpoint of security, but it also makes life more difficult. I found this particularly tricky when installing TensorFlow and Keras in R via ML Services.


Fun with Regressions and the Zero Line

I have a post covering some important things to keep in mind when reviewing a regression:

The Line is NOT the Data

One of the worst things we can do as data analysts is to interpret a regression line as the most important thing on a visual. The important thing here is the per-state set of data points, but our eyes are drawn to the line. The line mentally replaces the data, but in doing so, we lose the noise. And boy, is there a lot of noise.

This was my first point, but I think it’s the most important one to keep in mind: just because we draw a line and there’s a best fit doesn’t mean that fit is actually any good. And if the fit isn’t any good, the line is…optimistic with regard to how informative it is.
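A quick way to check whether a best-fit line is any good is to look at how much of the variance it actually explains. Here is a small numpy sketch with made-up noisy data (not the per-state data from the post):

    import numpy as np

    rng = np.random.default_rng(42)

    # Made-up data: a weak linear relationship buried in a lot of noise
    x = rng.uniform(0, 100, size=50)
    y = 0.05 * x + rng.normal(0, 10, size=50)

    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept

    # R^2 tells you how much of the variance the fitted line explains
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}")  # a line always fits; R^2 says how well

polyfit will happily draw a line through anything; a low R^2 (or an ugly residual plot) is the tell that the line is doing more storytelling than explaining.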


An Overview of Generative Adversarial Networks

Mohammad Waseem takes us through an overview of Generative Adversarial Networks:

Generative models are simply models that use an unsupervised learning approach. In a generative model, there are samples in the data, i.e. input variables X, but it lacks the output variable Y. We use only the input variables to train the generative model, and it recognizes patterns from those input variables to generate output that is unknown and based only on the training data.

In supervised learning, we are more aligned towards creating predictive models from the input variables; this type of modeling is known as discriminative modeling. In a classification problem, the model has to discriminate as to which class the example belongs to. On the other hand, unsupervised models are used to create or generate new examples in the input distribution.

To define generative models in layman’s terms, we can say that generative models are able to generate new examples from the sample that are not only similar to other examples but are indistinguishable from them as well.

Click through for the overview.
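To make the generator-versus-discriminator idea concrete, here is a rough Keras sketch of the standard adversarial training loop on one-dimensional data. This is my own toy illustration, not code from the linked overview, and the layer sizes and training settings are arbitrary:

    import numpy as np
    from tensorflow import keras

    latent_dim = 8

    # Generator maps random noise to a fake sample; discriminator scores real vs. fake
    generator = keras.Sequential([
        keras.Input(shape=(latent_dim,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),
    ])
    discriminator = keras.Sequential([
        keras.Input(shape=(1,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")

    # The combined model trains the generator to fool a frozen discriminator
    discriminator.trainable = False
    gan = keras.Sequential([generator, discriminator])
    gan.compile(optimizer="adam", loss="binary_crossentropy")

    real_data = np.random.normal(loc=3.0, scale=0.5, size=(1000, 1))  # the "real" distribution
    batch = 64

    for step in range(200):
        noise = np.random.normal(size=(batch, latent_dim))
        fake = generator.predict(noise, verbose=0)
        real = real_data[np.random.randint(0, len(real_data), batch)]
        # 1) teach the discriminator to separate real samples from fakes
        discriminator.train_on_batch(real, np.ones((batch, 1)))
        discriminator.train_on_batch(fake, np.zeros((batch, 1)))
        # 2) teach the generator (via the combined model) to get its fakes labeled "real"
        gan.train_on_batch(np.random.normal(size=(batch, latent_dim)), np.ones((batch, 1)))

    print(generator.predict(np.random.normal(size=(5, latent_dim)), verbose=0).ravel())

The two-step loop is the whole trick: the discriminator improves at spotting fakes, and the generator improves at producing samples the discriminator can no longer distinguish from the real distribution.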


Monitoring for Distribution Changes

Nina Zumel explains how we can track if something has changed by monitoring its distribution:

A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model has changed? This client, like many others who have faced the same problem, simply checked whether the mean and standard deviation of the data had changed more than some amount, where the threshold value they checked against was selected in a more or less ad hoc manner. But they were curious whether there was some other, perhaps more principled, way to check for a change in distribution.

The answer is, of course, that there is. Click through to see a few of the techniques.
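Just as one illustration (Nina’s post covers the techniques she actually recommends), a two-sample Kolmogorov-Smirnov test is a common principled way to compare an old and a new batch of scores. A quick scipy sketch with made-up data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    baseline = rng.normal(loc=0.0, scale=1.0, size=5000)  # e.g., last month's model scores
    current = rng.normal(loc=0.3, scale=1.0, size=5000)   # e.g., this month's model scores

    # Two-sample Kolmogorov-Smirnov test: a small p-value is evidence that
    # the two samples come from different distributions
    statistic, p_value = stats.ks_2samp(baseline, current)
    print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.4g}")

Unlike a mean-plus-standard-deviation check, this compares the whole distribution, so it will also flag shifts in shape that leave the mean roughly alone.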


Benford’s Law in Power BI

Imke Feldmann shows how you can build up a Benford distribution in DAX:

The green columns show how often each number should be the first digit in numbers that follow the Benford distribution. In black, you’ll see the actual distribution of first digits within my table. Lastly, the red line shows the absolute percentage deviations between the actual and Benford values.

In this example, there is a relatively high occurrence of numbers starting with 4 and 5, so this could be a sign of fraudulent manipulation.

In the example, eyeballing it says things look pretty good. It’s interesting to see just how many things fit a Benford distribution, including populations, budgets (when you have enough line items), expenses, etc. Not everything does, however—high and low temperatures tend not to, either in Fahrenheit or Celsius.
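Imke builds the comparison in DAX; to make the underlying formula concrete, here is the same idea as a short Python sketch (the amounts are made up):

    import math
    from collections import Counter

    # Benford's Law: P(first digit = d) = log10(1 + 1/d)
    def benford_expected(d):
        return math.log10(1 + 1 / d)

    amounts = [4821, 132, 87, 5403, 291, 17, 6230, 448, 1055, 39]  # made-up values

    first_digits = [int(str(abs(a))[0]) for a in amounts if a != 0]
    counts = Counter(first_digits)
    n = len(first_digits)

    for d in range(1, 10):
        actual = counts.get(d, 0) / n
        expected = benford_expected(d)
        print(f"{d}: actual {actual:6.1%}  expected {expected:6.1%}  deviation {abs(actual - expected):6.1%}")

With a realistically sized table, large deviations in the middle digits are the kind of thing that warrants a closer look.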


Machine Learning through Counterfactuals

Amit Sharma announces a new library:

Consider a person who applies for a loan with a financial company, but their application is rejected by a machine learning algorithm used to determine who receives a loan from the company. How would you explain the decision made by the algorithm to this person? One option is to provide them with a list of features that contributed to the algorithm’s decision, such as income and credit score. Many of the current explanation methods provide this information by either analyzing the algorithm’s properties or approximating it with a simpler, interpretable model.

However, these explanations do not help this person decide what to do next to increase their chances of getting the loan in the future. In particular, changing the most important features for prediction may not actually change the decision, and in some cases, important features may be impossible to change, such as age. A similar argument applies when algorithms are used to support decision-makers in scenarios such as screening job applicants, deciding health insurance, or disbursing government aid.

This has the potential to be a great library. One of the issues with machine learning as it stands today is that you can get an answer, but to understand how to change the answer requires having a human understand the model. This looks like a good first step. It’s only available in Python.
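To illustrate the counterfactual idea itself (this is a toy brute-force sketch with made-up data, not the library’s actual API), the question being answered is: what is the smallest change to this applicant’s features that would flip the decision?

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up loan data: features are (income in $1000s, credit score)
    rng = np.random.default_rng(1)
    X = rng.uniform([20, 500], [150, 850], size=(500, 2))
    y = (0.02 * X[:, 0] + 0.01 * X[:, 1] > 8.5).astype(int)
    model = LogisticRegression().fit(X, y)

    applicant = np.array([45.0, 610.0])
    print("Current decision:", model.predict([applicant])[0])  # expect 0 = rejected

    # Brute-force counterfactual search: the cheapest nudge to income and/or
    # credit score that flips the prediction to 1 = approved
    best, best_cost = None, float("inf")
    for d_income in np.arange(0, 60, 1.0):
        for d_score in np.arange(0, 200, 5.0):
            candidate = applicant + np.array([d_income, d_score])
            if model.predict([candidate])[0] == 1:
                cost = d_income / 130 + d_score / 350  # crude normalized change cost
                if cost < best_cost:
                    best, best_cost = candidate, cost
    print("Counterfactual:", best)

The output is actionable in a way that a feature-importance list is not: it names concrete changes that would have produced a different decision, which is exactly the gap the library aims to fill (with a much smarter search than nested loops).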


Explaining Black Box Models with LIME

Holger von Jouanne-Diedrich takes us through the intuition of LIME:

There is a hot new area of research on making black-box models interpretable, called Explainable Artificial Intelligence (XAI). If you want to gain some intuition on one such approach, called LIME, read on!

Before we dive right into it, it is important to point out when and why you would need interpretability of an AI. While it might be a desirable goal in itself, it is not necessary in many fields, at least not for users of an AI. With text translation, character recognition, and speech recognition, for example, it is not that important why they do what they do, but simply that they work.

In other areas, like medical applications (determining whether tissue is malignant), financial applications (granting a loan to a customer), or applications in the criminal-justice system (gauging the risk of recidivism), it is of the utmost importance (and sometimes even required by law) to know why the machine arrived at its conclusions.

One approach to making AI models explainable is called LIME, for Local Interpretable Model-Agnostic Explanations. There is already a lot in this name!

LIME is not trivial to use and it can be very slow, but it is a great way to visualize models.
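If you just want the core intuition, here is a rough sketch of the local-surrogate idea in Python (my own toy version with made-up data, not the lime package itself): perturb the instance you want explained, score the perturbations with the black box, weight them by proximity, and fit a simple linear model locally.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(7)

    # A black-box model trained on made-up data
    X = rng.normal(size=(1000, 4))
    y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)
    black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    instance = X[0]  # the prediction we want to explain

    # Sample perturbations near the instance, score them with the black box,
    # weight by proximity, and fit an interpretable local surrogate
    perturbations = instance + rng.normal(scale=0.5, size=(500, 4))
    scores = black_box.predict_proba(perturbations)[:, 1]
    distances = np.linalg.norm(perturbations - instance, axis=1)
    weights = np.exp(-(distances ** 2) / 0.5)  # closer samples count more

    surrogate = Ridge(alpha=1.0).fit(perturbations, scores, sample_weight=weights)
    print("Local feature weights:", surrogate.coef_)

The surrogate’s coefficients are only valid near that one instance, which is the "local" in the name: a simple explanation of a complicated model, one prediction at a time.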
