Press "Enter" to skip to content

Category: Data Science

Linear Regression With Python In Power BI

Emanuele Meazzo builds a linear regression in Power BI using a Python visual:

As a prerequisite, of course, you’ll need to have python installed in your machine, I recommend having an external IDE like Visual Studio Code to write your Python code as the PowerBI window offers zero assistance to coding.

You can follow this article in order to configure Python Correctly for PowerBI.

Step 2 is to add a Python Visual to the page, and let the magic happen.

Click through for the step-by-step instructions, including quite a bit of Python code and a few warnings and limitations.

Comments closed

Matrix Operations in R

Kristian Larsen shows off some linear algebra concepts in R:

In this article, you learn how to do linear algebra in R. In particular, I will discuss how to create a matrix in R, Element-wise operations in R, Basic Matrix Operations in R, How to Combine Matrices in R, Creating Means and Sums in R and Advanced Matrix Operations in R.

The post is so chock-full of examples, the only block of multi-line text is the description.

Comments closed

Removing Serial Correlation

Vincent Granville has an easy trick for removing serial correlation from a data set:

Here is a simple trick that can solve a lot of problems.

You can not trust a linear or logistic regression performed on data if the error term (residuals) are auto-correlated. There are different approaches to de-correlate the observations, but they usually involve introducing a new matrix to take care of the resulting bias. See for instance here.  

Click through for the alternative.

Comments closed

Cross-Validation in a Picture

Stephanie Glen shows us cross-validation in one picture:

Cross Validation explained in one simple picture. The method shown here is k-fold cross validation, where data is split into k folds (in this example, 5 folds). Blue balls represent training data; 1/k (i.e. 1/5) balls are held back for model testing.

Monte Carlo cross validation works the same way, except that the balls would be chosen with replacement. In other words, it would be possible for a ball to appear in more than one sample.

You’ll have to click through for the picture.

Comments closed

Linear Programming in Python

Francisco Alvarez shows us an example of linear programming in Python:

The first two constraints, x1 ≥ 0 and x2 ≥ 0 are called nonnegativity constraints. The other constraints are then called the main constraints. The function to be maximized (or minimized) is called the objective function. Here, the objective function is x1 + x2.

Two classes of problems, called here the standard maximum problem and the standard minimum problem, play a special role. In these problems, all variables are constrained to be nonnegative, and all main constraints are inequalities.

That post spurred me on to look up LINGO and see that it’s actually still around.

Comments closed

Exploratory Data Analysis with inspectdf

Laura Ellis continues a dive into Exploratory Data Analysis, this time using the inspectdf package:

I like this package because it’s got a lot of functionality and it’s incredibly straightforward to use. In short, it allows you to understand and visualize column types, sizes, values, value imbalance & distributions as well as correlations. Better yet, you can run each of these features for an individual data frame, or compare the differences between two data frames.

I liked the inspectdf package so much that in this blog, I’m going to extend my previous EDA tutorial with an overview of the package.

There are some interesting functions which make EDA easier, so check it out.

Comments closed

Non-Linear Classifiers with Support Vector Machines

Rahul Khanna continues a series on support vector machines:

In this blog post, we will look at a detailed explanation of how to use SVM for complex decision boundaries and build Non-Linear Classifiers using SVM. The primary method for doing this is by using Kernels.

In linear SVM we find margin maximizing hyperplane with features Xi’s . Similarly, in Logistic regression, we also try to find the hyperplane which minimizes logistic loss with features Xi’s. Most often when we use both these techniques the results are the same. But linear SVM or for the same reason a logistic regression would fail where there is a need to have complex or non-linear decision boundaries. These types of boundaries are then achieved by SVM using Kernels. So let us understand how SVM creates non-linear boundaries using Kernels

Read on to see how it works.

Comments closed

Vectors for Programmers

John Mount has a couple of videos available:

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with what ideas do programmers find interesting about vectors, what concepts do they consider safe starting points, and how to condense and present the material.

Click through for the links, one with Python examples and the other with R examples.

Comments closed

Defining TF-IDF

Bruno Stecanella explains the concept behind TF-IDF:

TF-IDF was invented for document search and information retrieval. It works by increasing proportionally to the number of times a word appears in a document, but is offset by the number of documents that contain the word. So, words that are common in every document, such as this, what, and if, rank low even though they may appear many times, since they don’t mean much to that document in particular.

However, if the word Bug appears many times in a document, while not appearing many times in others, it probably means that it’s very relevant. For example, if what we’re doing is trying to find out which topics some NPS responses belong to, the word Bug would probably end up being tied to the topic Reliability, since most responses containing that word would be about that topic.

This makes the technique useful for natural language processing, especially in classification problems.

Comments closed

Sentiment Analysis with Python

Bruno Stecanella shows us how to use MonkeyLearn to perform sentiment analysis in Python:

Sentiment analysis is a set of Natural Language Processing (NLP) techniques that takes a text (in more academic circles, a document) written in natural language and extracts the opinions present in the text.

In a more practical sense, our objective here is to take a text and produce a label (or labels) that summarizes the sentiment of this text, e.g. positiveneutral, and negative.

For example, if we were dealing with hotel reviews, we would want the sentence ‘The staff were lovely‘ to be labeled as Positive, and the sentence ‘The shared bathroom was absolutely disgusting‘ labeled as Negative.

Click through for a demo.

Comments closed