Overfitting With Polynomial Regression

2018-05-10

Even if the function to be estimated is very smooth, machine precision means that only the first three or four coefficients can be computed accurately. With infinite precision, all coefficients would be computed correctly, without overfitting. We first explore this problem from a mathematical point of view in the next section, then provide recommendations for practical model implementations in the last section.

This is also a good read for professionals with a math background who are interested in learning more about data science, as we start with some simple math and then discuss how it relates to data science. Also, this is an original article, not something you will learn in college classes or data camps, and it even features the solution to a linear regression involving an infinite number of variables.

Granville’s point that overfitting is a relatively small concern is rather interesting, but the advice to avoid polynomial regression is generally pretty solid.
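One way to see the precision problem Granville describes is to look at how badly conditioned the design matrix of a plain polynomial fit becomes as the degree grows. The NumPy sketch below is an illustration, not code from the article: the helper names are hypothetical, and it assumes evenly spaced x values on [0, 1], the monomial basis 1, x, x^2, ..., and noise-free data whose true coefficients are all 1.

```python
import numpy as np

def vandermonde_condition_numbers(max_degree, n_points=50):
    """Condition number of the monomial design matrix [1, x, ..., x^d]
    for x evenly spaced on [0, 1], for each degree d = 1..max_degree."""
    x = np.linspace(0.0, 1.0, n_points)
    conds = {}
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)  # columns 1, x, ..., x^d
        conds[d] = np.linalg.cond(X)
    return conds

def coefficient_error(degree, n_points=50):
    """Fit noise-free data y = 1 + x + ... + x^degree by least squares and
    return the largest absolute error in the recovered coefficients."""
    x = np.linspace(0.0, 1.0, n_points)
    true_coef = np.ones(degree + 1)
    X = np.vander(x, degree + 1, increasing=True)
    y = X @ true_coef
    est, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.max(np.abs(est - true_coef))

if __name__ == "__main__":
    for d, c in vandermonde_condition_numbers(12).items():
        print(f"degree {d:2d}: cond = {c:.2e}")
    for d in (3, 8, 15):
        print(f"degree {d:2d}: max coefficient error = {coefficient_error(d):.2e}")
```

Even with perfectly clean data, the coefficient error grows with the degree because rounding errors are amplified roughly in proportion to the condition number. Switching to an orthogonal basis, for example numpy.polynomial.legendre.legvander on x rescaled to [-1, 1], keeps the matrix much better conditioned; that is a standard mitigation, not something the article itself prescribes.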

Using xplain To Interpret Model Results

2018-05-21

Joachim Zuckarelli walks us through the xplain package in R:

The above XML produces the following output (don’t worry too much about the call of xplain(); we will discuss how to work with the xplain() function in more detail later on):

library(car)
library(xplain)
xplain(call="lm(education ~ young + income + urban, data=Anscombe)",
       xml="http://www.zuckarelli.de/xplain/example_lm_foreach.xml")

##
## Call:
## lm(formula = education […]

Sentiment Analysis Of Hotel California

2018-05-21

Sara Locatelli analyzes the lyrics to Hotel California using tidytext: Sentiment analysis is a method of natural language processing that involves classifying words in a document based on whether a word is positive or negative, or whether it is related to a set of basic human emotions; the exact results differ based on the sentiment […]
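The word-level classification described above comes down to splitting the text into tokens and matching each token against a sentiment lexicon. In R, tidytext does this with unnest_tokens() and a join against get_sentiments(); the Python sketch below only illustrates the idea and is not Locatelli's code. The tiny lexicon is made up for the example, standing in for a curated one such as Bing or NRC, and words missing from it are simply ignored.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real analysis would use a curated word list
# (for example the Bing positive/negative lexicon).
TOY_LEXICON = {
    "lovely": "positive",
    "good": "positive",
    "happy": "positive",
    "dark": "negative",
    "sad": "negative",
    "bad": "negative",
}

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_counts(text, lexicon=TOY_LEXICON):
    """Count tokens per sentiment class; tokens not in the lexicon are dropped."""
    return Counter(lexicon[w] for w in tokenize(text) if w in lexicon)

def net_sentiment(text, lexicon=TOY_LEXICON):
    """Number of positive words minus number of negative words."""
    counts = sentiment_counts(text, lexicon)
    return counts["positive"] - counts["negative"]

if __name__ == "__main__":
    line = "What a lovely place, but the night was dark and sad."
    print(sentiment_counts(line), net_sentiment(line))
```

Because the score depends entirely on which words appear in the lexicon, swapping in a different lexicon can change the result, which is one reason results differ based on the sentiment lexicon used.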