# What Happens In Deep Neural Networks?

2017-11-17

Adrian Colyer has a two-parter summarizing an interesting academic paper regarding deep neural networks.  Part one introduces the theory:

Section 2.4 contains a discussion on the crucial role of noise in making the analysis useful (which sounds kind of odd on first reading!). I don’t fully understand this part, but here’s the gist:

The learning complexity is related to the number of relevant bits required from the input patterns $X$ for a good enough prediction of the output label $Y$, or the minimal $I(X; \hat{X})$ under a constraint on $I(\hat{X}; Y)$ given by the IB.

Without some noise (introduced for example by the use of sigmoid activation functions) the mutual information is simply the entropy $H(Y)$independent of the actual function we’re trying to learn, and nothing in the structure of the points $p(y|x)$ gives us any hint as to the learning complexity of the rule. With some noise, the function turns into a stochastic rule, and we can escape this problem. Anyone with a lay-person’s explanation of why this works, please do post in the comments!

The different colours in the chart represent the different hidden layers (and there are multiple points of each colour because we’re looking at 50 different runs all plotted together). On the x-axis is $I(X;T)$, so as we move to the right on the x-axis, the amount of mutual information between the hidden layer and the input $X$ increases. On the y-axis is $I(T;Y)$, so as we move up on the y-axis, the amount of mutual information between the hidden layer and the output $Y$ increases.

I’m used to thinking of progressing through the network layers from left to right, so it took a few moments for it to sink in that it’s the lowest layer which appears in the top-right of this plot (maintains the most mutual information), and the top-most layer which appears in the bottom-left (has retained almost no mutual information before any training). So the information path being followed goes from the top-right corner to the bottom-left traveling down the slope.

This is worth a careful reading.

## Where Machine Learning And Econometrics Collide

Dave Giles shares some thoughts on how machine learning and econometrics relate: What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)? Those are big questions, but I think that they’re ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I’d […]

## Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]