Reducing Dimensionality

Antoine Guillot explains some of the basic concepts of variable reduction in data analysis:

Each of these people can be represented as points in a three-dimensional space. As a gross approximation, each person fits in a 50*50*200 (cm) cube. If we use a resolution of 1 cm and three color channels, then each person can be represented by 50*50*200*3 = 1,500,000 variables.
On the other hand, the shadow is only two-dimensional and black and white, so each shadow only needs 50*200 = 10,000 variables.
The number of variables was divided by 150! And if your goal is to tell humans from cats, or even men from women, the data from the shadow may be enough.
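Multiplying out the quoted dimensions is a quick sanity check on the analogy. The sketch below assumes NumPy and treats a "shadow" as a hypothetical boolean occupancy grid (depth, width, height) collapsed along its depth axis; the helper names and the toy box standing in for a body are illustrative only, not real data or anything from the original post.

```python
import numpy as np

# Hypothetical grid for one person at 1 cm resolution, following the quoted
# 50*50*200 cm cube: depth x width x height, plus three colour channels.
DEPTH, WIDTH, HEIGHT, CHANNELS = 50, 50, 200, 3


def volume_variable_count():
    """Variables needed to store the full 3D colour volume."""
    return DEPTH * WIDTH * HEIGHT * CHANNELS


def shadow_variable_count():
    """Variables in a black-and-white 2D shadow (width x height)."""
    return WIDTH * HEIGHT


def cast_shadow(occupancy):
    """Project a boolean occupancy grid of shape (depth, width, height) onto
    the width-height plane: a pixel is dark if any voxel behind it is filled."""
    return occupancy.any(axis=0)


if __name__ == "__main__":
    print(volume_variable_count())  # 1500000
    print(shadow_variable_count())  # 10000
    print(volume_variable_count() // shadow_variable_count())  # 150

    # Toy occupancy grid: a solid box as a stand-in for a body (illustrative only).
    occupancy = np.zeros((DEPTH, WIDTH, HEIGHT), dtype=bool)
    occupancy[10:40, 15:35, 0:180] = True
    shadow = cast_shadow(occupancy)
    print(shadow.shape, shadow.size)  # (50, 200) 10000
```

The projection throws away the depth axis entirely, which is exactly the information loss the shadow analogy describes: whatever survives in the 2D mask is all a classifier would get to see.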

Read on for intuitive discussions of techniques like principal component analysis and linear discriminant analysis.  H/T R-Bloggers
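For readers who want to see the two named techniques in code, here is a minimal sketch in Python with NumPy (the post was surfaced via R-Bloggers; Python is used here purely for illustration). The helpers `pca` and `fisher_lda_direction` are hypothetical names, not from the post or any package. PCA is computed from the SVD of the centered data, and the LDA shown is only the two-class Fisher discriminant, not the general multi-class form.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    # Squared singular values / (n - 1) are the sample variances along each component.
    explained_variance = (s[:k] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance


def fisher_lda_direction(X, y):
    """Two-class Fisher discriminant: unit vector proportional to S_w^{-1} (mu_1 - mu_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix: sum of the per-class scatter matrices.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu1 - mu0)
    return w / np.linalg.norm(w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic 3D data with most of its variance in the first two axes.
    X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.1])
    Z, comps, var = pca(X, k=2)
    print(Z.shape)  # (200, 2)

    # Two synthetic classes that differ only along the third axis.
    y = np.repeat([0, 1], 100)
    X_cls = rng.normal(size=(200, 3))
    X_cls[y == 1, 2] += 4.0
    w = fisher_lda_direction(X_cls, y)
    scores = X_cls @ w
    print(scores[y == 1].mean() > scores[y == 0].mean())  # True
```

The design difference mirrors the intuition in the post: PCA ignores labels and keeps the directions of largest variance, while LDA uses the labels to pick the single direction that best separates the classes relative to their spread.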


August 2017