"A common theme in linear compression and feature extraction is to map a high dimensional vector

Flakqfqbq 2022-05-07 Answered
"A common theme in linear compression and feature extraction is to map a high dimensional vector x to a lower dimensional vector y = W x such that the information in the vector x is maximally preserved in y. Opten PCA is applied for this purpose. However, the optimal setting for W is in generall not given by the widely used PCA. Actually, PCA is sub-optimal special case of mutual information maximisation."

Can anyone elaborate on why PCA is a sub-optimal special case of mutual information maximisation?

Answers (1)

Answered 2022-05-08 Author has 21 answers
An easy example to explain: suppose you have two sets of functions that are heavily corrupted by high-energy noise, and you want to find which parts / subsets / linear combinations of them correspond to each other the most.

If we just apply PCA, it will optimize subspaces by looking for the directions of highest L2 norm (in various senses). But if our noise has a higher L2 norm than the functions of interest, PCA will select the noise rather than the functions of interest! And we know that independently sampled, uncorrelated noise has very low mutual information with just about any deterministic quantity of interest.

Therefore we do better by searching for a method that focuses not so much on the norm of the actual signal/function but on some statistical correspondence, for example cross-correlation or covariance.
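This failure mode is easy to demonstrate numerically. The following is a minimal sketch under a toy 2-D setup of my own choosing: one low-amplitude deterministic signal dimension plus one independent high-energy noise dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
signal = np.sin(np.linspace(0, 20 * np.pi, n))   # deterministic signal, amplitude 1
noise = 5.0 * rng.standard_normal(n)             # independent noise with much higher energy
X = np.column_stack([signal, noise])             # axis 0 = signal, axis 1 = noise

# PCA keeps the leading eigenvector of the sample covariance matrix,
# i.e. the direction of largest variance (L2 norm around the mean).
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
top_pc = eigvecs[:, np.argmax(eigvals)]

# The top principal component aligns with the noise axis, even though the
# noise carries essentially no mutual information with the signal.
print(top_pc)
```

Here PCA picks the high-variance noise direction and essentially discards the signal axis, which is exactly the failure mode described above.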

Relevant Questions

asked 2022-05-09
A huge conical tank is to be made from a circular piece of sheet metal of radius 10 m by cutting out a sector with vertex angle θ and welding the straight edges of the remaining piece together. Find θ so that the resulting cone has the largest possible volume.

Specifically, the question is asked in the context of wanting derivatives and multiple max/min equations, and hopefully more calculus than trigonometry or geometry.

I have gotten as far as using 10m as the hypotenuse for a triangle formed by the height of the cone, radius of the base of the cone, and slant. I'm not sure where to go from there, because I can't determine how to find height and/or radius, without which I'm not sure I can continue.
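The setup sketched above can be checked numerically. This is only a sketch under the usual interpretation (assumed here, not stated by the asker): the slant equals the sheet radius 10, and the base circumference equals the arc left after the sector is removed.

```python
import numpy as np

R = 10.0                                     # radius of the sheet = slant of the cone
theta = np.linspace(1e-6, 2 * np.pi - 1e-6, 200_000)

r = R * (2 * np.pi - theta) / (2 * np.pi)    # base radius from the remaining arc length
h = np.sqrt(R**2 - r**2)                     # height via Pythagoras (slant is the hypotenuse)
V = np.pi * r**2 * h / 3                     # cone volume

best_theta = theta[np.argmax(V)]
print(best_theta)                            # close to 2*pi*(1 - sqrt(2/3)) ≈ 1.153
```

The grid maximum lands near θ = 2π(1 − √(2/3)), i.e. the base radius r = 10√(2/3) that calculus gives when maximizing r²√(100 − r²).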
asked 2022-05-03
I have a non-negative function $g(y,x)$ that is defined using a non-negative $f(y,x)$ in the following way:
$$g(y,x) = \int_0^x f(t,y)\,dt$$
I am trying to maximize $\max_{y \in S} g(y,x)$. Using Fatou's lemma we have that
$$\max_{y \in S} g(y,x) = \max_{y \in S} \int_0^x f(t,y)\,dt \le \int_0^x \max_{y \in S} f(t,y)\,dt$$
I also have that $\max_{y \in S} f(t,y) \le h(t)$, where $\int_0^x h(t)\,dt < \infty$.

My question: when does the last inequality hold with equality?

What do I have to assume about $f(y,x)$ and $g(y,x)$ for equality to hold?

Can I apply dominated convergence theorem here?
asked 2022-05-14
I have a non-linear maximization problem and I want to convert it into a minimization problem. Can I do so by multiplying the objective by a negative sign, or is that wrong? If it is wrong, what should I do instead?
asked 2022-05-07
I have to maximize the following function:
$$\max \; A\,\frac{C_1^m}{m} + (1-A)\,\frac{C_2^m}{m}$$
subject to
$$C_1 \le 5(1-x) + x, \qquad C_2 \le 3(1-x) + 7x$$
I wrote it as:
$$L(x) = f(x) - \lambda_1\big(C_1 - 5(1-x) - x\big) - \lambda_2\big(C_2 - 3(1-x) - 7x\big) - \lambda_3(x-1) - \lambda_4(x-10)$$
Can I write the last constraint partitioned into $\lambda_3$ and $\lambda_4$ like this? Is there some other way to introduce such box constraints into the same problem?
asked 2022-04-07
I am trying to solve the following question:
$$\text{Maximize } f(x_1, x_2, \ldots, x_n) = 2\sum_{i=1}^{n} x_i^t A x_i + \sum_{i=1}^{n-1} \sum_{j>i}^{n} \left( x_i^t A x_j + x_j^t A x_i \right)$$
subject to
$$x_i^t x_j = \begin{cases} 1 & i = j \\ -\tfrac{1}{n} & i \neq j \end{cases}$$
where the $x_i$'s are column vectors ($m \times 1$ matrices) with real entries and $A$ is an $m \times m$ ($n < m$) real symmetric matrix.

From some source, I know the answer is
$$f_{\max} = \frac{n+1}{n} \sum_{i=1}^{n} \lambda_i,$$
$\lambda_i$ being the eigenvalues of $A$ sorted in non-increasing order (counting multiplicity). But I am unable to prove it. I would appreciate any help (preferably with an established matrix inequality, or Lagrange's multiplier method).
asked 2022-05-10
Consider that I have two items. Let $P_i(x_i)$ denote the profit I get from item $i$ by taking $x_i$ units of it (assume that $x_i$ can be a real number). Let $W_i(x_i)$ denote the weight consumed by item $i$ when I take $x_i$ units of it. Assume that both the profit and weight functions are monotonically increasing in the quantity picked. The problem I am interested in is
$$\max_{x_1, x_2} \; P_1(x_1) + P_2(x_2) \quad \text{s.t.} \quad W_1(x_1) + W_2(x_2) \le W$$
where $W$ is the total weight allowed. Thus I need to find $x_i$ (the quantity of item $i$) such that the total profit is maximized while keeping the total weight under a given limit. Let $x_1^*$ and $x_2^*$ be the solution to this problem. Is it true that
$$\frac{P_1(x_1^*)}{W_1(x_1^*)} = \frac{P_2(x_2^*)}{W_2(x_2^*)}\,?$$
Note that this ratio is essentially the "profit per unit weight" of an item. Suppose the ratios were different at the optimum, with the first item having the larger one. Then I can get more profit per unit weight out of the first item, so increasing its quantity should increase profit. To keep the total weight balanced, I decrease the second item's quantity, decreasing its profit as well. But since the increase in profit from item 1 is larger, this should compensate for the loss from item 2 and make the total profit even higher. What is flawed with this logic?
asked 2022-05-08
Let $t \in \mathbb{R}$, $n = 1, 2, \ldots$, $p \in [0,1]$ and $a \in (p, 1]$.

Show that
$$\sup_t \left( ta - \log\!\left( p e^t + (1-p) \right) \right) = a \log\!\left( \frac{a}{p} \right) + (1-a) \log\!\left( \frac{1-a}{1-p} \right)$$
So, I've tried differentiating, but I didn't succeed, since my result is different: I get $\log\!\left( \frac{a(1-p)}{p(1-a)} \right)$. Any ideas?
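One way to sanity-check the identity before proving it is to maximize the objective on a grid. This is a numeric sketch, not a proof; the values p = 0.3 and a = 0.7 are arbitrary picks of mine.

```python
import numpy as np

p, a = 0.3, 0.7                              # any p in (0, 1) and a in (p, 1) will do
t = np.linspace(-20.0, 20.0, 2_000_000)

objective = t * a - np.log(p * np.exp(t) + (1 - p))
lhs = objective.max()                        # grid approximation of the supremum
rhs = a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))
print(lhs, rhs)                              # the two values agree up to grid resolution

t_star = t[np.argmax(objective)]
print(t_star, np.log(a * (1 - p) / (p * (1 - a))))   # maximizer matches the derivative result
```

Note that the expression obtained from setting the derivative to zero is the maximizing $t^*$, not the supremum itself; substituting $t^*$ back into $ta - \log(pe^t + (1-p))$ yields the claimed right-hand side.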