Singular Value Decomposition and PCA interpretation

Blericker74


Answered question

2022-07-04

The covariance matrix of any data $X$ of size $N \times D$ ($N$ samples, $D$ dimensions) is $\mathrm{Cov}(X) = E[(X - E[X])(X - E[X])^T]$. Let us assume $Y = X - E[X]$, so that $\mathrm{Cov}(X) = E[Y Y^T]$. Now, from the SVD $A = U \Sigma V^T$ we know that the columns of $U$ and $V$ are just the eigenvectors of $A A^T$ and $A^T A$ respectively. Thus, we can just use the eigendecomposition of $Y Y^T$, as it is symmetric.
The confusion I face is how to interpret $\mathrm{Cov}(X) = \text{Eigendecomposition}(Y Y^T) = P D P^{-1}$. From my point of view, it is just a factorization of a linear transformation.
How can we conclude which dimension has the highest covariance from this factorization? I am having a hard time interpreting it as anything but a transformation: if I multiply a vector by $P D P^{-1}$, the vector is transformed into the space where the eigenvectors are the basis. In this case they are orthonormal, so it is just a rotation, followed by a scaling, and then a rotation back to the original space.
I can only view it as a factorization that tells us about Cov(X) as a transformation, rather than about Cov(X) itself.

Answer & Explanation

Antonio Dickerson


Beginner | 2022-07-05 | Added 5 answers

According to Wikipedia, Principal Component Analysis comes in two forms:
As a basis transformation (diagonalization) of the covariance matrix: $C = P D P^{-1}$
As a singular value decomposition of the (transposed) data matrix: $X = U \Sigma W^T$ (assuming we have already subtracted the mean from $X$)
Since $C = X^T X = W \Sigma^2 W^T$ (up to the normalization factor $1/(N-1)$), and since $\Sigma$ is a diagonal $(d \times d)$ matrix, it is clear that $\Sigma^2 = D$ and $W = P$. In other words, both approaches lead to the same diagonalization of the covariance matrix.
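The equivalence of the two routes can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the variable names, sizes, and random seed are illustrative and not part of the original answer. It uses the sample covariance with its $1/(N-1)$ normalization, so the eigenvalues are compared with $\Sigma^2/(N-1)$ rather than with $\Sigma^2$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N, D = 200, 3
# Synthetic correlated data: N samples (rows), D features (columns).
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))
Y = X - X.mean(axis=0)  # subtract the mean of each column

# Route 1: eigendecomposition of the sample covariance matrix.
C = Y.T @ Y / (N - 1)
eigvals, P = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]    # reorder to descending
eigvals, P = eigvals[order], P[:, order]

# Route 2: thin SVD of the centered data matrix, Y = U @ diag(s) @ Wt.
U, s, Wt = np.linalg.svd(Y, full_matrices=False)
svd_eigvals = s**2 / (N - 1)         # singular values are already sorted descending

# Eigenvectors are only defined up to sign, so compare |P^T W| with the identity.
overlap = np.abs(P.T @ Wt.T)

print(np.allclose(eigvals, svd_eigvals))
print(np.allclose(overlap, np.eye(D), atol=1e-6))
```

The sign ambiguity is the reason for comparing absolute values: both `P` and `Wt.T` hold the same directions, but each column may be flipped.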
From basic linear algebra it is clear that the diagonal matrix D contains the eigenvalues and the columns of P are the eigenvectors. It is also clear that any permutation of those columns will just permute the diagonal elements of D.
We cannot conclude 'which dimension would have the highest covariance', but we always know which eigenvector corresponds to the highest, second highest, and so on, eigenvalue. Each eigenvalue is the variance of the data along its eigenvector, so that ordering is all that counts.
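To make the role of the eigenvalues concrete, here is a small sketch (again assuming NumPy and made-up data, with hypothetical variable names) that projects centered data onto the eigenvector of the largest eigenvalue and compares the resulting variance with that eigenvalue and with the variances along the original axes.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
N = 500
# Two strongly correlated features, so neither original axis captures the full spread.
t = rng.normal(size=N)
X = np.column_stack([t + 0.1 * rng.normal(size=N),
                     t + 0.1 * rng.normal(size=N)])
Y = X - X.mean(axis=0)

C = np.cov(Y, rowvar=False)      # sample covariance matrix, D x D
eigvals, P = np.linalg.eigh(C)   # ascending order
top = P[:, -1]                   # eigenvector of the largest eigenvalue

# Variance of the data projected onto the top eigenvector.
var_along_top = np.var(Y @ top, ddof=1)

print(np.isclose(var_along_top, eigvals[-1]))
print(var_along_top >= C.diagonal().max())
```

The diagonal of `C` holds the variances along the original coordinate axes; the direction of largest variance is in general a combination of those axes, which is why the eigenvectors, not the original dimensions, are what get ranked.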
