PCA formulation: Find the best rank-k subspace of d-dimensional space Finding the best rank-k subsp

manierato5h

manierato5h

Answered question

2022-06-21

PCA formulation: Find the best rank-k subspace of d-dimensional space
Finding the best rank-k subspace of d-dimensional space can be interpreted as the Principle Component Analysis (PCA) problem provided that the goal of optimization is minimizing the error in the sense of variance.
Assume we have access to the data sample x 1 , , x n R n , prove that the first problem is equivalent to the second one:
(1) U n ^ = arg min l = 1 n x l P U ( x l ) 2 2
such that
U R d × k , U T U = I k
(2) U n ^ = arg min X n U U T X n F 2
such that
U R d × k , U T U = I k
where Ik is Identity matrix, P U = U U T is a projection matrix, X n = [ x 1 x n ] is a matrix of data, F is Frobenius norm, i.e. A F = Trace ( A T A )

Answer & Explanation

Sage Mcdowell

Sage Mcdowell

Beginner2022-06-22Added 19 answers

I start with x l P U ( x l ) 2 2 . It can be written as x l P U ( x l ) 2 2 = ( x l T x l T U U T ) T ( x l U U T x l ) =
= x l T x l x l T U U T x l x l T U U T x l + x l T U U T U U T x l
since U T U = I k
= x l T x l x l T U U T x l x l T U U T x l + x l T U U T x l = x l T x l x l T U U T x l
l = 1 n x l P U ( x l ) 2 2 = l = 1 n ( x l T x l x l T U U T x l )
since the value of the sum is scalar we can write
(1) l = 1 n ( x l T x l x l T U U T x l ) = Trace ( l = 1 n ( x l T x l x l T U U T x l ) ) = Trace ( [ x 1 T 0 0 0 x 2 T 0 0 0 0 0 x n T ] [ x 1 0 0 0 x 2 0 0 0 0 0 x n ] [ x 1 T 0 0 0 x 2 T 0 0 0 0 0 x n T ] U U T [ x 1 0 0 0 x 2 0 0 0 0 0 x n ] )
On the other hand,
[ x 1 T x 2 T x n T ] [ x 1 x 2 x n ] = [ x 1 T x 1 x 1 T x 2 x 1 T x n x 1 T x 2 x 1 T x 2 x 2 T x n x n T x 1 x n T x 2 x n T x n ]
Note that
Trace ( [ x 1 T x 1 x 1 T x 2 x 1 T x n x 1 T x 2 x 1 T x 2 x 2 T x n x n T x 1 x n T x 2 x n T x n ] ) = Trace ( [ x 1 T 0 0 0 x 2 T 0 0 0 0 0 x n T ] [ x 1 0 0 0 x 2 0 0 0 0 0 x n ] )
And,
[ x 1 T x 2 T x n T ] U U T [ x 1 x 2 x n ] = [ x 1 T U U T x 2 T U U T x n T U U T ] [ x 1 x 2 x n ]
= [ x 1 T U U T x 1 x 1 T U U T x 2 x 1 T U U T x n x 1 T U U T x 2 x 1 T U U T x 2 x 2 T U U T x n x n T U U T x 1 x n T U U T x 2 x n T U U T x n ]
Note that
Trace ( [ x 1 T U U T x 1 x 1 T U U T x 2 x 1 T U U T x n x 1 T U U T x 2 x 1 T U U T x 2 x 2 T U U T x n x n T U U T x 1 x n T U U T x 2 x n T U U T x n ] ) = Trace ( [ x 1 T 0 0 0 x 2 T 0 0 0 0 0 x n T ] U U T [ x 1 0 0 0 x 2 0 0 0 0 0 x n ] )
Hence,
( 1 ) = Trace ( X T X X T U U T X )
= Trace ( X T X X T U U T X X T U U T X + X T U U T U U T X ) = Trace ( ( X U U T X ) T ( X U U T X ) ) = X U U T X F 2

Do you have a similar question?

Recalculate according to your conditions!

Ask your question.
Get an expert answer.

Let our experts help you. Answer in as fast as 15 minutes.

Didn't find what you were looking for?