Statistical analyses Imagine that you have run PCA on data gathered from one of the questionnaires

Mohamed Mooney

Answered question

2022-06-23

Statistical analyses
Imagine that you have run PCA on data from one of the questionnaires collected by the car manufacturer, in which 10,000 people gave their age, gender, country of residence and the car model they purchased.
i) You find that all N eigenvectors (N = dimensionality of the dataset) cover the same % of the data variance. What is N here? What is this % of variance?
ii) How would you interpret the results in i)? What if 1 eigenvector covers 99% of the data variance?
Thanks in advance.

Answer & Explanation

nuvolor8

Beginner | 2022-06-24 | Added 32 answers

Assume you have a (TxN) matrix of data, X, whose columns are the variables you've listed and whose rows are the different observations. Then, once each column has been mean-centered, X^T X / (T - 1) is a square (NxN) matrix: the empirical covariance matrix of X.
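Here, N refers to the number of variables in your dataset. With the four variables you listed (age, gender, country of residence, car model), each treated as a single column, N = 4. If all N eigenvectors cover the same share of the variance, each one covers 100%/N = 25%. One of the ways PCA can be useful is that it allows you to take your N possibly correlated variables (whose correlations are summarized in X^T X) and transform them into N linearly uncorrelated variables (called principal components). Linearly uncorrelated means that the off-diagonal elements of their covariance matrix are zero.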
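To make the decorrelation step concrete, here is a minimal NumPy sketch. It assumes synthetic numeric data in place of the questionnaire (which is not available here), and the names `mixing`, `scores` and `cov_scores` are purely illustrative. It builds the covariance matrix from centered data, projects onto its eigenvectors, and checks that the resulting components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 10_000, 4  # assumed sizes: 10,000 respondents, 4 numeric variables

# Synthetic correlated data (an assumption; not the real questionnaire).
mixing = rng.normal(size=(N, N))
X = rng.normal(size=(T, N)) @ mixing

# Center each column, then form the empirical covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (T - 1)

# The eigenvectors of the symmetric covariance matrix are the principal directions.
eigvals, eigvecs = np.linalg.eigh(cov)

# Projecting the centered data onto the eigenvectors gives the principal components.
scores = Xc @ eigvecs
cov_scores = scores.T @ scores / (T - 1)

# The off-diagonal entries vanish up to floating-point error.
off_diag = cov_scores - np.diag(np.diag(cov_scores))
print("largest off-diagonal covariance:", np.max(np.abs(off_diag)))
```

The diagonal of `cov_scores` holds the eigenvalues, that is, the variance carried by each principal component.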
PCA is most helpful in cases like ii). If we find that 99% of the variance of X is explained by one principal component (i.e. one eigenvector), then we can focus our attention on modeling or analyzing that one principal component. In this case, we've reduced our analysis from an N-dimensional problem to a 1-dimensional one. If this isn't the case, as in i), then PCA hasn't really done much for us: equal shares mean that no direction carries more variance than any other, so we still have an N-dimensional problem like before.
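The two scenarios can be reproduced with a short NumPy sketch. The data below is synthetic and the helper `explained_variance_ratio` is a hypothetical name introduced only for illustration. The loadings and noise level are assumptions chosen so that one direction dominates.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component, largest first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()

rng = np.random.default_rng(0)
T, N = 10_000, 4  # assumed sizes, matching the question

# Case i): uncorrelated variables with equal variance.
X_equal = rng.normal(size=(T, N))
ratio_equal = explained_variance_ratio(X_equal)

# Case ii): one latent factor drives every variable, plus a little noise.
factor = rng.normal(size=(T, 1))
loadings = np.array([[1.0, 0.8, -0.6, 0.5]])  # illustrative values
X_dominant = factor @ loadings + 0.05 * rng.normal(size=(T, N))
ratio_dominant = explained_variance_ratio(X_dominant)

print("case i) :", np.round(ratio_equal, 3))
print("case ii):", np.round(ratio_dominant, 3))
```

In case i) the shares come out close to 1/N each, so no component can be dropped. In case ii) the first share is close to 1, so a single component summarizes almost all of the variation.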
