PCA first and then multiple regression, or multiple regression first and then PCA?

anudoneddbv 2022-07-16 Answered
I would like to ask something simple but very important.
Imagine I have a database with 0 NA, a perfectly clean database. It has a fair number of individuals and variables (95 individuals and 10 variables).
I have to do a multiple regression and a PCA on it.
Should I start with the multiple regression, possibly delete some individuals whose Cook's distance exceeds the threshold, and then do the PCA on the "new" database?
Or should I start with the PCA on the complete database and then do the multiple regression?
In short, should I do:
- PCA
- multiple regression
or
- multiple regression
- PCA
Thanks for helping me!

Answers (1)

Reinfarktq6
Answered 2022-07-17 Author has 18 answers
Regression should be the final step, not the first one. With PCA you can reduce the dimension (i.e., the number of explanatory variables) by discarding the "unimportant" principal components, that is, those with small variance. You can also use PCA to perform whitening, i.e., to transform the explanatory variables into uncorrelated components with unit variance, which removes multicollinearity among the regressors. Note that if you are only interested in point prediction or R squared, regressing on the original features yields the same results as regressing on all of the principal components. In other words, you should use PCA for further reduction and tidying of your data (if possible), and not just for the sake of doing PCA itself.
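The claim about identical R squared can be checked numerically. The following is a minimal sketch on synthetic data (only the sizes, 95 individuals and 10 variables, come from the question; the data, the helper `r_squared`, and the choice of keeping 5 components are assumptions made for illustration). It runs PCA through an SVD of the centered explanatory variables and then fits ordinary least squares on the original features, on all principal components, and on the first 5 components only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 95, 10                                   # sizes from the question; data are synthetic
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)    # make two columns strongly correlated
y = X @ rng.normal(size=p) + rng.normal(size=n)


def r_squared(features, target):
    """R^2 of an ordinary least-squares fit that includes an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    centered = target - target.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


# PCA via SVD of the centered explanatory variables.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                              # component scores, largest variance first

r2_original = r_squared(X, y)                   # regression on the original features
r2_all_pcs = r_squared(scores, y)               # regression on all 10 components
r2_top5 = r_squared(scores[:, :5], y)           # regression after dropping 5 components

print(r2_original, r2_all_pcs, r2_top5)
```

The first two values agree up to rounding because the full set of components spans the same space as the original variables; dropping components can only keep R squared the same or lower it, so the reduction step is a trade-off rather than a free improvement.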

You might be interested in

asked 2022-07-20
Correlation of Rolling Two Dice
If A is a random variable equal to the sum of two independent rolls of a die, and B is the value of the first roll minus the value of the second roll, is it true that $\operatorname{cov}(A, B) \neq 0$? In other words, is it true that they are correlated?
I've come to the conclusion that they must be correlated because they are not independent, that is, the outcome of A can affect the outcome of B, but I remain stuck because dependence does not necessarily imply correlation.
I know that independence implies zero correlation, but that the converse isn't true.
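One way to get unstuck is to enumerate the 36 equally likely outcomes and compute the covariance exactly. This is a small sketch, assuming fair six-sided dice; the helper `expect` is an illustrative name, not something from the question.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (first roll, second roll)
prob = Fraction(1, len(outcomes))                 # each outcome has probability 1/36


def expect(f):
    """Exact expectation of f(first, second) over the 36 outcomes."""
    return sum(prob * f(x, y) for x, y in outcomes)


mean_a = expect(lambda x, y: x + y)               # A = sum of the rolls
mean_b = expect(lambda x, y: x - y)               # B = first minus second
cov_ab = expect(lambda x, y: (x + y) * (x - y)) - mean_a * mean_b

# Dependence check: compare P(A = 12, B = 0) with P(A = 12) * P(B = 0).
p_joint = expect(lambda x, y: int(x + y == 12 and x - y == 0))
p_a12 = expect(lambda x, y: int(x + y == 12))
p_b0 = expect(lambda x, y: int(x - y == 0))

print(cov_ab, p_joint, p_a12 * p_b0)
```

Under these assumptions the covariance comes out as exactly zero while the joint probability differs from the product of the marginals, so A and B are dependent yet uncorrelated.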
asked 2022-05-17

The incomplete dot plot shows the result of a survey in which each student was asked how many dimes were in their pockets or wallets. The results for “4 dimes” are not shown. Each dot represents one student. It is known that 12.5% of the students had one dime.

a) Find the number of students surveyed. Then complete the dot plot.
b) What percent of the students had either 0 or 6 dimes?
c) What percent of the students had either 1 or 5 dimes?
d) Briefly describe the distribution of the data.

asked 2022-06-30
If the coefficients of a simple regression line, $B_0$ and $B_1$, are the same, then why are the regression lines of y on x and x on y different given the condition $r^2 < 1$? I have tried all the manipulation and graphical analysis I can but can't seem to see why this is happening.
asked 2022-05-28
Consider a similarity/metric learning method of the form $x^T W y = z$, where x and y are real-valued vectors, for example two images.
Breaking it into a more familiar form:
$x^T W y = \sum_i \sum_j w_{ij} x_i y_j = z$
This looks very similar to polynomial regression with only interactions between features (without the polynomial terms), i.e.
$z = f_w(x) = \sum_i w_i x_i + \sum_i \sum_{j=i+1} w_{ij} x_i x_j$
I was curious to see if the optimization for the matrix W is the same as doing optimization for multivariate linear/polynomial regression, since x and y are fixed, and the only variate is the weight matrix W?
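The bilinear form is linear in the entries of W, which can be illustrated directly. This is a sketch under two assumptions that are not in the question: a squared-error loss and noise-free synthetic data. Each sample's feature vector is the flattened outer product of x and y, and W is recovered with ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_y, n = 3, 4, 200                          # hypothetical dimensions and sample size
W_true = rng.normal(size=(d_x, d_y))
X = rng.normal(size=(n, d_x))
Y = rng.normal(size=(n, d_y))
z = np.einsum("ni,ij,nj->n", X, W_true, Y)       # z_n = x_n^T W y_n

# Row n holds the flattened outer product x_n y_n^T, so z = features @ vec(W).
features = np.einsum("ni,nj->nij", X, Y).reshape(n, d_x * d_y)
w_hat, *_ = np.linalg.lstsq(features, z, rcond=None)
W_hat = w_hat.reshape(d_x, d_y)

print(np.max(np.abs(W_hat - W_true)))
```

With a different loss (for example a ranking or margin loss, common in metric learning) the feature construction stays the same but the fitting step would no longer be plain least squares.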
asked 2022-07-18
Multiple regression problems (restricted regression, dummy variables)
Q1.
Model 1: $Y = X_1 \beta_1 + \varepsilon$
Model 2: $Y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon$
(a) Suppose that Model 1 is true. If we compute the OLS estimator $b_1$ of $\beta_1$ in Model 2, what will happen to the size and power properties of the test?
(b) Suppose that Model 2 is true. If we compute the OLS estimator $b_1$ of $\beta_1$ in Model 1, what will happen to the size and power properties of the test?
-> Here is my guess.
(a) $b_1$ is an unbiased but inefficient estimator. (I calculated it using the formula for "inclusion of an irrelevant variable": $b_1 = (X_1' M_2 X_1)^{-1} X_1' M_2 Y$, where $M_2$ is a symmetric and idempotent matrix.) Inefficient means that it has a larger variance, thus size increases and power increases too.
(b) $b_1$ is a biased but efficient estimator. (I used the formula for "exclusion of a relevant variable": $b_1 = (X_1' X_1)^{-1} X_1' Y$.) Um... I am stuck here. What should I say using that information?
Q2.
Let Q and P be the quantity and price. The relation between them differs across the regions east, west, south and north, and also across the 4 seasons. Construct a model.
-> Actually, I don't know much about dummy variables, so could anyone please solve this problem to help me?
asked 2022-07-18
Correlation: Concept to Formula
In digital signal processing, we calculate the correlation between two discrete signals by multiplying corresponding samples of the two signals and then adding the products. Where does this process/formula for correlation come from?
I understand the concept of correlation (similarity) between two signals. But I fail to understand how it translates to the formula that it does.
All the texts I have seen so far start with this formula and explain cross correlation, auto correlation, etc. None of them attempt to explain how the formula was derived in the first place.
asked 2022-06-15
regression $x(t) = at + b$, number of trials, and $R^2$ of the regression. How do I find the value and 95% confidence interval for the value of $V = x/t$?

New questions

The Porsche Club of America sponsors driver education events that provide high-performance driving instruction on actual racetracks. Because safety is a primary consideration at such events, many owners elect to install roll bars in their cars. Deegan Industries manufactures two types of roll bars for Porsches. Model DRB is bolted to the car using existing holes in the car's frame. Model DRW is a heavier roll bar that must be welded to the car's frame. Model DRB requires 20 pounds of a special high alloy steel, 40 minutes of manufacturing time, and 60 minutes of assembly time. Model DRW requires 25 pounds of the special high alloy steel, 100 minutes of manufacturing time, and 40 minutes of assembly time. Deegan's steel supplier indicated that at most 40,000 pounds of the high-alloy steel will be available next quarter. In addition, Deegan estimates that 2000 hours of manufacturing time and 1600 hours of assembly time will be available next quarter. The profit contributions are $200 per unit for model DRB and $280 per unit for model DRW. The linear programming model for this problem is as follows:
Max 200DRB + 280DRW
s.t.
20DRB + 25DRW ≤ 40,000 Steel available
40DRB + 100DRW ≤ 120,000 Manufacturing minutes
60DRB + 40DRW ≤ 96,000 Assembly minutes
DRB, DRW ≥ 0

Optimal Objective Value = 424000.00000

Variable      Value           Reduced Cost
- - - - - - - - - - - - - - - - - - - - - -
DRB           1000.00000      0.00000
DRW           800.00000       0.00000

Constraint    Slack/Surplus   Dual Value
- - - - - - - - - - - - - - - - - - - - - -
1             0.00000         8.80000
2             0.00000         0.60000
3             4000.00000      0.00000

Variable      Objective Coefficient   Allowable Increase   Allowable Decrease
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
DRB           200.00000               24.00000             88.00000
DRW           280.00000               220.00000            30.00000

Constraint    RHS Value               Allowable Increase   Allowable Decrease
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1             40000.00000             909.09091            10000.00000
2             120000.00000            40000.00000          5714.28571
3             96000.00000             Infinite             4000.00000
a. What are the optimal solution and the total profit contribution?
b. Another supplier has offered to provide Deegan Industries with an additional 500 pounds of the steel alloy at $2 per pound. Should Deegan purchase the additional pounds of the steel alloy? Explain.
c. Deegan is considering using overtime to increase the available assembly time. What would you advise Deegan to do regarding this option? Explain.
d. Because of increased competition, Deegan is considering reducing the price of model DRB such that the new contribution to profit is $175 per unit. How would this change in price affect the optimal solution? Explain.
e. If the available manufacturing time is increased by 500 hours, will the dual value for the manufacturing time constraint change? Explain.
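The reported optimum can be cross-checked without relying on the solver printout. The following is a minimal sketch that uses only the objective and constraints stated in the problem: it intersects every pair of constraint boundaries, keeps the feasible intersection points, and picks the one with the highest profit. The helper names `intersection` and `feasible` are illustrative, and the approach is a brute-force check for a two-variable problem, not the method used to produce the printout.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is a*DRB + b*DRW <= rhs; non-negativity is written as -x <= 0.
constraints = [
    (20, 25, 40_000),     # steel (pounds)
    (40, 100, 120_000),   # manufacturing (minutes)
    (60, 40, 96_000),     # assembly (minutes)
    (-1, 0, 0),           # DRB >= 0
    (0, -1, 0),           # DRW >= 0
]
profit = (200, 280)       # dollars per unit of DRB and DRW


def intersection(c1, c2):
    """Point where two constraint boundaries meet, or None if they are parallel."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det)


def feasible(point):
    """True if the point satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= rhs for a, b, rhs in constraints)


vertices = [
    pt
    for c1, c2 in combinations(constraints, 2)
    if (pt := intersection(c1, c2)) is not None and feasible(pt)
]
best = max(vertices, key=lambda pt: profit[0] * pt[0] + profit[1] * pt[1])
best_profit = profit[0] * best[0] + profit[1] * best[1]
assembly_slack = 96_000 - (60 * best[0] + 40 * best[1])

print(best, best_profit, assembly_slack)
```

The best vertex, its profit, and the unused assembly minutes can then be compared against the Value and Slack/Surplus columns of the printout.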