Conflicting Results from t-test and F-based stepwise regression in multiple regression.

oliadas73 2022-09-03 Answered
I am currently tasked with building a multiple regression model with two candidate predictor variables. That means there are potentially three terms in the model: Predictor A (PA), Predictor B (PB), and the interaction PA*PB.
In one instance, I fit a least-squares model containing all three terms and performed simple t-tests: I divided each parameter estimate by its standard error to calculate a t-statistic, and determined that only the intercept and the PA*PB coefficient were significantly different from zero.
In another instance, I did stepwise regression: I first fit a model with only PA, then fit a model with both PA and PB, and performed an F-test based on the difference in sums of squares between the two models. The F-test concluded that PB was a significant predictor to include in the model, and when I repeated the procedure, adding the PA*PB term was found to reduce the SSE significantly as well.
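For concreteness, here is a minimal sketch of the two procedures (Python with statsmodels; the data are simulated placeholders, not my actual data):

```python
import numpy as np
import statsmodels.api as sm

# Simulated placeholder data (my real PA, PB, and response differ).
rng = np.random.default_rng(0)
n = 100
PA = rng.normal(size=n)
PB = rng.normal(size=n)
y = 1.0 + 0.5 * PA * PB + rng.normal(size=n)

# Approach 1: fit the full model and read off the marginal t-tests
# (each t-statistic is the coefficient estimate over its standard error).
X_full = sm.add_constant(np.column_stack([PA, PB, PA * PB]))
full = sm.OLS(y, X_full).fit()
print(full.tvalues)   # intercept, PA, PB, PA*PB
print(full.pvalues)

# Approach 2: extra-sum-of-squares F-test between nested models,
# e.g. PA-only versus PA + PB.
reduced = sm.OLS(y, sm.add_constant(PA)).fit()
bigger = sm.OLS(y, sm.add_constant(np.column_stack([PA, PB]))).fit()
f_stat, p_value, df_diff = bigger.compare_f_test(reduced)
print(f_stat, p_value, df_diff)
```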
So in summary, the t-test approach tells me that only the cross-product term PA*PB has a significant regression coefficient when all terms are included in the model, but the stepwise approach tells me to include all terms in the model.
Based on these conflicting results, what course of action would you recommend?
Answers (2)

Krha77
Answered 2022-09-04
1. Removing variables just because they lack marginal significance is bad. If you want to use a significance-based approach, the stepwise method is much better. The glaring problem with dropping a bunch of variables at once because they are individually insignificant is that they may well be jointly significant. The stepwise approach at least doesn't have this problem.
2. There's usually no good reason to use a significance-based approach at all. If your goal is prediction, the best thing to do is to test each model's out-of-sample performance (according to some assorted metrics) and see which one does best; a sketch follows below. There are also information criteria (Cp, AIC, etc.) that are meant to estimate out-of-sample performance from in-sample fit plus a model-complexity penalty, but why use these if you have enough data to test out-of-sample performance directly? (As with most one-size-fits-all advice, this is a bit strong: these criteria, and even stepwise regression, have their place and can be good solutions sometimes. I'm just saying what I think is usually best in a generic situation, if there is such a thing.)
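For example, a minimal sketch of that comparison (simulated placeholder data; scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated placeholder data standing in for the real PA, PB, and response.
rng = np.random.default_rng(1)
PA = rng.normal(size=200)
PB = rng.normal(size=200)
y = 1.0 + 0.5 * PA * PB + rng.normal(size=200)

# The three nested candidate models from the question.
candidates = {
    "PA only": np.column_stack([PA]),
    "PA + PB": np.column_stack([PA, PB]),
    "PA + PB + PA*PB": np.column_stack([PA, PB, PA * PB]),
}
for name, X in candidates.items():
    # Out-of-sample mean squared error estimated by 5-fold cross-validation.
    mse = -cross_val_score(LinearRegression(), X, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"{name}: CV MSE = {mse:.3f}")
```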
KesseTher12
Answered 2022-09-05
Backward elimination. Your first try with t-tests is somewhat like backward elimination. In backward elimination, you begin with all three explanatory variables and then eliminate the weakest one. (You should have eliminated only the weakest one at the first step.) Then fit the multiple regression with the two strongest explanatory variables, and see whether either of them should be eliminated.
Forward selection. Your second try with an F-test is somewhat like forward selection. Do all three simple (one-predictor) regressions and select the strongest predictor variable. Then do two regressions in which each of the other variables is given a chance; if either makes a 'significant' improvement, choose it. If so, do a third multiple regression with all three predictor variables, and see whether adding the third variable helps.
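For example, one forward-selection step (does adding PB to a PA-only model significantly reduce the SSE?) could be carried out with an extra-sum-of-squares F-test; a sketch using statsmodels' formula API and a hypothetical data frame:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data frame with the question's two predictors and a response.
rng = np.random.default_rng(2)
df = pd.DataFrame({"PA": rng.normal(size=150), "PB": rng.normal(size=150)})
df["y"] = 1.0 + 0.4 * df["PA"] + 0.3 * df["PB"] + rng.normal(size=150)

# PA is already in the model; test whether adding PB helps.
reduced = smf.ols("y ~ PA", data=df).fit()
fuller = smf.ols("y ~ PA + PB", data=df).fit()

# anova_lm on the nested fits reports the partial (extra sum of squares) F-test.
print(anova_lm(reduced, fuller))
```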
Many software packages can carry out each of these stepwise procedures automatically, with specified criteria for inclusion or exclusion at each step. However, in my experience with something like eight or a dozen candidate predictor variables, forward selection and backward elimination almost never give the same set of variables: at each step there may be close calls, and predictor variables are typically somewhat correlated with one another.
Mandatory inclusion. A common approach is to designate a particular set of variables as mandatory to include, because it seems clear in advance that they ought to have predictive potential. Then start forward selection with those mandatory variables already in the model, and never eliminate them in backward elimination. That can work well if the mandatory variables are chosen wisely.
At the end, absent extraneous considerations, if you get two or three different sets of predictor variables from different stepwise procedures, you can check each of them to see which is best. (An 'extraneous consideration' would be if your boss has a strong preference he/she can't explain.)

You might be interested in

asked 2022-10-05
How to write an equation where both the independent and dependent variables are log-transformed in a multiple regression?
How to write the multiple regression model when both the dependent variable and the independent variables are log-transformed?
I know that without any log transformation, the linear regression model would be written as
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$
But now I have log-transformed both my dependent variable and my independent variables. So is it correct to write it as $\log(y) = \beta_0 + \beta_1 \log(x_1) + \beta_2 \log(x_2) + \cdots$?
Or, since I am transforming both sides of the equation, can I write it as
$\ln(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$?
asked 2022-08-28
What is J when calculating SST in multiple regression?
I am a little confused about what J actually is in the formulas for SST and SSR in multiple regression:
$\mathrm{SST} = Y^T\left[I - \tfrac{1}{n}J\right]Y$
$\mathrm{SSR} = Y^T\left[H - \tfrac{1}{n}J\right]Y$
asked 2022-08-11
Find the constrained least-squares estimator for a multiple regression model
Consider the multiple regression model
Y = X β + ϵ
with the restriction that $\sum_{i=1}^{n} \beta_i = 1$.
I want to find the least-squares estimator of $\beta$, so I need to solve the following optimization problem:
$\min_\beta\ (Y - X\beta)^T(Y - X\beta)$
$\text{s.t.}\quad \sum_{i=1}^{n} \beta_i = 1$
Let's set
$L = (Y - X\beta)^T(Y - X\beta) - \lambda(U^T\beta - 1) = Y^TY + \beta^TX^TX\beta - 2\beta^TX^TY - \lambda(U^T\beta - 1)$
where $U$ is a vector of ones (and therefore $U^T\beta = \sum_i \beta_i$).
Take derivatives:
$\frac{\partial L}{\partial \beta} = 2X^TX\beta - 2X^TY - \lambda U = 0$
$\frac{\partial L}{\partial \lambda} = -(U^T\beta - 1) = 0$
So from the first equation we can get an expression for $\beta$, but what should I do with $\lambda$? The second equation doesn't seem useful for getting rid of it.
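A standard way to proceed, sketched here assuming $X^TX$ is invertible: solve the first equation for $\beta$ in terms of $\lambda$, then substitute into the constraint to pin $\lambda$ down:
$$\beta = (X^TX)^{-1}\left(X^TY + \tfrac{\lambda}{2}U\right),$$
and imposing $U^T\beta = 1$ gives
$$\lambda = \frac{2\left(1 - U^T(X^TX)^{-1}X^TY\right)}{U^T(X^TX)^{-1}U}.$$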
asked 2022-07-14
Why is it valid to use squared Euclidean distances in high dimensions in multiple regression?
Euclidean distance is not linear in high dimensions. However, in multiple regression the idea is to minimize square distances from data points to a hyperplane.
Other data analysis techniques have been considered problematic for their reliance on Euclidean distances (nearest neighbors), and dimensionality reduction techniques have been proposed.
Why is this not a problem in multiple regression?
asked 2022-08-11
Matrix derivative in multiple linear regression model
The basic setup in the multiple linear regression model is
$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$
The regression model is $Y = X\beta + \epsilon$.
To find the least-squares estimator of the $\beta$ vector, we need to minimize
$S(\beta) = \sum_{i=1}^{n} \epsilon_i^2 = \epsilon^T\epsilon = (y - X\beta)^T(y - X\beta) = y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta$
and set
$\frac{\partial S(\beta)}{\partial \beta} = 0$
My question: how do we get $-2X^Ty + 2X^TX\beta$?
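A sketch of the term-by-term differentiation, using the standard matrix-calculus identities $\partial(a^T\beta)/\partial\beta = a$ and $\partial(\beta^TA\beta)/\partial\beta = 2A\beta$ for symmetric $A$:
$$\frac{\partial}{\partial\beta}\left(y^Ty\right) = 0,\qquad \frac{\partial}{\partial\beta}\left(-2\beta^TX^Ty\right) = -2X^Ty,\qquad \frac{\partial}{\partial\beta}\left(\beta^TX^TX\beta\right) = 2X^TX\beta,$$
since $X^TX$ is symmetric; summing the three terms gives $\frac{\partial S(\beta)}{\partial\beta} = -2X^Ty + 2X^TX\beta$.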
asked 2022-10-14
Multiple linear regression: linear relationship or not?
How should a multiple linear regression be interpreted that has statistically significant predictors but an $R^2$ value of 0.004? Does that mean there is a significant linear relationship (because of the statistically significant predictors), even though there is close to no linear relationship (an $R^2$ of 0.004 indicates almost none)?
asked 2022-08-13
Multiple linear regression with $b_0 = 0$
I am trying to calculate the coefficients $b_1, b_2, \ldots$ of a multiple linear regression, with the condition that $b_0 = 0$. In Excel this can be done using the RGP function and setting the constant to FALSE.
How can this be done with a simple formula?
Thank you in advance!
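For reference, a sketch of the matrix form (a standard result, assuming $X^TX$ is invertible): with no column of ones in $X$, the least-squares coefficients of a regression through the origin are
$$b = (X^TX)^{-1}X^Ty,$$
where $X$ contains only the predictor columns.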