How to interpret parameter estimates in factor prediction (in R)

hanglutuupx6


2022-06-03

I have a data set in a .csv file with three factor levels, 1, 2 and 3 (fifteen observations of each), and each observation has a corresponding score.
Here are some details.
The data is contained in a simple csv file: the first column is labelled Team, and the second column is labelled Score.
The first column consists of fifteen 1's, followed by fifteen 2's, followed by fifteen 3's.
The R code I used was
data.source <- "http.www.. "  # the data set
SportScores <- read.csv(file = data.source)
I set x (a factor) such that x prints 1 1 1 ... 1 2 2 2 ... 2 3 3 3 ... 3 Levels: 1 2 3
names(SportScores)
y <- SportScores$Score
So using lm I get parameter estimates in R as
Intercept (35.800)
x2 (0.066)
x3 (12.40)
The t values are very large for the intercept and x3 but very small for x2, which suggests to me that we cannot reject the null in that case. But what is the null?
$\beta_0 = 35.8$
$\beta_{1c} = 0.06667$
$\beta_{2c} = 12.40$
How do I interpret this? I want to see any differences in scores between the 3 levels. What is the test actually being conducted? For example,
$\beta_{1c} = 0.06667$
has a small t value, so the null hypothesis is not rejected, but what is the null hypothesis in this case? Moreover, from the code output itself, how can I find the individual standard errors of the estimated means?

Answer & Explanation

prhljaju396r1


Beginner · 2022-06-04 · Added 3 answers

In R, the lm() function is very useful for performing a regression analysis on categorical variables. To understand how R handles the data with this command, recall that, if we have m factor levels, each with n observations, a convenient starting point is the basic equation of the classical random effects model
$Y_{ij} = \mu + U_i + \epsilon_{ij}$
where $Y_{ij}$ is the value of the $j$th item of the $i$th factor level, $\mu$ is the average score for the whole population, $U_i$ is the level-specific random effect, and $\epsilon_{ij}$ is the individual-specific effect. As an example, suppose that m soccer teams are randomly chosen among all teams of the world, and that n players are randomly chosen from each selected team. The performance scores for each player in a given year are collected. Applying the random effects model, $Y_{ij}$ is the score of the $j$th player of the $i$th team, $\mu$ is the average score for the entire population, $U_i$ is the random team-specific effect, and $\epsilon_{ij}$ is the player-specific effect. In this model, the term $U_i$ quantifies the difference between the average score of team $i$ and the overall average score in the entire population. It is called a "random" effect because each team has been randomly selected from a larger number of teams. In your case, these considerations can be applied directly to your three factor levels, each with 15 observations.
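As a rough illustration of this model, the R sketch below simulates scores for m = 3 teams with n = 15 players each. The value of `mu`, the two standard deviations, the seed, and the names `sim`, `team` and `score` are all assumptions made up for the example; none of them come from the data in the question.

```r
# Simulate Y_ij = mu + U_i + eps_ij.
# All numeric values below are illustrative assumptions.
set.seed(1)
m  <- 3                                   # number of teams (factor levels)
n  <- 15                                  # players per team
mu <- 40                                  # overall population mean (assumed)
U  <- rnorm(m, mean = 0, sd = 5)          # team-specific random effects
team  <- factor(rep(1:m, each = n))       # 1 1 ... 1 2 2 ... 2 3 3 ... 3
eps   <- rnorm(m * n, mean = 0, sd = 3)   # player-specific effects
score <- mu + U[as.integer(team)] + eps

sim <- data.frame(Team = team, Score = score)

# Sample team means and their deviations from the overall sample mean,
# which play the role of estimates of the U_i
team_means <- tapply(sim$Score, sim$Team, mean)
team_means - mean(sim$Score)
```

Because every team has the same number of players, these deviations sum to zero in the sample.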
However, you have to consider that the lm() function of R typically uses a reparameterization, commonly called the "reference cell model". Here, one of the $U_i$ (usually the first) is set to zero and is used as a reference. In this approach, which we could write as
$Y_{ij} = \mu + U_i + \epsilon_{ij}, \quad U_1 = 0,$
the mean of category 1 is taken as the intercept $\mu$, and the term $U_i$ measures the difference between the average score of team $i$ and the average score of the reference category 1. So, looking at the R output in your question, the intercept corresponds to $\mu$ (the mean of the first level), the coefficient x2 is an estimate of the difference in means between level 2 and level 1, and similarly the coefficient x3 is an estimate of the difference in means between level 3 and level 1. Note that the output does not include any x1 coefficient, simply because the first level is the reference.
Each t test in the output tests the null hypothesis that the corresponding coefficient is zero: for x2, that levels 2 and 1 have the same mean; for x3, that levels 3 and 1 have the same mean. In your output x2 is rather small and x3 is large; accordingly, the t value is large for x3 and small for x2. This means that the difference between level 3 and the reference level 1 is probably highly significant, whereas that between level 2 and the reference level 1 is probably not significant (however, to assess significance correctly, you have to look at the p values, which are given in the R output). The high t value of the intercept simply reflects the test of whether the mean of the first category differs from 0, and is therefore not particularly useful.
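To check this reading of the output, and to address the question about standard errors, here is a minimal sketch on simulated data. The data frame `dat`, the group means used to generate it, the standard deviation and the seed are assumptions for illustration, since the original data set is not available. Only base R functions are used.

```r
# Simulated stand-in for the data in the question (all values are assumptions)
set.seed(42)
dat <- data.frame(
  Team  = factor(rep(1:3, each = 15)),
  Score = rnorm(45, mean = rep(c(36, 36, 48), each = 15), sd = 4)
)

# Default treatment contrasts: level 1 is the reference cell
fit   <- lm(Score ~ Team, data = dat)
coefs <- coef(fit)

# Sample mean of each level, as a named numeric vector
group_means <- sapply(split(dat$Score, dat$Team), mean)

# Intercept = mean of level 1; Team2 and Team3 = differences from level 1.
# Both comparisons should return TRUE (up to floating-point tolerance).
all.equal(unname(coefs["(Intercept)"]), unname(group_means["1"]))
all.equal(unname(coefs["Team2"]), unname(group_means["2"] - group_means["1"]))

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
# Here the standard errors of Team2 and Team3 refer to the DIFFERENCES.
summary(fit)$coefficients

# Cell-means parameterization (no intercept): one coefficient per level,
# so the Std. Error column gives the standard error of each estimated mean.
fit_means <- lm(Score ~ 0 + Team, data = dat)
summary(fit_means)$coefficients
```

Both fits describe the same model; only the parameterization differs. In the second fit each t test compares a level mean with 0, so its p values say nothing about differences between levels.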
