I have 2 groups of people. I'm working with the data about their age. I know the means, the standard

Riya Hansen


2022-07-01

I have 2 groups of people. I'm working with the data about their age. I know the means, the standard deviations and the number of people. I don't know the data of each person in the groups.
Group 1:
Mean = 35 years old; SD = 14; n = 137 people
Group 2:
Mean = 31 years old; SD = 11; n = 112 people
I want to combine those 2 groups to obtain a new mean and SD. It's easy for the mean, but is it possible for the SD? I do not know the distribution of those samples, and I can't assume those are normal distributions. Is there a formula for distributions that aren't necessarily normal?

Answer & Explanation

Wade Atkinson


Beginner · 2022-07-02 · Added 12 answers

Continuing on from BruceET's explanation, suppose the standard deviation provided for each sample is the usual sample standard deviation, i.e. the square root of the unbiased estimator of the variance:
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)^2}.$$
For samples $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_m)$, let $z = (x_1, \ldots, x_n, y_1, \ldots, y_m)$ be the combined sample. The combined sample mean is then
$$\bar z = \frac{1}{n+m} \left( \sum_{i=1}^{n} x_i + \sum_{j=1}^{m} y_j \right) = \frac{n \bar x + m \bar y}{n+m}.$$
Consequently, the combined sample variance is
$$s_z^2 = \frac{1}{n+m-1} \left( \sum_{i=1}^{n} (x_i - \bar z)^2 + \sum_{j=1}^{m} (y_j - \bar z)^2 \right),$$
where it is important to note that the combined mean $\bar z$ is used. To express this in terms of $s_x^2$ and $s_y^2$, we need to decompose the sums of squares; for instance,
$$(x_i - \bar z)^2 = (x_i - \bar x + \bar x - \bar z)^2 = (x_i - \bar x)^2 + 2 (x_i - \bar x)(\bar x - \bar z) + (\bar x - \bar z)^2,$$
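thus
$$\sum_{i=1}^{n} (x_i - \bar z)^2 = (n-1) s_x^2 + 2 (\bar x - \bar z) \sum_{i=1}^{n} (x_i - \bar x) + n (\bar x - \bar z)^2.$$
But the middle term vanishes, because $\sum_{i=1}^{n} (x_i - \bar x) = n \bar x - n \bar x = 0$. Applying the same decomposition to the $y_j$ gives
$$s_z^2 = \frac{(n-1) s_x^2 + n (\bar x - \bar z)^2 + (m-1) s_y^2 + m (\bar y - \bar z)^2}{n+m-1}.$$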
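Upon simplification, using $\bar x - \bar z = \frac{m (\bar x - \bar y)}{n+m}$ and $\bar y - \bar z = -\frac{n (\bar x - \bar y)}{n+m}$, we find
$$n (\bar x - \bar z)^2 + m (\bar y - \bar z)^2 = \frac{mn (\bar x - \bar y)^2}{m+n},$$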
so the formula becomes
$$s_z^2 = \frac{(n-1) s_x^2 + (m-1) s_y^2}{n+m-1} + \frac{nm (\bar x - \bar y)^2}{(n+m)(n+m-1)}.$$
This second term is the required correction factor.
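
To make the result concrete, here is a minimal Python sketch that applies the final formula to the figures from the question. The function name `combine_mean_sd` and its argument names are made up for illustration, and the sketch assumes the reported SDs were computed with the $n - 1$ divisor, as in the derivation above. Because the formula is a purely algebraic identity on sums of squares, it should hold whatever the underlying distributions are; no normality assumption enters anywhere.

```python
import math
import statistics


def combine_mean_sd(n, mean_x, sd_x, m, mean_y, sd_y):
    """Pool two samples' summary statistics (hypothetical helper).

    Assumes sd_x and sd_y are sample SDs computed with the n - 1 divisor.
    """
    total = n + m
    mean_z = (n * mean_x + m * mean_y) / total
    # Within-group sums of squares, recovered from the two sample variances.
    within = (n - 1) * sd_x**2 + (m - 1) * sd_y**2
    # Between-group correction term: n*m*(mean_x - mean_y)^2 / (n + m).
    between = n * m * (mean_x - mean_y) ** 2 / total
    var_z = (within + between) / (total - 1)
    return mean_z, math.sqrt(var_z)


# Figures from the question: group 1 first, then group 2.
mean_z, sd_z = combine_mean_sd(137, 35, 14, 112, 31, 11)
print(f"combined mean = {mean_z:.2f}, combined SD = {sd_z:.2f}")

# Sanity check on made-up raw data, where the combined SD can be computed directly.
x, y = [1.0, 2.0, 3.0, 4.0], [10.0, 12.0, 14.0]
_, sd_check = combine_mean_sd(
    len(x), statistics.mean(x), statistics.stdev(x),
    len(y), statistics.mean(y), statistics.stdev(y),
)
assert math.isclose(sd_check, statistics.stdev(x + y))
```

For the two groups in the question this gives a combined mean of about 33.20 years and a combined SD of about 12.87 years. The correction term contributes roughly 4 to a combined variance of roughly 165.6, so here it is small but not negligible.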
