Aleah Avery 2022-11-18 Answered
According to frequentists, why can't probabilistic statements be made about population parameters?
The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."[1] Note that this does not refer to repeated measurement of the same sample, but repeated sampling.
And:
The confidence interval can be expressed in terms of a single sample: "There is a 90% probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter." Note this is a probability statement about the confidence interval, not the population parameter.
And:
A 95% confidence interval does not mean that for a given realised interval calculated from sample data there is a 95% probability the population parameter lies within the interval, nor that there is a 95% probability that the interval covers the population parameter.[11] Once an experiment is done and an interval calculated, this interval either covers the parameter value or it does not; it is no longer a matter of probability.
Answers (1)

Eva Cochran
Answered 2022-11-19
Step 1
Suppose that you want to model the random behaviour of a certain population. Then you have to associate to the population a density function f (that is, you choose a normal distribution, exponential distribution, etc.) and a parameter θ (for example, if your density is a normal, then θ can be the population mean or the variance).
Suppose that you have decided which f you want, that is, the distribution for your population. The goal now is to estimate θ. In frequentist statistics, θ is an unknown constant to be discovered. That is why we speak about confidence and not about probability.
Example: imagine I want to model the height of the people in England. I associate to it the normal distribution, so f is the density function of a normal. Now I want to estimate μ, the population mean. One takes a sample $X_1, \dots, X_n$ of heights and uses the fact that
$$\frac{\bar{X}_n - \mu}{s_n/\sqrt{n}} \sim t_{n-1}.$$
One computes a and b so that
$$P\left(a < \frac{\bar{X}_n - \mu}{s_n/\sqrt{n}} < b\right) = 0.95,$$
that is,
$$P\left(\bar{X}_n - b\,\frac{s_n}{\sqrt{n}} < \mu < \bar{X}_n - a\,\frac{s_n}{\sqrt{n}}\right) = 0.95.$$
Step 2
Here it makes sense to speak about probability because $\bar{X}_n$ is a random variable. Now substitute the random variable $\bar{X}_n$ by its observed value, the sample mean $\bar{x}_n$ (a constant), and your confidence interval is
$$I = \left[\bar{x}_n - b\,\frac{s_n}{\sqrt{n}},\ \bar{x}_n - a\,\frac{s_n}{\sqrt{n}}\right].$$
The parameter μ is a constant, so either it belongs to I or it does not (there is no probability here). But you have a lot of confidence that it belongs to I.
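The repeated-sampling reading of this interval can be checked empirically. Below is a minimal Python sketch: the population values (mean 170, standard deviation 7, playing the role of heights) are illustrative assumptions, and the normal-approximation critical value 1.96 is used in place of the exact t quantile, which is close enough for n = 100. It draws many samples and counts how often the fixed, non-random μ lands inside the realised interval.

```python
import random
import statistics

def coverage(mu=170.0, sigma=7.0, n=100, trials=2000, seed=1):
    """Fraction of repeated samples whose t-style interval covers mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        s = statistics.stdev(xs)          # sample standard deviation s_n
        half = 1.96 * s / n ** 0.5        # approximate 95% half-width
        # mu is a constant; what varies from trial to trial is the interval
        if xbar - half < mu < xbar + half:
            hits += 1
    return hits / trials
```

Running this gives a coverage close to 0.95, which is exactly the "90% (here 95%) of the time" statement in the quoted definition: the probability attaches to the interval-generating procedure, not to μ.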
Remark: in contrast to frequentist statistics, one may use Bayesian statistics, which assumes that the parameter θ is a random variable with a probability distribution to be discovered. In this case one speaks about credible regions (probabilities) and not confidence intervals (confidence).
You might be interested in

asked 2022-11-05
Getting confused over a T-test
It's been a while since I've done this and I am getting rather confused.
Let's say I have two data sets of sizes $n_1$ and $n_2$,
$$X = X_1, X_2, \dots, X_{n_1}$$
$$Y = Y_1, Y_2, \dots, Y_{n_2}$$
and want to construct a t-test. I have seen this formula in lots of books; what is the intuition, and is it correct for a t-test?
$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
I am getting a little confused: some books tell me to be careful about whether my variances are equal. Can I not just put them into this formula either way?
Also, are we always referring to the sample mean and standard deviation here? How does the formula change if we have the population data?
How does this formula change if the data is paired?
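For what it's worth, the displayed statistic is the Welch form, which does not pool the two sample variances (so equal variances are not assumed). A minimal sketch, with made-up data in the usage note:

```python
import statistics

def welch_t(xs, ys):
    """Two-sample t statistic without pooling variances (Welch form)."""
    n1, n2 = len(xs), len(ys)
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    # statistics.variance uses the n-1 denominator, i.e. the sample s^2
    v1, v2 = statistics.variance(xs), statistics.variance(ys)
    return (xbar - ybar) / (v1 / n1 + v2 / n2) ** 0.5
```

For example, `welch_t([1, 2, 3, 4], [2, 4, 6, 8])` comes out to about −1.73. The pooled-variance alternative (combine $s_1^2$ and $s_2^2$ into one estimate) is what the "be careful if the variances are equal" warning is about.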
asked 2022-11-13
Standard Error of the Population Total
We have the following data and we are required to obtain the standard error of the unbiased estimate of the population total:
$$N = 160, \quad n = 64, \quad \sigma^2 = 4$$
My approach
We know that
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}},$$ so it can be written as
$$SE\left(\frac{\mathrm{Total}}{n}\right) = \frac{\sigma}{\sqrt{n}},$$
which in turn will be equal to
$$SE(\mathrm{Total}) = \sigma \sqrt{n}.$$
In the above formula, after plugging in values, I am getting $SE = (2)(8) = 16$.
But this is not correct. The correct answer is 40. Am I doing it incorrectly? I am not sure.
Any help?
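A quick numeric sketch of where the stated answer 40 would come from, assuming the usual unbiased estimator of the total under simple random sampling, $\hat{T} = N\bar{x}$, so that $SE(\hat{T}) = N \cdot SE(\bar{x}) = N\sigma/\sqrt{n}$ rather than $\sigma\sqrt{n}$:

```python
# Given values from the question: sigma^2 = 4, so sigma = 2
N, n, sigma = 160, 64, 4 ** 0.5

se_mean = sigma / n ** 0.5    # SE of the sample mean: 2 / 8 = 0.25
se_total = N * se_mean        # scale by N, not by n: 160 * 0.25
```

With these assumptions `se_total` is 40, matching the stated answer; the attempt in the question scales by $n$ where the estimator scales by $N$.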
asked 2022-11-04
Finding $\sum x_i^2$ for the first 5 data points of a dataset, given the mean and population variance
The mean and population variance of the dataset $x_1, x_2, \dots, x_{10}$ are 19 and 49, respectively. If $\sum_{i=6}^{10} x_i^2 = 1900$, what is the value of $\sum_{i=1}^{5} x_i^2$?
I've solved it as follows and it is wrong:
Population variance:
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
$$49 = \frac{1900}{10} - \frac{\sum_{i=1}^{5} x_i^2}{10} \implies \frac{\sum_{i=1}^{5} x_i^2}{10} = 190 - 49 = 141 \implies \sum_{i=1}^{5} x_i^2 = 141 \times 10 = 1410$$
And this solution is wrong. How do I solve this problem?
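One way to check the arithmetic, assuming the standard identity for the population variance, $\sigma^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$ (rather than the rearrangement attempted above, which drops the $\bar{x}^2$ term):

```python
# Given: n = 10, mean = 19, population variance = 49,
# and the sum of squares over i = 6..10 is 1900.
n, mean, var = 10, 19, 49

sum_sq_all = n * (var + mean ** 2)   # sum of all ten x_i^2 = 10*(49+361)
sum_sq_last5 = 1900                  # given for i = 6..10
sum_sq_first5 = sum_sq_all - sum_sq_last5
```

Under that identity `sum_sq_first5` comes out to 2200, not 1410.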
asked 2022-11-03
Bayesian Statistics - Basic question about prior
I'm trying to get an understanding of Bayesian statistics. My intuition tells me that in the expression for the posterior
$$p(\vartheta \mid x) = \frac{p(x \mid \vartheta)\, p(\vartheta)}{\int_\Theta p(x \mid \theta)\, p(\theta)\, d\theta}$$
the term $p(\vartheta)$ is the marginal distribution of the joint $p(\vartheta, x)$. It is obtained by
$$p(\vartheta) = \int_X p(\vartheta \mid x)\, p_X(x)\, dx,$$
where $p_X(x)$ should be the marginal distribution of the observable data. Does that make sense?
To this point it makes sense with this example: offering somebody car insurance without knowing the person's style of driving (determined by $\vartheta \in \Theta$) to feed some statistical model, we can still make use of the nation's car-crash statistics as our prior, which is a pdf on Θ. That would be the marginal distribution of the "driving styles" across the population.
Maybe I am just oversimplifying here, because my resources did not mention this.
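The integral identity in the question is the law of total probability, and it can be verified on a toy discrete example (all numbers below are made up for illustration):

```python
# Joint distribution p(theta, x) over two theta values and two x values.
joint = {("t1", "x1"): 0.1, ("t1", "x2"): 0.3,
         ("t2", "x1"): 0.2, ("t2", "x2"): 0.4}

p_x = {}                                   # marginal of the data, p_X(x)
for (t, x), v in joint.items():
    p_x[x] = p_x.get(x, 0.0) + v

def cond(t, x):
    """Posterior p(theta | x) = p(theta, x) / p_X(x)."""
    return joint[(t, x)] / p_x[x]

# Sum of p(theta | x) p_X(x) over x recovers the prior marginal p(theta).
p_t1 = sum(cond("t1", x) * p_x[x] for x in p_x)
```

Here `p_t1` recovers 0.1 + 0.3 = 0.4, the marginal of θ = t1, just as the integral formula claims.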
asked 2022-11-07
Analytical solution to mixed distribution fit to failure time data - Lambert W perhaps?
I have a set of n device failure times $\{t_i > 0\}$ for $i = 1 \dots n$, and $N - n$ devices which have not yet failed. Using maximum likelihood I am attempting to find a closed-form analytical solution to fit the data to the following cumulative distribution function:
$$F(t \mid \lambda, p) = p\,(1 - e^{-\lambda t}),$$
where $0 < p < 1$ is the asymptotic fraction of units to eventually fail and $\lambda > 0$ the sub-population failure rate. The likelihood for this MLE attempt is given by
$$L = \left(1 - F(t_n)\right)^{N-n} \prod_{i=1}^{n} f(t_i)$$
and
$$\ln L = (N - n)\ln\!\left(1 - p + p\,e^{-\lambda t_n}\right) + n \ln(\lambda p) - \lambda \sum_{i=1}^{n} t_i,$$
with pdf $f(t) = dF/dt = \lambda p\, e^{-\lambda t}$. Here we set $\partial L / \partial \lambda = \partial L / \partial p = 0$ (or the same for $\ln L$) to solve for p and λ at maximum likelihood (or log-likelihood). I've just recently learned a smidgen about the Lambert W function and was hoping that someone with a more nimble mind than mine might be able to derive a closed-form solution using this and/or other cleverness.
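Before hunting for a closed form, the expanded log-likelihood can be sanity-checked numerically against the product form. The parameter and time values below are purely illustrative:

```python
import math

lam, p = 0.5, 0.8            # illustrative lambda and p
N = 10                       # total devices
ts = [0.3, 1.1, 2.4, 4.0]    # made-up failure times; t_n is the largest
n, t_n = len(ts), max(ts)

F = lambda t: p * (1 - math.exp(-lam * t))          # the CDF above
f = lambda t: lam * p * math.exp(-lam * t)          # its density

# Direct form: ln[(1 - F(t_n))^(N-n) * prod f(t_i)]
direct = (N - n) * math.log(1 - F(t_n)) + sum(math.log(f(t)) for t in ts)

# Expanded form from the question
expanded = ((N - n) * math.log(1 - p + p * math.exp(-lam * t_n))
            + n * math.log(lam * p) - lam * sum(ts))
```

The two agree to floating-point precision, confirming the algebra of the expansion (in particular $1 - F(t_n) = 1 - p + p\,e^{-\lambda t_n}$).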
asked 2022-10-30
Detecting corrupted data in birthdates of a population
I have a population of N birthdates. Let's assume that birthdates are uniformly distributed over the year.
I'm concerned that some of these records have been corrupted, for example by someone pasting over filtered rows in excel, or otherwise introduced by error.
I would like a test to identify those records in N that share a birthdate which is over-represented in the data, indicating that they may have false dates. Any record might have been corrupted with any date, but I'm assuming the nature of the corruption was to overwrite the dates on a bunch of records with a single (false) date.
If I count the number of records on each date, what is the number above which I should suspect that some of the dates on those records are false? Obviously random variation means that the counts of records will not be N/365 for each date, but how much higher does it need to be on any given date for me to be 95% confident that I'm not just seeing random variation?
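One possible approach, sketched under explicit assumptions: under the null of uniform birthdates, each date's count is Binomial(N, 1/365), and since 365 dates are inspected at once, the 5% error budget is Bonferroni-split across them. The threshold is then the smallest count whose upper-tail probability falls below 0.05/365. This is a sketch of one defensible test, not the only one:

```python
import math

def threshold(N, alpha=0.05, days=365):
    """Smallest per-date count that is implausible under uniform dates."""
    p = 1.0 / days
    logp, log1p = math.log(p), math.log(1 - p)

    def upper_tail(k):
        # P(X >= k) for X ~ Binomial(N, p), summed in log space so the
        # binomial coefficients cannot overflow for large N.
        total = 0.0
        for j in range(k, N + 1):
            lg = (math.lgamma(N + 1) - math.lgamma(j + 1)
                  - math.lgamma(N - j + 1) + j * logp + (N - j) * log1p)
            total += math.exp(lg)
        return total

    k = 0
    while upper_tail(k) >= alpha / days:   # Bonferroni across all dates
        k += 1
    return k
```

For example, with N = 3650 records the expected count per date is 10, and the flag threshold lands in the low twenties; any date with that many records or more is suspect at the family-wide 95% level.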
asked 2022-11-02
Using a "population" consisting of probabilities to predict accuracy of sample
Since I'm not sure if the title explains my question well enough I've come up with an example myself:
Let's say I live in a country where every citizen goes to work everyday and every citizen has the choice to go by bus or by train (every citizen makes this choice everyday again - there are almost no citizens who always go by train and never by bus, and vice-versa).
I've done a lot of sampling and I have data on one million citizens about their behaviour in the past 1000 days. So, I calculate the "probability" per citizen of going by train on a single day. I can also calculate the average of those calculated probabilities of all citizens, let's say the average probability of a citizen going by train is 0.27. I figured that most citizens have tendencies around this number (most citizens have an individual probability between 0.22 and 0.32 of going by train for example).
Now, I started sampling an unknown person (but known to be living in the same country) and after asking him 10 consecutive days whether he went by train or by bus, I know that this person went to his work by train 4 times, and by bus 6 times.
My final question: how can I use my (accurate) data on one million citizens to approximate this person's probability of going by train?
I know that if I do the calculation the other way around, so, calculate the probability of this event occurring given that I know this person's REAL probability is 0.4, this results in $0.4^4 \cdot 0.6^6 \cdot \binom{10}{4} \approx 25\%$. I could calculate this probability for all possible probabilities between 0.00 and 1.00 (so, 0% to 100% without any numbers in between) and sum them all, which sums to about 910%. I could set this to 100% (dividing by 9.1) and scale all other percentages accordingly (dividing everything by 9.1, so our 25% becomes ~2.75%) and come up with a weighted sum: $2.75\% \cdot 0.4 + X\% \cdot 0.41 + \dots$, but this must be wrong since I'm not taking my accurate samples of the population into account.
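The grid calculation described above (the 4-of-10 likelihood at each 1% step, then normalisation and a weighted sum) can be written out directly. As the questioner suspects, the flat grid acts as a uniform prior; the population data would enter by replacing the flat weights with an informative prior concentrated near 0.27:

```python
from math import comb

# Candidate train probabilities at 1% steps, 0.00 .. 1.00.
grid = [i / 100 for i in range(101)]

# Likelihood of observing 4 train days out of 10 at each candidate q.
lik = [comb(10, 4) * q ** 4 * (1 - q) ** 6 for q in grid]

total = sum(lik)                      # the ~9.1 ("910%") in the question
post = [l / total for l in lik]       # normalised weights
estimate = sum(q * w for q, w in zip(grid, post))   # the weighted sum
```

Here `lik[40]` is the ~25% from the question, `total` is about 9.1, and the weighted sum lands near 0.42, noticeably above the raw 0.4 only because the implicit prior is flat; weighting by the observed population distribution instead would pull the estimate toward 0.27.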