klesstilne1 2022-11-18 Answered
Estimating Gaussian parameters of a set of data points
I have a set of data points. When I draw a histogram of them, plotting the frequency of occurrence against the values, I get a curve that looks like a normal curve. I can also test whether the data follow a normal distribution, or more precisely whether the population they come from follows a normal probability distribution. I am using the Shapiro-Wilk test for this.
However, how can I find the equation of that normal curve? Moreover, is there a way to test whether other standard distributions fit the points more accurately, and to estimate their parameters?
Answers (1)

metodikkf6z
Answered 2022-11-19 Author has 14 answers
Step 1
You can estimate the parameters μ and σ by using the statistics:
$\hat\mu = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$
and
$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar X\right)^2$
Step 2
Here $X_i$ is the $i$-th sample element, so $\bar X$ is the sample mean. The equation of the fitted distribution is therefore:
$f(x) = \frac{1}{\sqrt{2\pi\hat\sigma^2}}\, e^{-\frac{(x-\hat\mu)^2}{2\hat\sigma^2}}$
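As a minimal sketch, here are the two estimators and the fitted density using Python's standard-library `statistics` module (a tool chosen only for illustration; the answer does not name one). `statistics.stdev` uses the $n-1$ divisor, which matches $\hat\sigma^2$ above. The sample values and the helper names `fit_normal` and `fitted_pdf` are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def fit_normal(data):
    """Estimate mu and sigma as in Step 1 and return the fitted normal distribution."""
    mu_hat = fmean(data)      # (1/n) * sum(X_i)
    sigma_hat = stdev(data)   # sqrt((1/(n-1)) * sum((X_i - mean)^2))
    return NormalDist(mu_hat, sigma_hat)


def fitted_pdf(x, mu_hat, sigma2_hat):
    """The fitted density f(x), written out term by term as in the formula above."""
    return math.exp(-(x - mu_hat) ** 2 / (2 * sigma2_hat)) / math.sqrt(2 * math.pi * sigma2_hat)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
dist = fit_normal(data)
print(dist.mean, dist.stdev)  # 5.0 and about 2.138
print(fitted_pdf(6.0, dist.mean, dist.variance), dist.pdf(6.0))  # the two values agree
```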
You can use Pearson's chi-squared goodness-of-fit test to check the hypothesis that the data come from the distribution being tested.
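A minimal sketch of the Pearson statistic $\sum_j (O_j - E_j)^2 / E_j$, where $O_j$ and $E_j$ are the observed and expected counts in bin $j$ under the fitted distribution. Because the function takes any CDF, the same code can compare other candidate distributions once their parameters have been estimated. The bin edges are the caller's choice (equal-probability bins in the example), and the data are synthetic. With $k$ bins and $m$ estimated parameters, the statistic is commonly compared against a chi-squared distribution with $k - 1 - m$ degrees of freedom, for example via `scipy.stats.chi2.sf(stat, dof)`.

```python
import random
from statistics import NormalDist, fmean, stdev


def pearson_chi2(data, cdf, edges):
    """Pearson statistic over bins (-inf, e1], (e1, e2], ..., (ek, +inf).

    `cdf` must accept -inf and +inf (NormalDist.cdf does).
    """
    n = len(data)
    bounds = [float("-inf")] + sorted(edges) + [float("inf")]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in data if lo < x <= hi)
        expected = n * (cdf(hi) - cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat


# Usage on hypothetical data: fit a normal, then use k equal-probability bins.
rng = random.Random(42)
data = [rng.gauss(10.0, 2.0) for _ in range(200)]
fitted = NormalDist(fmean(data), stdev(data))
k = 8  # 200 / 8 = 25 expected points per bin
edges = [fitted.inv_cdf(j / k) for j in range(1, k)]
stat = pearson_chi2(data, fitted.cdf, edges)
dof = k - 1 - 2  # two parameters (mu, sigma) were estimated from the data
print(stat, dof)
```

Expected counts that are too small make the chi-squared approximation unreliable, which is why the example keeps 25 expected points per bin.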
You might be interested in

asked 2022-11-10
A random sample of 500 is taken from a large population, which is known to be equally divided between males and females, and values for the quantity of interest are recorded. On examination of the results, it is found that the sample includes 200 females, for whom the mean is 10.2 with a standard deviation of 0.6, and 300 males, for whom the mean is 14.8 with a standard deviation of 2.4.
Question:
Which one of the following statements is NOT correct?
1. Taking the mean value of the data for the 500 sampled would over-estimate the true population mean
2. The most accurate estimate of the mean of the quantity of interest would have been obtained by sampling equal numbers of males and females
3. Given the sample that was taken, the best estimate of the population mean is the average of the means of the males and females, i.e., 12.5
4. The estimate from the sample of the mean for females is likely to be more accurate than that for males
My Attempt:
My guess is that option 4) is incorrect, because you're told at the beginning that the population is known to be divided equally?
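The quantities the four options refer to can be computed directly, assuming the population shares are exactly 0.5 and using the sample standard deviations as stand-ins for the population ones (no finite-population correction). The sketch compares the plain mean of all 500 values with the population-weighted (stratified) mean, gives the standard error $s/\sqrt{n}$ of each group mean, and compares the variance of the stratified mean under equal allocation, the allocation actually drawn, and Neyman allocation ($n_h \propto W_h S_h$, a standard stratified-sampling result not mentioned in the question). These numbers bear on options 1, 3, 4 and 2 respectively.

```python
import math

# Summary statistics from the question: group -> (sample size, mean, standard deviation)
groups = {"female": (200, 10.2, 0.6), "male": (300, 14.8, 2.4)}
weights = {"female": 0.5, "male": 0.5}  # known population shares (equally divided)

# Mean of all 500 sampled values: each group weighted by its sample count.
n_total = sum(n for n, _, _ in groups.values())
pooled_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Stratified estimate: each group weighted by its population share.
stratified_mean = sum(weights[g] * m for g, (_, m, _) in groups.items())

# Standard error of each group mean, s / sqrt(n).
std_errors = {g: s / math.sqrt(n) for g, (n, _, s) in groups.items()}


def stratified_variance(sizes):
    """Variance of the stratified mean, sum of W_h^2 S_h^2 / n_h, for a given allocation."""
    return sum(weights[g] ** 2 * groups[g][2] ** 2 / sizes[g] for g in groups)


# Neyman allocation of the same 500 units: n_h proportional to W_h * S_h.
total_ws = sum(weights[g] * groups[g][2] for g in groups)
neyman = {g: n_total * weights[g] * groups[g][2] / total_ws for g in groups}
equal = {"female": 250, "male": 250}
actual = {g: groups[g][0] for g in groups}

print(round(pooled_mean, 2), round(stratified_mean, 2))  # 12.96 12.5
print({g: round(se, 4) for g, se in std_errors.items()})  # female 0.0424, male 0.1386
print({g: round(v) for g, v in neyman.items()})           # female 100, male 400
for name, sizes in [("equal", equal), ("actual", actual), ("neyman", neyman)]:
    print(name, round(stratified_variance(sizes), 5))      # 0.00612, 0.00525, 0.0045
```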
asked 2022-11-19
Data come from a normally distributed population with standard deviation 2.6. What sample size is needed to ensure, with 99% probability, that the sample mean is in error by at most 0.25?
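With $\sigma$ known, $P(|\bar X - \mu| \le E) = 1 - \alpha$ requires $z_{1-\alpha/2}\,\sigma/\sqrt{n} \le E$, i.e. $n \ge \left(z_{1-\alpha/2}\,\sigma / E\right)^2$, rounded up. A short sketch of that arithmetic (the function name is hypothetical):

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with P(|Xbar - mu| <= margin) >= confidence, for known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value, about 2.576 here
    return math.ceil((z * sigma / margin) ** 2)


n = sample_size_for_mean(sigma=2.6, margin=0.25, confidence=0.99)
print(n)  # 718
```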
asked 2022-11-18
According to frequentists, why can't probabilistic statements be made about population parameters?
The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."[1] Note that this does not refer to repeated measurement of the same sample, but repeated sampling.
And:
The confidence interval can be expressed in terms of a single sample: "There is a 90% probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter." Note this is a probability statement about the confidence interval, not the population parameter.
And:
A 95% confidence interval does not mean that for a given realised interval calculated from sample data there is a 95% probability the population parameter lies within the interval, nor that there is a 95% probability that the interval covers the population parameter.[11] Once an experiment is done and an interval calculated, this interval either covers the parameter value or it does not; it is no longer a matter of probability.
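A quick simulation illustrates the repeated-sampling interpretation quoted above. It draws many samples from a normal population with a known mean, builds a 90% z-interval $\bar X \pm z\,\sigma/\sqrt{n}$ from each, and counts how often the intervals cover the true mean. The population values (mean 5, $\sigma = 2$, $n = 30$), the number of trials, and the seed are arbitrary choices for illustration, and $\sigma$ is treated as known.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(true_mu=5.0, sigma=2.0, n=30, level=0.90, trials=2000, seed=1):
    """Fraction of z-intervals xbar +/- z*sigma/sqrt(n) that contain true_mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        # Once computed, each interval either covers true_mu or it does not.
        hits += (xbar - half_width <= true_mu <= xbar + half_width)
    return hits / trials


print(ci_coverage())  # close to 0.90 over many repetitions
```

The 90% describes the procedure over repeated samples; any single realised interval in the loop is simply a hit or a miss.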
asked 2022-11-15
A coach claims that his players have a larger lung capacity than the population average for the same age, which is 3.4 (normally distributed).
The measurements yield the following data: 3.4, 3.6, 3.8, 3.3, 3.4, 3.5, 3.7, 3.6, 3.7, 3.4 and 3.6.
$n = 11$
$\bar X = 3.545$
$S = 0.157$
Find the required sample size (the number of players whose lung capacity should be measured) so that the coach can support his claim with 99% confidence (assume $\sigma^2 = 0.09$).
I don't even know how to start. My initial thought was to use the statistic $U = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, but I don't know $U$.
asked 2022-11-07
Analytical solution to mixed distribution fit to failure time data - Lambert W perhaps?
I have a set of $n$ device failure times $\{t_i > 0\}$, $i = 1, \dots, n$, and $N - n$ devices that have not yet failed. Using maximum likelihood, I am attempting to find a closed-form analytical solution that fits the data to the following cumulative distribution function:
$F(t \mid \lambda, p) = p\left(1 - e^{-\lambda t}\right)$
where $0 < p < 1$ is the asymptotic fraction of units that eventually fail and $\lambda > 0$ is the failure rate of that sub-population. The likelihood for this MLE attempt is:
$L = \left(1 - F(t_n)\right)^{N-n} \prod_{i=1}^{n} f(t_i)$
and
$\ln L = (N-n)\ln\left(1 - p + p\,e^{-\lambda t_n}\right) + n\ln(\lambda p) - \lambda\sum_{i=1}^{n} t_i$
with pdf $f(t) = dF/dt = \lambda p\, e^{-\lambda t}$. Here we set $\partial L/\partial\lambda = \partial L/\partial p = 0$ (or equivalently $\partial \ln L/\partial\lambda = \partial \ln L/\partial p = 0$) and solve for $p$ and $\lambda$ at maximum likelihood. I've only recently learned a smidgen about the Lambert W function and was hoping that someone with a more nimble mind than mine might be able to derive a closed-form solution using it and/or other cleverness.
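This is not a closed form, only a numerical sketch that may help check any candidate formula. Setting $\partial \ln L/\partial p = 0$ gives $\hat p = n / \left(N\left(1 - e^{-\lambda t_n}\right)\right)$. Substituting it into $\partial \ln L/\partial\lambda = 0$ leaves one equation in $\lambda$ alone:
$\frac{1}{\lambda} - \frac{t_n}{e^{\lambda t_n} - 1} = \bar t, \qquad \bar t = \frac{1}{n}\sum_{i=1}^{n} t_i.$
The left side is the mean of an exponential truncated at $t_n$. It decreases from $t_n/2$ towards 0, so an interior solution exists only when $\bar t < t_n/2$, and bisection finds it reliably. The code assumes all $N - n$ survivors are censored at the same time $t_n$ (called `t_end`), as the likelihood above implies. The function names and the synthetic data are hypothetical.

```python
import math
import random


def fit_mixture_exponential(times, n_total, t_end, tol=1e-12):
    """Numerical MLE for F(t) = p(1 - exp(-lam t)) with n_total - n units censored at t_end."""
    n = len(times)
    r = sum(times) / n / t_end  # mean failure time as a fraction of t_end
    if not 0 < r < 0.5:
        raise ValueError("no interior maximum: need 0 < mean(times) < t_end / 2")

    def h(x):
        # Truncated-exponential mean divided by t_end, with x = lam * t_end; decreases from 1/2 to 0.
        return 1 / x - (0.0 if x > 700 else 1 / math.expm1(x))

    lo, hi = 1e-6, 2 / r  # h(lo) is about 1/2 > r, and h(hi) < 1/hi = r/2 < r
    if h(lo) <= r:
        raise ValueError("mean too close to t_end / 2 for this bracket")
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if h(mid) > r:
            lo = mid  # h is decreasing, so the root lies to the right
        else:
            hi = mid
    x = (lo + hi) / 2
    lam = x / t_end
    p = n / (n_total * -math.expm1(-x))  # -expm1(-x) = 1 - exp(-x)
    if p >= 1:
        raise ValueError("unconstrained maximum has p >= 1 (boundary case not handled)")
    return lam, p


def log_likelihood(lam, p, times, n_total, t_end):
    """ln L from the question, with t_n = t_end."""
    n = len(times)
    return ((n_total - n) * math.log(1 - p + p * math.exp(-lam * t_end))
            + n * math.log(lam * p) - lam * sum(times))


# Hypothetical synthetic data: 40% of 1000 units are susceptible (rate 0.5), observed until t_end = 5.
rng = random.Random(0)
N, t_end = 1000, 5.0
times = []
for _ in range(N):
    if rng.random() < 0.4:
        t = rng.expovariate(0.5)
        if t < t_end:
            times.append(t)

lam, p = fit_mixture_exponential(times, N, t_end)
print(lam, p)  # should land near 0.5 and 0.4
```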
asked 2022-11-14
I am fond of astronomy and the environment. I want to try to make a "light pollution map", but I don't have my own satellites, so I use the cities' population as an approximation of light pollution. Let's say each city has C citizens, and each one uses an average of X watts of electricity for lighting (I have these data). Ignoring units (I just need a rough dimensionless "light power"): city light power = C × X
I have a map with many cities. I know light power is inversely proportional to the square of the distance. I don't know how to account for the sky, air diffraction, or cloud reflections.
Start from the simplest model: a flat terrain map with N light sources, each at position (X(n), Y(n)) with a specific "total light power" = C(n) × X(n).
At a specific point with coordinates (x, y), what is the light power, i.e., the sum of the light from all the cities?
I tried to calculate and plot it, but the result looks odd (too far from real satellite night shots) and is too slow to calculate.
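On a flat map with only the inverse-square law, the light at $(x, y)$ is $\sum_n P_n / d_n^2$, where $P_n = C(n) \times X(n)$ and $d_n$ is the distance to city $n$. Below is a minimal pure-Python sketch of that sum. The softening length `eps` is a hypothetical addition, not part of the question: it keeps the value finite at a city's own coordinates, and setting it to 0 gives the pure inverse-square law. The city list is made up.

```python
def light_at(x, y, cities, eps=1.0):
    """Total light at (x, y): sum of power / distance^2 over all cities.

    `cities` is a list of (xn, yn, power) tuples, with power = C(n) * X(n).
    `eps` is a hypothetical softening length; eps = 0 is the pure inverse-square law.
    """
    total = 0.0
    for xn, yn, power in cities:
        d2 = (x - xn) ** 2 + (y - yn) ** 2
        total += power / (d2 + eps ** 2)
    return total


def light_map(cities, xs, ys, eps=1.0):
    """Evaluate light_at on a grid; rows follow ys, columns follow xs."""
    return [[light_at(x, y, cities, eps) for x in xs] for y in ys]


cities = [(0.0, 0.0, 100.0), (10.0, 0.0, 100.0)]  # hypothetical cities
print(light_at(5.0, 0.0, cities, eps=0.0))  # 8.0: each city contributes 100 / 25
grid = light_map(cities, xs=range(0, 11, 5), ys=range(-5, 6, 5))
```

Each grid point costs one pass over all N cities, so a W×H map needs W·H·N evaluations. That product, rather than the formula itself, is what usually makes such maps slow; vectorizing the loop (for example with NumPy) or skipping cities beyond a cutoff distance are common ways to reduce it, the latter at some cost in accuracy.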
asked 2022-11-07
How to find the initial and the future population based on today's data?
A certain species of bird was introduced in a certain county 25 years ago. Biologists observe that the population doubles every 10 years, and now the population is 27,000.
(A) - What was the initial size of the bird population? (Round your answer to the nearest whole number.)
$n_{\text{initial}} = \dfrac{27{,}000}{2^{25/10}} \approx 4773$ (correct)
(B) - Estimate the bird population 8 years from now. (Round your answer to the nearest whole number.)
$n_{\text{8 years later}} = 4773 \times 2^{8/10} \approx 8310$ (wrong)
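The doubling model is $n(t) = n_0 \cdot 2^{t/10}$, with $t$ measured in years since the birds were introduced. Today is $t = 25$, so 8 years from now is $t = 33$; equivalently, the future size is today's 27,000 times $2^{8/10}$. Multiplying the initial size by $2^{8/10}$ gives the population only 8 years after introduction, which is why 8310 was marked wrong. A short check (the helper name `population` is hypothetical):

```python
def population(t, n0, doubling_time=10.0):
    """Population t years after introduction, doubling every `doubling_time` years."""
    return n0 * 2 ** (t / doubling_time)


now = 25
n_today = 27_000
n0 = n_today / 2 ** (now / 10)          # back out the initial size
future = n_today * 2 ** (8 / 10)        # 8 years from today
print(round(n0), round(future))         # 4773 47010
print(round(population(now + 8, n0)))   # 47010, the same model evaluated at t = 33
```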
