Step 1
Standard normal distribution:
The standard normal distribution is a special case of the normal distribution in which the mean is 0 and the standard deviation is 1. The z-score of a sample mean for a given sample size can be calculated using the following formula:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the sample mean of the normal random variable x, n is the sample size, $\mu$ is the mean, and $\sigma$ is the standard deviation. For a single observation (n = 1), this reduces to $z = (x - \mu)/\sigma$.
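As a small illustration of the formula, the following Python sketch standardizes a sample mean. The function name `z_score` and the numbers in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def z_score(sample_mean: float, mu: float, sigma: float, n: int = 1) -> float:
    """Standardize a sample mean: z = (x_bar - mu) / (sigma / sqrt(n)).

    With n = 1 this is the z-score of a single observation.
    """
    if sigma <= 0 or n < 1:
        raise ValueError("sigma must be positive and n must be at least 1")
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu) / standard_error


# Hypothetical example: mu = 100, sigma = 15, a sample of 25 with mean 106
print(z_score(106, 100, 15, 25))  # 2.0
```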
Step 2
The procedure for obtaining the percentage of all the possible observations that lie within the specified range is as follows:
-Sketch the normal curve associated with the variable.
-Shade the region of interest and mark the delimiting x-values.
-Compute the z-scores for the x-values.
-Find the area under the standard normal curve between the computed z-scores using the standard normal table.
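The four steps above can be sketched in Python as follows. This is a minimal sketch that assumes the population mean and standard deviation are known, and it replaces the table lookup with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The name `percent_between` is hypothetical.

```python
from statistics import NormalDist


def percent_between(a: float, b: float, mu: float, sigma: float) -> float:
    """Percentage of a normal population N(mu, sigma) lying between x = a and x = b (a < b)."""
    # Compute the z-scores of the delimiting x-values
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    # Area under the standard normal curve between the two z-scores;
    # NormalDist().cdf plays the role of the standard normal table
    std = NormalDist()
    area = std.cdf(z_b) - std.cdf(z_a)
    return 100 * area


# Hypothetical example: mean 100, standard deviation 15, range 85 to 115
print(round(percent_between(85, 115, 100, 15), 2))  # 68.27
```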
Several tests of normality exist that can be used to check whether a particular data set follows the normal distribution.
Usually, before conducting a formal test, we use graphical methods to see whether the data can be assumed to follow the normal distribution, at least approximately. A few such graphical methods are:
-Histogram of the data, superimposed with a normal probability curve,
-Normal probability plot with confidence interval,
-Normal quantile-quantile (QQ) plot,
-Boxplot, etc.
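The QQ plot from the list above can be prepared without any plotting library by computing its coordinates. This sketch assumes the plotting positions (i − 0.5)/n, which is one common convention among several, and the helper names `qq_points` and `qq_correlation` are hypothetical. The pairs can be passed to any plotting tool as a scatter plot; a roughly straight line, summarized here by a correlation close to 1, suggests approximate normality.

```python
import math
from statistics import NormalDist


def qq_points(data):
    """Pairs (theoretical standard normal quantile, ordered observation) for a normal QQ plot."""
    n = len(data)
    std = NormalDist()
    # Plotting positions (i - 0.5) / n: one common convention among several
    theoretical = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


def qq_correlation(points):
    """Pearson correlation of the QQ points; values near 1 indicate a nearly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical right-skewed data: the QQ points bend away from a straight line
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 10, 30]
print(round(qq_correlation(qq_points(skewed)), 3))
```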
Step 3
If the graphical display shows at least an approximately normal distribution, a formal test can then be used to check normality. A few such tests are as follows:
-Pearson’s Chi-squared test for goodness of fit,
-Shapiro-Wilk test,
-Kolmogorov-Smirnov test, etc.
Pearson’s chi-squared test is discussed here.
Pearson’s Chi-squared test for goodness of fit:
Suppose the data set can be divided into n categories or classes, with observed frequency $O_i$ and expected frequency $E_i$ in the i-th class (i = 1, 2, …, n). Further, assume that the data are obtained by simple random sampling, the total sample size is large, the expected count in each cell (category) is at least 5, and the observations are independent.
Then the degrees of freedom are df = (number of categories) – (number of parameters estimated from the data) – 1. For n categories and the 2 parameters of the normal distribution (mean and variance) estimated from the data, df = n – 3.
The test statistic is given as
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$,
where the summation is done over all i = 1, 2, …, n.
The observed frequencies are known from the data set. The expected frequency of each class is obtained by multiplying the total sample size, say N, by the normal probability of that class, computed with the mean and standard deviation estimated from the data (using a standard normal table or software such as Excel or Minitab).
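The corresponding p-value, the probability that a chi-squared variable with df degrees of freedom is at least as large as the computed statistic, is compared with a chosen significance level α (for example, 0.05). If the p-value is less than α, the null hypothesis that the data follow a normal distribution is rejected; otherwise, there is not enough evidence to conclude that the data are non-normal.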
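The calculation described above can be sketched in Python using only the standard library. The sketch assumes the user supplies the interior class boundaries, estimates the mean and standard deviation by the sample mean and sample standard deviation (so df = number of classes – 3, as stated above), and computes the upper-tail chi-squared probability from a series expansion of the regularized incomplete gamma function instead of a table. Strictly, the estimates should come from the grouped data; using raw-data estimates is a common simplification. The names `chi2_sf` and `chi_square_normality` are hypothetical, and the function does not check the at-least-5 expected-count condition, so that should be verified separately.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, t)
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= t / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_normality(data, cut_points):
    """Pearson's chi-squared goodness-of-fit test of normality.

    cut_points are sorted interior class boundaries, giving the classes
    (-inf, c1], (c1, c2], ..., (c_last, inf).
    Returns (statistic, df, p_value).
    """
    n_total = len(data)
    # Mean and standard deviation estimated from the data (2 parameters)
    fitted = NormalDist(mean(data), stdev(data))
    edges = [-math.inf, *cut_points, math.inf]
    classes = list(zip(edges, edges[1:]))
    observed = [sum(1 for x in data if lo < x <= hi) for lo, hi in classes]
    # Expected frequency = N * normal probability of the class
    expected = [n_total * (fitted.cdf(hi) - fitted.cdf(lo)) for lo, hi in classes]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(classes) - 3  # categories - 2 estimated parameters - 1
    if df < 1:
        raise ValueError("need at least 4 classes")
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical "normal-looking" data: normal quantiles at 200 evenly spaced positions
sample = [NormalDist(50, 10).inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
stat, df, p = chi_square_normality(sample, [35, 40, 45, 50, 55, 60, 65])
print(df, round(stat, 3), round(p, 3))
```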
Assumptions of normality:
The assumptions of normality are as follows:
-The data should be symmetric.
-The data should be mesokurtic.
-The empirical rule should hold, at least approximately.
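Symmetry and mesokurtosis can be checked numerically through the sample skewness and excess kurtosis, both of which are 0 for a normal distribution. The sketch below uses the simple moment-based formulas (often written g1 and g2); statistical packages may apply small-sample bias corrections, so their values can differ slightly. The function name is hypothetical.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based sample skewness g1 and excess kurtosis g2 (no bias correction)."""
    n = len(data)
    m = sum(data) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5      # near 0 for symmetric data
    g2 = m4 / m2 ** 2 - 3.0  # near 0 for mesokurtic data
    return g1, g2


# Hypothetical data: symmetric but flatter than normal (negative excess kurtosis)
print(skewness_and_excess_kurtosis([1, 2, 3, 4, 5]))
```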
Empirical rule:
-Approximately 68% of values fall within one standard deviation of the mean.
-Approximately 95% of values fall within two standard deviations of the mean.
-Approximately 99.7% of values fall within three standard deviations of the mean.
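The empirical rule can be checked against data by counting the observations within k standard deviations of the mean, and the exact normal percentages can be computed for comparison. This is a minimal sketch with a hypothetical helper name; it assumes the sample standard deviation is used.

```python
from statistics import NormalDist, mean, stdev


def empirical_rule_fractions(data):
    """Fraction of observations within 1, 2 and 3 sample standard deviations of the sample mean."""
    m, s = mean(data), stdev(data)
    n = len(data)
    return [sum(1 for x in data if abs(x - m) <= k * s) / n for k in (1, 2, 3)]


# Exact proportions for a normal distribution, for comparison
std = NormalDist()
theoretical = [std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)]
print([round(p, 4) for p in theoretical])  # [0.6827, 0.9545, 0.9973]
```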