# The Wall Street Journal reported that the age at first startup for 55% of entrepreneurs was 29

Question
Sampling distributions

The Wall Street Journal reported that the age at first startup for $$55\%$$ of entrepreneurs was 29 years of age or less and the age at first startup for $$45\%$$ of entrepreneurs was 30 years of age or more.
a. Suppose a sample of 200 entrepreneurs will be taken to learn about the most important qualities of entrepreneurs. Show the sampling distribution of $$\overline{p}$$ where $$\overline{p}$$ is the sample proportion of entrepreneurs whose first startup was at 29 years of age or less.
b. Suppose a sample of 200 entrepreneurs will be taken to learn about the most important qualities of entrepreneurs. Show the sampling distribution of $$\overline{p}$$ where $$\overline{p}$$ is now the sample proportion of entrepreneurs whose first startup was at 30 years of age or more.
c. Are the standard errors of the sampling distributions different in parts (a) and (b)?

2021-01-11
It is given that the proportion of entrepreneurs whose first startup was at 29 years or less is $$p = 0.55$$ and the sample size is $$n = 200$$.
The sampling distribution of the proportion is approximately normal if $$np \geq 5\ \text{and}\ n(1 - p) \geq 5$$.
Verify the conditions:
$$np = 200 \times 0.55$$
$$= 110 \geq 5$$
and
$$n(1 - p) = 200 \times (1 - 0.55)$$
$$= 90 \geq 5$$
The conditions are satisfied. Therefore, the sampling distribution of the proportion is approximately normal.
The mean of $$\overline{p}$$ is $$E(\overline{p}) = p$$ and the standard deviation of $$\overline{p}$$ is $$\sigma_{\overline{p}} = \sqrt{\frac{p(1-p)}{n}}$$.
In this context, $$\overline{p}$$ is the sample proportion of entrepreneurs whose first startup was at 29 years or less.
The mean of $$\overline{p}$$ is
$$E(\overline{p}) = p = 0.55$$
The standard deviation (standard error) of $$\overline{p}$$ is
$$\sigma_{\overline{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.55 \times 0.45}{200}} = 0.0352$$
Thus, the sampling distribution of $$\overline{p}$$, the sample proportion of entrepreneurs whose first startup was at 29 years or less, is approximately normal with mean $$E(\overline{p}) = 0.55$$ and standard deviation $$\sigma_{\overline{p}} = 0.0352$$.
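Parts (b) and (c) follow from the same two formulas, with $$p = 0.45$$ in part (b). The sketch below is one way to check all three parts numerically; the helper name `sampling_distribution` is hypothetical and introduced only for this illustration.

```python
import math

def sampling_distribution(p, n):
    """Mean, standard error, and normal-approximation check for a sample proportion.

    Hypothetical helper written for this illustration.
    """
    mean = p                                   # E(p-bar) = p
    std_error = math.sqrt(p * (1 - p) / n)     # sigma_p-bar = sqrt(p(1-p)/n)
    normal_ok = n * p >= 5 and n * (1 - p) >= 5
    return mean, std_error, normal_ok

n = 200
mean_a, se_a, ok_a = sampling_distribution(0.55, n)  # part (a): 29 years or less
mean_b, se_b, ok_b = sampling_distribution(0.45, n)  # part (b): 30 years or more

print(f"(a) mean = {mean_a:.2f}, standard error = {se_a:.4f}, normal approx ok = {ok_a}")
print(f"(b) mean = {mean_b:.2f}, standard error = {se_b:.4f}, normal approx ok = {ok_b}")
print("(c) standard errors equal:", math.isclose(se_a, se_b))
```

Both standard errors come out to about 0.0352, because $$p(1-p)$$ is the same for $$p = 0.55$$ and $$p = 0.45$$. Only the mean differs between parts (a) and (b): 0.55 versus 0.45.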

### Relevant Questions

At what age do babies learn to crawl? Does it take longer to learn in the winter when babies are often bundled in clothes that restrict their movement? Data were collected from parents who brought their babies into the University of Denver Infant Study Center to participate in one of a number of experiments between 1988 and 1991. Parents reported the birth month and the age at which their child was first able to creep or crawl a distance of 4 feet within 1 minute. The resulting data were grouped by month of birth: January, May, and September:

$$\begin{array}{lccc} & \text{Crawling age} \\ \hline \text{Birth month} & \text{Mean} & \text{St. dev.} & n \\ \hline \text{January} & 29.84 & 7.08 & 32 \\ \text{May} & 28.58 & 8.07 & 27 \\ \text{September} & 33.83 & 6.93 & 38\end{array}$$

Crawling age is given in weeks. Assume the data represent three independent simple random samples, one from each of the three populations consisting of babies born in that particular month, and that the populations of crawling ages have Normal distributions. A partial ANOVA table is given below.

$$\begin{array}{lcccc}\text{Source} & \text{Sum of squares} & \text{DF} & \text{Mean square} & F \\ \hline \text{Groups} & 505.26\\ \text{Error} & & &53.45\\ \text{Total}\end{array}$$

What are the degrees of freedom for the groups term?
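As a hedged sketch, the missing ANOVA entries can be filled in from the standard one-way ANOVA relations (groups DF = k - 1, error DF = N - k, mean square = sum of squares / DF). The variable names are illustrative only.

```python
# Group sizes from the table (January, May, September).
group_sizes = [32, 27, 38]
ss_groups = 505.26   # given sum of squares for groups
ms_error = 53.45     # given mean square for error

k = len(group_sizes)        # number of groups
n_total = sum(group_sizes)  # total number of babies

df_groups = k - 1           # degrees of freedom for the groups term
df_error = n_total - k
df_total = n_total - 1

ms_groups = ss_groups / df_groups
ss_error = ms_error * df_error
f_stat = ms_groups / ms_error

print(f"DF groups = {df_groups}, DF error = {df_error}, DF total = {df_total}")
print(f"MS groups = {ms_groups:.2f}, SS error = {ss_error:.2f}, F = {f_stat:.2f}")
```

With three birth-month groups, the groups term has 3 - 1 = 2 degrees of freedom.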

Using the health records of every student at a high school, the school nurse created a scatterplot relating y = height (in centimeters) to x = age (in years).
After verifying that the conditions for the regression model were met, the nurse calculated the equation of the population regression line to be $$\displaystyle\mu_{y}=105+4.2x\ \text{with}\ \sigma=7\ \text{cm}$$.
About what percent of 15-year-old students at this school are taller than 180 cm?
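One way to approach this, assuming heights at a given age are normally distributed around the regression line with standard deviation 7 cm, is to compute the upper-tail normal probability. The helper `normal_upper_tail` is hypothetical and defined here only for illustration.

```python
import math

def normal_upper_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

age = 15
mean_height = 105 + 4.2 * age          # mean height at age 15, in cm
share_taller = normal_upper_tail(180, mean_height, 7)

print(f"mean height at age {age}: {mean_height:.1f} cm")
print(f"share taller than 180 cm: {share_taller:.1%}")
```

Under these assumptions the mean height at age 15 is 168 cm, 180 cm sits about 1.71 standard deviations above it, and roughly 4% of 15-year-olds are taller than 180 cm.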

The article “Stochastic Modeling for Pavement Warranty Cost Estimation” (J. of Constr. Engr. and Mgmnt., 2009: 352–359) proposes the following model for the distribution of Y = time to pavement failure. Let $$\displaystyle X_{1}$$ be the time to failure due to rutting, and $$\displaystyle X_{2}$$ be the time to failure due to transverse cracking; these two rvs are assumed independent. Then $$\displaystyle Y=\min\left(X_{1},X_{2}\right)$$. The probability of failure due to either one of these distress modes is assumed to be an increasing function of time t. After making certain distributional assumptions, the following form of the cdf for each mode is obtained: $$\displaystyle\Phi\left[\frac{a+bt}{\left(c+dt+et^{2}\right)^{1/2}}\right]$$ where $$\Phi$$ is the standard normal cdf. Values of the five parameters a, b, c, d, and e are -25.49, 1.15, 4.45, -1.78, and .171 for cracking and -21.27, .0325, .972, -.00028, and .00022 for rutting. Determine the probability of pavement failure within $$\displaystyle t=5$$ years and also $$\displaystyle t=10$$ years.

A random sample of $$\displaystyle{n}_{{1}}={16}$$ communities in western Kansas gave the following information for people under 25 years of age.
$$\displaystyle{X}_{{1}}:$$ Rate of hay fever per 1000 population for people under 25
$$\begin{array}{|c|c|c|c|c|c|c|c|} \hline 97 & 91 & 121 & 129 & 94 & 123 & 112 & 93\\ \hline 125 & 95 & 125 & 117 & 97 & 122 & 127 & 88 \\ \hline \end{array}$$
A random sample of $$\displaystyle{n}_{{2}}={14}$$ regions in western Kansas gave the following information for people over 50 years old.
$$\displaystyle{X}_{{2}}:$$ Rate of hay fever per 1000 population for people over 50
$$\begin{array}{|c|c|c|c|c|c|c|} \hline 94 & 109 & 99 & 95 & 113 & 88 & 110\\ \hline 79 & 115 & 100 & 89 & 114 & 85 & 96\\ \hline \end{array}$$
(i) Use a calculator to calculate $$\displaystyle\overline{{x}}_{{1}},{s}_{{1}},\overline{{x}}_{{2}},{\quad\text{and}\quad}{s}_{{2}}.$$ (Round your answers to two decimal places.)
(ii) Assume that the hay fever rate in each age group has an approximately normal distribution. Do the data indicate that the age group over 50 has a lower rate of hay fever? Use $$\displaystyle\alpha={0.05}.$$
(a) What is the level of significance?
State the null and alternate hypotheses.
$$\displaystyle{H}_{{0}}:\mu_{{1}}=\mu_{{2}},{H}_{{1}}:\mu_{{1}}<\mu_{{2}}$$
$$\displaystyle{H}_{{0}}:\mu_{{1}}=\mu_{{2}},{H}_{{1}}:\mu_{{1}}>\mu_{{2}}$$
$$\displaystyle{H}_{{0}}:\mu_{{1}}=\mu_{{2}},{H}_{{1}}:\mu_{{1}}\ne\mu_{{2}}$$
$$\displaystyle{H}_{{0}}:\mu_{{1}}>\mu_{{2}},{H}_{{1}}:\mu_{{1}}=\mu_{{2}}$$
(b) What sampling distribution will you use? What assumptions are you making?
The standard normal. We assume that both population distributions are approximately normal with known standard deviations.
The Student's t. We assume that both population distributions are approximately normal with unknown standard deviations.
The standard normal. We assume that both population distributions are approximately normal with unknown standard deviations.
The Student's t. We assume that both population distributions are approximately normal with known standard deviations.
What is the value of the sample test statistic? (Test the difference $$\displaystyle\mu_{{1}}-\mu_{{2}}$$. Round your answer to three decimal places.)
(c) Find (or estimate) the P-value.
P-value $$\displaystyle>{0.250}$$
$$\displaystyle 0.125 < P\text{-value} < 0.250$$
$$\displaystyle 0.050 < P\text{-value} < 0.125$$
$$\displaystyle 0.025 < P\text{-value} < 0.050$$
$$\displaystyle 0.005 < P\text{-value} < 0.025$$
P-value $$\displaystyle<{0.005}$$
Sketch the sampling distribution and show the area corresponding to the P-value.
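The sample statistics and the test statistic can be checked with Python's standard library. This is a sketch only: it assumes the unpooled two-sample t statistic, which the question does not spell out, and the variable names are illustrative.

```python
import math
import statistics

# Hay fever rates per 1000 population, copied from the two tables.
under_25 = [97, 91, 121, 129, 94, 123, 112, 93,
            125, 95, 125, 117, 97, 122, 127, 88]
over_50 = [94, 109, 99, 95, 113, 88, 110,
           79, 115, 100, 89, 114, 85, 96]

n1, n2 = len(under_25), len(over_50)
mean1, mean2 = statistics.mean(under_25), statistics.mean(over_50)
s1, s2 = statistics.stdev(under_25), statistics.stdev(over_50)  # sample std devs

# Unpooled two-sample t statistic for the difference mu1 - mu2 (assumed form).
std_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (mean1 - mean2) / std_error

print(f"x1-bar = {mean1:.2f}, s1 = {s1:.2f}")
print(f"x2-bar = {mean2:.2f}, s2 = {s2:.2f}")
print(f"t = {t_stat:.3f}")
```

The P-value then comes from a Student's t table or calculator, using whichever degrees-of-freedom convention the course prescribes.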

The tables show the battery lives (in hours) of two brands of laptops. a) Make a double box-and-whisker plot that represents the data. b) Identify the shape of each distribution. c) Which brand's battery lives are more spread out? Explain. d) Compare the distributions using their shapes and appropriate measures of center and variation.

The table shows the temperatures T (in degrees Fahrenheit) at which water boils at selected pressures p (in pounds per square inch). A model that approximates the data is: $$\displaystyle T=87.97+34.96\ln p+7.91\sqrt{p}$$ a) Use a graphing utility to plot the data and graph the model in the same viewing window. How well does the model fit the data? b) Use the graph to estimate the pressure at which the boiling point of water is $$300^{\circ}$$ F. c) Calculate T when the pressure is 74 pounds per square inch. Verify your answer graphically.
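For part (c), the model can be evaluated directly. The sketch below assumes the logarithm in the model is the natural log; the function name `boiling_point` is hypothetical.

```python
import math

def boiling_point(p):
    """Model temperature (degrees F) at pressure p (pounds per square inch)."""
    return 87.97 + 34.96 * math.log(p) + 7.91 * math.sqrt(p)

t_at_74 = boiling_point(74)
print(f"T(74) = {t_at_74:.2f} degrees F")
```

Under that assumption the model gives a boiling point of roughly 306.5 degrees Fahrenheit at 74 pounds per square inch.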

Assume that the random variable Z follows the standard normal distribution. Calculate the following probabilities (round to two decimal places).
a) $$P(Z>1.9)$$
b) $$\displaystyle P\left(-2\le Z\le 1.2\right)$$
c) $$P(Z\geq0.2)$$
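These three probabilities can be checked with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

prob_a = 1 - z.cdf(1.9)            # P(Z > 1.9)
prob_b = z.cdf(1.2) - z.cdf(-2)    # P(-2 <= Z <= 1.2)
prob_c = 1 - z.cdf(0.2)            # P(Z >= 0.2)

print(f"a) {prob_a:.2f}")
print(f"b) {prob_b:.2f}")
print(f"c) {prob_c:.2f}")
```

Rounded to two decimal places, these come out to 0.03, 0.86, and 0.42.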

Which possible statements about the chi-squared distribution are true?
a) The statistic $$X^{2}$$, that is used to estimate the variance $$S^{2}$$ of a random sample, has a Chi-squared distribution.
b) The sum of the squares of k independent standard normal random variables has a Chi-squared distribution with k degrees of freedom.
c) The Chi-squared distribution is used in hypothesis testing and estimation.
d) The Chi-squared distribution is a particular case of the Gamma distribution.
e) All of the above.

The following table lists the reported number of cases of infants born in the United States with HIV in recent years because their mother was infected.
Source: Centers for Disease Control and Prevention.
$$\begin{array}{|c|c|}\hline \text{Year} & \text{Cases} \\ \hline 1995 & 295 \\ \hline 1997 & 166 \\ \hline 1999 & 109 \\ \hline 2001 & 115 \\ \hline 2003 & 94 \\ \hline 2005 & 107 \\ \hline 2007 & 79 \\ \hline \end{array}$$
a) Plot the data on a graphing calculator, letting $$\displaystyle{t}={0}$$ correspond to the year 1995.
b) Using the regression feature on your calculator, find a quadratic, a cubic, and an exponential function that models this data.
c) Plot the three functions with the data on the same coordinate axes. Which function or functions best capture the behavior of the data over the years plotted?
d) Find the number of cases predicted by all three functions for 2015. Which of these are realistic? Explain.

An analysis of laboratory data collected with the goal of modeling the weight (in grams) of a bacterial culture after several hours of growth produced the least squares regression line $$\log(\text{weight}) = 0.25 + 0.61 \times \text{hours}$$. Estimate the weight of the culture after 3 hours.

A) 0.32 g

B) 2.08 g

C) 8.0 g

D) 67.9 g

E) 120.2 g
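The question does not state the base of the logarithm, so the sketch below back-transforms the prediction under both common readings: base 10 and natural log. Which one applies is an assumption the reader has to settle from the course's convention.

```python
import math

hours = 3
log_weight = 0.25 + 0.61 * hours       # predicted log(weight) at 3 hours

weight_base10 = 10 ** log_weight       # if log means log base 10
weight_natural = math.exp(log_weight)  # if log means natural log

print(f"log(weight) = {log_weight:.2f}")
print(f"base-10 reading: {weight_base10:.1f} g")
print(f"natural-log reading: {weight_natural:.1f} g")
```

The predicted log-weight is 2.08 either way. Reading it as base 10 gives about 120.2 g (choice E), and reading it as a natural log gives about 8.0 g (choice C).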

...