# Gastroenterology

We present data relating protein concentration to pancreatic function as measured by trypsin secretion among patients with cystic fibrosis.
If we do not want to assume normality for these distributions, then what statistical procedure can be used to compare the three groups?
Perform the test mentioned in Problem 12.42 and report a p-value. How do your results compare with a parametric analysis of the data?
Relationship between protein concentration (mg/mL) of duodenal secretions and pancreatic function, as measured by trypsin secretion $$[U/(kg/hr)]$$:
Trypsin secretion $$\leq\ 50$$
$$\begin{array}{|c|c|}\hline \text{Subject number} & \text{Protein concentration} \\ \hline 1 & 1.7 \\ \hline 2 & 2.0 \\ \hline 3 & 2.0 \\ \hline 4 & 2.2 \\ \hline 5 & 4.0 \\ \hline 6 & 4.0 \\ \hline 7 & 5.0 \\ \hline 8 & 6.7 \\ \hline 9 & 7.8 \\ \hline \end{array}$$
Trypsin secretion $$51\ -\ 1000$$
$$\begin{array}{|c|c|}\hline \text{Subject number} & \text{Protein concentration} \\ \hline 1 & 1.4 \\ \hline 2 & 2.4 \\ \hline 3 & 2.4 \\ \hline 4 & 3.3 \\ \hline 5 & 4.4 \\ \hline 6 & 4.7 \\ \hline 7 & 6.7 \\ \hline 8 & 7.9 \\ \hline 9 & 9.5 \\ \hline 10 & 11.7 \\ \hline \end{array}$$
Trypsin secretion $$>\ 1000$$
$$\begin{array}{|c|c|}\hline \text{Subject number} & \text{Protein concentration} \\ \hline 1 & 2.9 \\ \hline 2 & 3.8 \\ \hline 3 & 4.4 \\ \hline 4 & 4.7 \\ \hline 5 & 5.5 \\ \hline 6 & 5.6 \\ \hline 7 & 7.4 \\ \hline 8 & 9.4 \\ \hline 9 & 10.3 \\ \hline \end{array}$$

2020-12-03
Step 1
The given data are:
Trypsin secretion $$[U/(kg/hr)]$$
$$\leq\ 50$$
$$\begin{array}{|c|c|}\hline \text{Subject number} & \text{Protein concentration} \\ \hline 1 & 1.7 \\ \hline 2 & 2.0 \\ \hline 3 & 2.0 \\ \hline 4 & 2.2 \\ \hline 5 & 4.0 \\ \hline 6 & 4.0 \\ \hline 7 & 5.0 \\ \hline 8 & 6.7 \\ \hline 9 & 7.8 \\ \hline \end{array}$$
$$51\ -\ 1000$$
$$\begin{array}{|c|c|}\hline \text{Subject number} & \text{Protein concentration} \\ \hline 1 & 1.4 \\ \hline 2 & 2.4 \\ \hline 3 & 2.4 \\ \hline 4 & 3.3 \\ \hline 5 & 4.4 \\ \hline 6 & 4.7 \\ \hline 7 & 6.7 \\ \hline 8 & 7.9 \\ \hline 9 & 9.5 \\ \hline 10 & 11.7 \\ \hline \end{array}$$
$$>\ 1000$$
$$\begin{array}{|c|c|}\hline \text{Subject number} & \text{Protein concentration} \\ \hline 1 & 2.9 \\ \hline 2 & 3.8 \\ \hline 3 & 4.4 \\ \hline 4 & 4.7 \\ \hline 5 & 5.5 \\ \hline 6 & 5.6 \\ \hline 7 & 7.4 \\ \hline 8 & 9.4 \\ \hline 9 & 10.3 \\ \hline \end{array}$$
Step 2
1) Because normality is not assumed, the Kruskal-Wallis test is used to compare the three groups. Combine the observations from all three groups, arrange them in ascending order, and assign ranks, giving tied observations the average of the ranks they occupy.
$$A=\text{group}\ 1:\ \leq\ 50$$
$$B=\text{group}\ 2:\ 51\ -\ 1000$$
$$C=\text{group}\ 3:\ >\ 1000$$
$$\begin{array}{|c|c|c|}\hline \text{Observation} & \text{Rank} & \text{Group} \\ \hline 1.4 & 1 & B \\ \hline 1.7 & 2 & A \\ \hline 2.0 & 3.5 & A \\ \hline 2.0 & 3.5 & A \\ \hline 2.2 & 5 & A \\ \hline 2.4 & 6.5 & B \\ \hline 2.4 & 6.5 & B \\ \hline 2.9 & 8 & C \\ \hline 3.3 & 9 & B \\ \hline 3.8 & 10 & C \\ \hline 4.0 & 11.5 & A \\ \hline 4.0 & 11.5 & A \\ \hline 4.4 & 13.5 & B \\ \hline 4.4 & 13.5 & C \\ \hline 4.7 & 15.5 & B \\ \hline 4.7 & 15.5 & C \\ \hline 5.0 & 17 & A \\ \hline 5.5 & 18 & C \\ \hline 5.6 & 19 & C \\ \hline 6.7 & 20.5 & A \\ \hline 6.7 & 20.5 & B \\ \hline 7.4 & 22 & C \\ \hline 7.8 & 23 & A \\ \hline 7.9 & 24 & B \\ \hline 9.4 & 25 & C \\ \hline 9.5 & 26 & B \\ \hline 10.3 & 27 & C \\ \hline 11.7 & 28 & B \\ \hline \end{array}$$
$$n_{A} = 9$$
$$n_{B} = 10$$
$$n_{C} = 9$$
$$n = n_{A}\ +\ n_{B}\ +\ n_{C} = 9\ +\ 10\ +\ 9 = 28$$
$$R_{A} = \text{sum of ranks for group}\ A = 2\ +\ 3.5\ +\ 3.5\ +\ 5\ +\ 11.5\ +\ 11.5\ +\ 17\ +\ 20.5\ +\ 23 = 97.5$$

$$R_{B} = \text{sum of ranks for group}\ B = 1\ +\ 6.5\ +\ 6.5\ +\ 9\ +\ 13.5\ +\ 15.5\ +\ 20.5\ +\ 24\ +\ 26\ +\ 28 = 150.5$$

$$R_{C} = \text{sum of ranks for group}\ C = 8\ +\ 10\ +\ 13.5\ +\ 15.5\ +\ 18\ +\ 19\ +\ 22\ +\ 25\ +\ 27 = 158$$

As a check, $$R_{A}\ +\ R_{B}\ +\ R_{C} = 406 = \frac{n(n\ +\ 1)}{2}$$.
The hypotheses are:
$$H_{0}$$: the three groups have the same distribution of protein concentration.
$$H_{1}$$: at least two of the groups differ in location.
The Kruskal-Wallis test statistic is:
$$H=\ \frac{12}{n(n\ +\ 1)}\left[\frac{R_{A}^{2}}{n_{A}}\ +\ \frac{R_{B}^{2}}{n_{B}}\ +\ \frac{R_{C}^{2}}{n_{C}}\right]-3(n\ +\ 1)$$
$$=\ \frac{12}{28\ \times\ 29}\left[\frac{(97.5)^{2}}{9}\ +\ \frac{(150.5)^{2}}{10}\ +\ \frac{(158)^{2}}{9}\right]\ -\ 87$$
$$=\ \frac{12}{812}\left[\frac{9506.25}{9}\ +\ \frac{22650.25}{10}\ +\ \frac{24964}{9}\right]\ -\ 87$$
$$=\ \frac{12}{812}\left[1056.25\ +\ 2265.025\ +\ 2773.778\right]\ -\ 87$$
$$=\ \frac{12}{812}\left[6095.053\right]\ -\ 87$$
$$= 90.075\ -\ 87$$
$$H = 3.075$$
$$df = k\ -\ 1 = 3\ -\ 1 = 2$$
The critical value of chi-square with 2 d.f. at the 5% level of significance is 5.991.
The calculated value $$H = 3.075$$ is smaller than the table value. The corresponding p-value is $$P(\chi^{2}_{2} > 3.075) = e^{-3.075/2} \approx 0.215$$. Correcting for ties changes H only slightly, to about 3.08.
Conclusion: Fail to reject $$H_{0}$$, i.e. there is no significant evidence that protein concentration differs among the three trypsin-secretion groups.
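The ranking and test-statistic steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `midranks` and `kruskal_wallis` are made up for this example, the data are the three groups as tabulated in the problem, no tie correction is applied, and the closed-form p-value is valid only because there are 2 degrees of freedom.

```python
import math

# Protein concentration (mg/mL) by trypsin-secretion group, as tabulated in the problem.
groups = {
    "A": [1.7, 2.0, 2.0, 2.2, 4.0, 4.0, 5.0, 6.7, 7.8],        # <= 50
    "B": [1.4, 2.4, 2.4, 3.3, 4.4, 4.7, 6.7, 7.9, 9.5, 11.7],  # 51 - 1000
    "C": [2.9, 3.8, 4.4, 4.7, 5.5, 5.6, 7.4, 9.4, 10.3],       # > 1000
}


def midranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; their mean is (i + 1 + j) / 2.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def kruskal_wallis(groups):
    """Return rank sums, the uncorrected H statistic, and its degrees of freedom."""
    pooled = [x for xs in groups.values() for x in xs]
    n = len(pooled)
    rank_of = midranks(pooled)
    rank_sums = {g: sum(rank_of[x] for x in xs) for g, xs in groups.items()}
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(xs) for g, xs in groups.items()
    ) - 3 * (n + 1)
    return rank_sums, h, len(groups) - 1


rank_sums, h, df = kruskal_wallis(groups)
# Assumption: df == 2, where the chi-square upper tail is exactly exp(-h / 2).
p_value = math.exp(-h / 2)
print(rank_sums)
print(f"H = {h:.3f}, df = {df}, p = {p_value:.3f}")
```

Statistical packages usually apply a tie correction to H, so their output may differ slightly from this hand-style calculation.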
Step 3
For comparison, the parametric analysis is a one-way ANOVA on the same data (computed, for example, in Excel):
Summary:
$$\begin{array}{|c|c|c|c|c|}\hline \text{Groups} & \text{Count} & \text{Sum} & \text{Average} & \text{Variance} \\ \hline A & 9 & 35.4 & 3.9333 & 4.9025 \\ \hline B & 10 & 54.4 & 5.4400 & 11.5916 \\ \hline C & 9 & 54.0 & 6.0000 & 6.3900 \\ \hline \end{array}$$
ANOVA:
$$\begin{array}{|c|c|c|c|c|c|c|}\hline \text{Source of Variation} & \text{SS} & \text{df} & \text{MS} & \text{F} & \text{P-value} & \text{F-crit} \\ \hline \text{Between Groups} & 20.6603 & 2 & 10.3301 & 1.3267 & 0.2834 & 3.38519 \\ \hline \text{Within Groups} & 194.6640 & 25 & 7.7866 & & & \\ \hline \text{Total} & 215.3243 & 27 & & & & \\ \hline \end{array}$$
The calculated value $$F = 1.3267$$ is less than the critical value $$F_{0.05}(2,\ 25) = 3.38519$$, and the p-value 0.2834 is greater than 0.05.
Conclusion: Fail to reject $$H_{0}$$, i.e. there is no significant evidence that mean protein concentration differs among the three groups.
Both the nonparametric Kruskal-Wallis test and the parametric one-way ANOVA lead to the same conclusion: no significant difference among the three groups at the 5% level.
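The one-way ANOVA can be reproduced without a spreadsheet. The sketch below is illustrative only: `one_way_anova` is a hypothetical helper name, the data are the three groups as tabulated in the problem, and the closed-form p-value relies on the numerator having exactly 2 degrees of freedom.

```python
from statistics import mean

# Protein concentration (mg/mL) by trypsin-secretion group, as tabulated in the problem.
groups = {
    "A": [1.7, 2.0, 2.0, 2.2, 4.0, 4.0, 5.0, 6.7, 7.8],        # <= 50
    "B": [1.4, 2.4, 2.4, 3.3, 4.4, 4.7, 6.7, 7.9, 9.5, 11.7],  # 51 - 1000
    "C": [2.9, 3.8, 4.4, 4.7, 5.5, 5.6, 7.4, 9.4, 10.3],       # > 1000
}


def one_way_anova(groups):
    """Return SS between, SS within, their degrees of freedom, and the F statistic."""
    all_values = [x for xs in groups.values() for x in xs]
    grand_mean = mean(all_values)
    group_means = {g: mean(xs) for g, xs in groups.items()}
    ss_between = sum(
        len(xs) * (group_means[g] - grand_mean) ** 2 for g, xs in groups.items()
    )
    ss_within = sum(
        (x - group_means[g]) ** 2 for g, xs in groups.items() for x in xs
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
# Assumption: df_b == 2, where P(F > f) = (1 + 2 f / df_w) ** (-df_w / 2) exactly.
p_value = (1 + df_b * f_stat / df_w) ** (-df_w / 2)
print(f"SS between = {ss_b:.4f}, SS within = {ss_w:.4f}")
print(f"F({df_b}, {df_w}) = {f_stat:.4f}, p = {p_value:.4f}")
```

For designs with more than three groups the closed form no longer applies, and the F distribution from a statistics library would be needed instead.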
