Recent questions in Population Data

Population Data
Answered

Baardegem3Gw
2022-11-24

I have a question about the number of weeks out of 5 in which an event occurs. I have a frequency table from a sample of 40, with $x=0,1,2,3,4,5$ and frequencies 2, 7, 11, 12, 6, 2.

I have worked out unbiased estimates of the population mean and variance, but I'm not sure whether a binomial model is what I need. I have to decide if a binomial model is appropriate.

I can see that the data is discrete, but it's not binary like "event happens" or "event does not happen". It seems relatively symmetrical, and almost normally distributed. I'm not really sure how to work this out. Is a binomial model right or not?
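One way to probe this numerically is to compare the sample variance with the variance a Binomial(5, p) model would imply, and the observed frequencies with the expected ones. The sketch below assumes the frequency table quoted above and estimates p as mean/5; the variable names are illustrative only.

```python
from math import comb

xs = [0, 1, 2, 3, 4, 5]
freqs = [2, 7, 11, 12, 6, 2]
n_trials = 5                      # weeks per observation
n = sum(freqs)                    # sample size

mean = sum(x * f for x, f in zip(xs, freqs)) / n
# Unbiased sample variance (divide by n - 1)
var = sum(f * (x - mean) ** 2 for x, f in zip(xs, freqs)) / (n - 1)

p_hat = mean / n_trials           # assumed estimate of the per-week probability
binom_var = n_trials * p_hat * (1 - p_hat)

# Expected frequencies under Binomial(5, p_hat)
expected = [n * comb(n_trials, x) * p_hat**x * (1 - p_hat) ** (n_trials - x)
            for x in xs]

print(f"mean={mean:.3f}, sample var={var:.3f}, binomial var={binom_var:.3f}")
for x, f, e in zip(xs, freqs, expected):
    print(x, f, round(e, 2))
```

If the sample variance and the observed frequencies sit close to the binomial values, the model is at least plausible; a formal goodness-of-fit test would be the next step.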

Population Data
Answered

nazismes2w7
2022-11-23

If we have two normally distributed populations ${P}_{1}$ and ${P}_{2}$, then to test the hypothesis ${H}_{0}:{\mu}_{1}={\mu}_{2}$ against an alternative hypothesis, we choose the decision variable according to three cases:

Case 1: If ${\sigma}_{1}$ and ${\sigma}_{2}$ are known, then regardless of whether ${n}_{1}$ and ${n}_{2}$ are large or small, we consider the standard normal statistic:

$Z=\frac{{\overline{X}}_{1}-{\overline{X}}_{2}}{\sqrt{\frac{{\sigma}_{1}^{2}}{{n}_{1}}+\frac{{\sigma}_{2}^{2}}{{n}_{2}}}}$

Case 2: If the $\sigma $'s are unknown, but ${n}_{1}$ and ${n}_{2}$ are large, then we consider:

$Z=\frac{{\overline{X}}_{1}-{\overline{X}}_{2}}{\sqrt{\frac{{S}_{{X}_{1}}^{2}}{{n}_{1}}+\frac{{S}_{{X}_{2}}^{2}}{{n}_{2}}}}$

Case 3: If the $\sigma $'s are unknown and ${n}_{1},{n}_{2}$ are small, then we consider the Student's t statistic:

$T=\frac{{\overline{X}}_{1}-{\overline{X}}_{2}}{\hat{\sigma}\sqrt{\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}}}$

Where ${\hat{\sigma}}^{2}:=\frac{({n}_{1}-1){S}_{{X}_{1}}^{2}+({n}_{2}-1){S}_{{X}_{2}}^{2}}{{n}_{1}+{n}_{2}-2}$

Is this correct?
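The three statistics above can be sketched as plain functions. This is only an illustration of the formulas as written; the function names and the summary numbers are hypothetical. Note that the pooled statistic in Case 3 additionally assumes equal population variances and is compared against a t distribution with ${n}_{1}+{n}_{2}-2$ degrees of freedom.

```python
from math import sqrt

def z_known_sigma(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Case 1: population standard deviations known."""
    return (xbar1 - xbar2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def z_large_samples(xbar1, xbar2, s1_sq, s2_sq, n1, n2):
    """Case 2: sigmas unknown, large samples; uses the sample variances."""
    return (xbar1 - xbar2) / sqrt(s1_sq / n1 + s2_sq / n2)

def t_pooled(xbar1, xbar2, s1_sq, s2_sq, n1, n2):
    """Case 3: sigmas unknown, small samples; pooled variance estimate."""
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return (xbar1 - xbar2) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))

# Hypothetical summary statistics, for illustration only
print(z_known_sigma(10.0, 8.0, 2.0, 2.0, 16, 16))
print(z_large_samples(10.0, 8.0, 4.0, 4.0, 16, 16))
print(t_pooled(10.0, 8.0, 4.0, 4.0, 16, 16))
```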

Population Data
Answered

NormmodulxEE
2022-11-22

Suppose that we are interested in learning the proportion of the population $\theta $ with a particular property (for instance, the fraction of the population who are male). Suppose that we randomly sample n members of this population (with replacement, to make things easier) and observe that y of them have the property (so the fraction of the sample with the property is y/n). We start with a continuous prior p($\theta $) with full support [0,1] and update this using Bayes' rule.

Question: does the expected value of the posterior always lie between the prior expectation and the sample fraction y/n?
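For intuition, the special case of a conjugate Beta(a, b) prior can be checked directly: the posterior is Beta(a + y, b + n - y), and its mean is a weighted average of the prior mean and y/n, so in that case it lies between the two. The sketch below assumes a Beta prior, which the question does not, so it says nothing about arbitrary continuous priors; the parameter values are hypothetical.

```python
def beta_posterior_mean(a, b, y, n):
    """Posterior mean of theta under a Beta(a, b) prior after y successes in n draws."""
    return (a + y) / (a + b + n)

a, b = 2.0, 5.0        # hypothetical prior parameters
y, n = 7, 10           # hypothetical data
prior_mean = a / (a + b)
sample_frac = y / n
post_mean = beta_posterior_mean(a, b, y, n)

# Weighted-average form: the weight on the prior mean is (a + b) / (a + b + n)
w = (a + b) / (a + b + n)
print(prior_mean, post_mean, sample_frac)
print(abs(post_mean - (w * prior_mean + (1 - w) * sample_frac)) < 1e-12)
```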

Population Data
Answered

SevcamXnr
2022-11-22

$f(x)=\begin{cases}\frac{1}{b-a} & \text{if } x\in (a,b)\\ 0 & \text{else}\end{cases}$

where the interval limits a and b are unknown. Determine the maximum likelihood estimates for a and b.

This is an example task from an exam and I would like to know how to solve it correctly.

If I understand the formula for maximum likelihood estimation correctly, the likelihood of a sample ${x}_{1},\dots ,{x}_{n}$ is $L(a,b)=\frac{1}{(b-a)^{n}}$ when every ${x}_{i}$ lies in $(a,b)$, and 0 otherwise.

This is maximized when $(b-a)$ is as small as possible, but it is also important that (a,b) includes all the data. For this reason we have

$a=\min({x}_{1},\dots ,{x}_{n})$

and

$b=\max({x}_{1},\dots ,{x}_{n})$
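The argument can be checked on a small example. This is a sketch with made-up observations; it treats the support as the closed interval [a, b] so that the likelihood at the sample minimum and maximum is positive (the density on the open interval differs only at the two endpoints).

```python
def uniform_mle(data):
    """Return (a_hat, b_hat) = (min, max) for a Uniform(a, b) sample."""
    return min(data), max(data)

def likelihood(a, b, data):
    """Likelihood of Uniform(a, b); zero if any point falls outside [a, b]."""
    if b <= a or any(x < a or x > b for x in data):
        return 0.0
    return (b - a) ** (-len(data))

sample = [2.3, 4.1, 3.7, 2.9, 5.0]     # hypothetical observations
a_hat, b_hat = uniform_mle(sample)
print(a_hat, b_hat)

# Widening the interval lowers the likelihood; shrinking it makes it zero.
print(likelihood(a_hat, b_hat, sample) > likelihood(a_hat - 0.5, b_hat + 0.5, sample))
print(likelihood(a_hat + 0.1, b_hat, sample))
```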

Population Data
Answered

Uroskopieulm
2022-11-20

I have been given the population of the USA from 1790 - 1980 (increasing in intervals of 10) and I am asked to solve this differential equation.

Using t as time in years, and P as size of population at any time t.

It shows $dP/dt=(b-d)P$.

I assume b and d are birth and death rates per 1000.

I have subbed in $b-d=13-8$.

I'm kinda puzzled and don't know what to do. I have made a table and graph in Excel with the data, but I'm clueless. Any ideas, guys and gals?
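The equation $dP/dt=(b-d)P$ is separable, with solution $P(t)={P}_{0}{e}^{(b-d)t}$. A minimal sketch follows, assuming the reading above that b = 13 and d = 8 are rates per 1000 per year (so the net rate is 0.005 per person per year) and using a placeholder initial population, since the census table is not reproduced here.

```python
from math import exp

def population(p0, rate, t):
    """Solution of dP/dt = rate * P with P(0) = p0."""
    return p0 * exp(rate * t)

rate = (13 - 8) / 1000      # assumed net growth rate per person per year
p0 = 1.0                    # placeholder: population at t = 0, in any unit

for t in range(0, 200, 10):  # t = years since the first data point
    print(t, round(population(p0, rate, t), 4))
```

Comparing these values with the table column by column would show whether a constant net rate is a reasonable fit for the data.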

Population Data
Answered

Jenny Roberson
2022-11-20

The standard deviation is given by $\sqrt{\frac{\sum ({x}_{i}-\overline{x}{)}^{2}}{n}}$; however, when we estimate the standard deviation from a sample, the best estimate is $\sqrt{\frac{\sum ({x}_{i}-\overline{x}{)}^{2}}{n-1}}$.

How do I have to adjust the standard deviation if I want to weight my samples?

I.e. the standard deviation would be $\sqrt{\frac{\sum w({x}_{i})({x}_{i}-\overline{x}{)}^{2}}{\sum w({x}_{i})}}$ if I had the entire data set. What is the correct estimate of the standard deviation if I'm only given a subsample of the population?

Population Data
Answered

Annie French
2022-11-20

This question develops hypothesis tests for the difference between two population proportions. Let X ∼ Binomial(n, p1) and Y ∼ Binomial(m, p2) and suppose X and Y are independent. The hypotheses to be tested are: ${H}_{0}:{p}_{1}={p}_{2},\quad {H}_{A}:{p}_{1}<{p}_{2}\text{ or }{p}_{1}>{p}_{2}$

(a) Find the generalized likelihood ratio statistic Λ for testing H0 vs. HA based on the data X and Y.

Population Data
Answered

Filloltarninsv9p
2022-11-19

It is often said that degrees of freedom are the reason the standard deviation formula needs to be corrected. When explaining degrees of freedom, it is often said that once one knows the mean, only $n-1$ data points are actually needed, as the last data point can be determined from the mean and the other $n-1$ data points. However, I see the same thing occurring in a population, not just in a sample. So what's going on here, and how does this justification really work?

For example, in the simple linear regression model, the variance of the error terms is often estimated as the sum of squared residuals divided by $n-2$. This is justified as described above. But if this justification also holds for a population, not just a sample, how does it really work?

Population Data
Answered

django0a6
2022-11-19

Suppose we observe a random sample of five measurements: 10, 13, 15, 15, 17, from a normal distribution with unknown mean ${\mu}_{1}$ and unknown variance ${\sigma}_{1}^{2}$. A second random sample from another normal population with unknown mean ${\mu}_{2}$ and unknown variance ${\sigma}_{2}^{2}$ yields the measurements: 13, 7, 9, 11.

a) Test for evidence that ${\sigma}_{1}>1.0$. Compute the P-value for this test as accurately as possible. Draw a conclusion at $\alpha =0.05$.

Here's what I've done so far:

Step 1: Calculate ${\sigma}_{1}$

${\sigma}_{1}=\sqrt{\frac{(10-14{)}^{2}+(13-14{)}^{2}+(15-14{)}^{2}+(15-14{)}^{2}+(17-14{)}^{2}}{5}}=\sqrt{\frac{28}{5}}=2.366$

Step 2: Set up Hypothesis Test

${H}_{0}:{\sigma}_{1}=2.366$

${H}_{a}:{\sigma}_{1}>1.0$

How do I proceed from here? Thanks.

EDIT:

Also have this question, and would appreciate some insight.

b) Use the pivotal method (and a pivotal statistic with an F distribution) to derive a 95% confidence interval for $\frac{{\sigma}_{2}}{{\sigma}_{1}}$. Work it out for these data, and test the null hypothesis that ${\sigma}_{2}={\sigma}_{1}$ at the 5% level of significance.
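For part (a), a common route is the chi-square test for a variance: under a null value ${\sigma}_{0}$, the statistic $(n-1){S}^{2}/{\sigma}_{0}^{2}$ follows a chi-square distribution with $n-1$ degrees of freedom. The sketch below assumes that setup (null value 1.0, one-sided alternative, ${S}^{2}$ computed with divisor $n-1$) and uses the closed-form upper tail for 4 degrees of freedom, $P({\chi}_{4}^{2}>x)={e}^{-x/2}(1+x/2)$, to stay within the Python standard library.

```python
from math import exp
from statistics import variance

sample1 = [10, 13, 15, 15, 17]
sigma0 = 1.0                       # assumed value of sigma_1 under the null

n = len(sample1)
s_sq = variance(sample1)           # unbiased sample variance (divides by n - 1)
chi_sq = (n - 1) * s_sq / sigma0**2

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return exp(-x / 2) * (1 + x / 2)

p_value = chi2_sf_df4(chi_sq)      # valid here because n - 1 = 4
print(chi_sq, p_value, p_value < 0.05)
```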

Population Data
Answered

Clara Dennis
2022-11-19

Population Data
Answered

Aleah Avery
2022-11-18

The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."[1] Note that this does not refer to repeated measurement of the same sample, but repeated sampling.

And:

The confidence interval can be expressed in terms of a single sample: "There is a 90% probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter." Note this is a probability statement about the confidence interval, not the population parameter.

And:

A 95% confidence interval does not mean that for a given realised interval calculated from sample data there is a 95% probability the population parameter lies within the interval, nor that there is a 95% probability that the interval covers the population parameter.[11] Once an experiment is done and an interval calculated, this interval either covers the parameter value or it does not; it is no longer a matter of probability.

Population Data
Answered

Filloltarninsv9p
2022-11-18

I'm a little confused at this question posed by my prof. He asked us to generate a binomial distribution in R and input whatever variables we wanted.

x = rbinom(50, 10, 0.83)

Then he asks us to compute the sample mean, sample variance, population mean and population variance of the distribution.

sample mean: mean(x)

sample var: var(x)

But I have no idea what he intends we do to get the population mean and variance. Don't you need a larger set of data to be the population and a smaller set to be the sample? I only (seem to) have one set here.
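One common reading is that the "population" here is the Binomial(10, 0.83) distribution itself, so the population mean and variance are the theoretical values $np$ and $np(1-p)$, while mean(x) and var(x) come from the 50 draws. A rough Python analogue of the R call is sketched below; the seed is arbitrary and only there for reproducibility.

```python
import random
from statistics import mean, variance

random.seed(0)                       # arbitrary seed, for reproducibility only
size, n, p = 50, 10, 0.83            # same arguments as rbinom(50, 10, 0.83)

# Each draw counts the successes in n Bernoulli(p) trials
x = [sum(random.random() < p for _ in range(n)) for _ in range(size)]

sample_mean, sample_var = mean(x), variance(x)
pop_mean, pop_var = n * p, n * p * (1 - p)    # theoretical binomial moments

print(sample_mean, sample_var)
print(pop_mean, pop_var)
```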

Population Data
Answered

Sophie Marks
2022-11-18

${x}^{\prime}(t)=-0.03x(t)+A$

where t is time in years. Given $x(0)=16$ million and $x(-10)=15$ million, find A.

This is what I did:

From the equation I know that each year the population of the country decreases by 0.03.

I tried to use the data, since t is in years:

$x({t}_{0})\ast 1.03=x({t}_{0}+1)$

Since over 10 years the total population increases by one million people, therefore:

$x({t}_{0})\ast 1.03=x({t}_{0}+1)+100,000$

I get the feeling my solution is wrong, but I can't really find other ways to approach this.

Any ideas? I'll be glad if someone could tell me why my solution fails.
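For reference, the linear equation ${x}^{\prime}(t)=-kx(t)+A$ with $k=0.03$ has the closed-form solution $x(t)=\frac{A}{k}+\left(x(0)-\frac{A}{k}\right){e}^{-kt}$, so the two data points determine A. A minimal sketch, assuming x is measured in millions; the helper names are illustrative.

```python
from math import exp

k = 0.03
x0, x_past, t_past = 16.0, 15.0, -10.0     # x(0) = 16, x(-10) = 15, in millions

# x(t) = c + (x0 - c) * exp(-k t) with c = A / k; solve x(t_past) = x_past for c
growth = exp(-k * t_past)
c = (x0 * growth - x_past) / (growth - 1)
A = k * c

def x(t):
    """Closed-form solution of x' = -k x + A with x(0) = x0."""
    return c + (x0 - c) * exp(-k * t)

print(A, x(0.0), x(-10.0))
```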

Population Data
Answered

klesstilne1
2022-11-18

I have a set of data points. When I draw a histogram of them, plotting their frequency of occurrence against them, I get a curve that looks like a normal curve. I am also able to perform a test on the data set to check whether it follows a normal distribution, or more precisely whether the population it comes from follows a normal probability distribution. I am using the Shapiro-Wilk test for it.

However, how can I know what the equation of that normal curve will be? Moreover, is there a way I can test whether other standard distributions fit the points more accurately, and estimate their parameters?
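For the normal case, the curve is $f(x)=\frac{1}{\sigma \sqrt{2\pi}}{e}^{-(x-\mu {)}^{2}/(2{\sigma}^{2})}$ with $\mu $ and $\sigma $ estimated from the data. A minimal sketch using Python's standard library and made-up data points:

```python
from statistics import NormalDist

data = [4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2, 4.7, 5.4, 5.1]   # hypothetical points

# Estimate mu and sigma from the data (sample mean and sample standard deviation)
fitted = NormalDist.from_samples(data)
print(fitted.mean, fitted.stdev)

# Height of the fitted density at a few x values
for x in (4.5, 5.0, 5.5):
    print(x, fitted.pdf(x))
```

To overlay the density on a frequency histogram, it would need to be scaled by the number of points times the bin width.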

Population Data
Answered

Aron Heath
2022-11-17

This may seem like a weird question, but hear me out. I'm essentially struggling to see the connection between a t-value from a t-table and a t-value that is calculated.

The following formula is used to calculate the value of a t-score:

$t=\frac{\overline{X}-\mu}{\frac{S}{\sqrt{n}}}$

It requires a sample mean, a hypothesized population mean, and the standard deviation of the distribution of sample means (standard error).

According to the Central Limit Theorem, the distribution of sample means of a population is approximately normal and the sample distribution mean is equivalent to the population mean.

So the t-score formula is essentially calculating the magnitude of difference between the sample mean in question and the hypothesized population mean, relative to the variation in the sample data. Or in other words, how many standard errors the difference between sample mean and population mean comprise of. For example: If t was calculated to be 2, then the sample mean in question would be 2 standard errors away from the mean of the sample distribution.

1.) Phew, ok. So question 1: Let's just say a t-score of 1 was calculated for a sample mean. Since a distribution of sample means is normal according to the CLT, does that mean that the sample mean in question is part of the 68% (because of the 68-95 rule) of all sample means that are within 1 standard error of the mean of the distribution of sample means?

2.) Let's say we have a distribution of sample means of sample size 15. Is this distribution equivalent to a t-distribution with 14 degrees of freedom? Or more importantly: is the t-value from a t-table for 14 degrees of freedom and 95% confidence EQUIVALENT to a calculated t-value using a sample mean that is 2 standard errors away from the mean of a distribution of sample means with sample size 15?

Population Data
Answered

Ty Moore
2022-11-16

I am looking for P(S|R) given that a person tested positive, where S denotes a sick person and R denotes a person with a rash. If + denotes a person who tested positive and I use Bayes' formula and the data to calculate $P(S|+)$, would $P(S|R)=\frac{P(R|+)P(S|+)}{P(R|+)}=P(S|+)$? Or would the answer be $P(S|R)=P(R)P(S|+)$? Or are both of these answers wrong? Also, I cannot tell if in the problem statement $P(R)=.25$ or $P(R|+)=.25$.

Population Data
Answered

Aryanna Fisher
2022-11-15

The measurements yield the following data: 3.4, 3.6, 3.8, 3.3, 3.4, 3.5, 3.7, 3.6, 3.7, 3.4 and 3.6.

$n=11$

$\overline{X}=3.545$

$S=0.157$

Find the required sample size, i.e. the number of lung capacity measurements needed, so that the coach can make his statement with 99% confidence. (Assume ${\sigma}^{2}=0.09$.)

I don't even know how I should start. My initial thought was to use the U statistic $U=\frac{\overline{X}-\mu}{\sigma}\sqrt{n}\sim N(0,1)$. But I don't know the U.

Population Data
Answered

Frankie Burnett
2022-11-15

In a population, HIV prevalence is estimated to be $\lambda $. For a new test for HIV:

- $\theta $ is the probability that an HIV-positive person tests positive

- $\eta $ is the probability that an HIV-negative person tests positive in this test.

A person takes the test to check whether they have HIV; he tests positive.

What is the predictive probability he tests negative on the second test?

Assumption: Repeat tests on the same person are conditionally independent.

From my notes predictive probability is given as:

$P(\stackrel{~}{Y}=\stackrel{~}{y}|Y=y)=\int p(\stackrel{~}{y}|\tau )p(\tau |y)\,d\tau $, where $\stackrel{~}{Y}$ is the unknown observable, y is the observed data and $\tau $ is the unknown.

I am interested in the probability that the second test is negative, given that the first test is positive, without knowing if the man really has HIV or not.

To facilitate this I define:

${y}_{1}$ as the event of the first test being positive and

$\stackrel{~}{{y}_{2}}$ as the second test being negative

Would this adaptation of the formula given above be the correct/best approach to this problem?

$p(\stackrel{~}{{y}_{2}},{y}_{1}|\tau )=p(\stackrel{~}{{y}_{2}}|\tau )p({y}_{1}|\tau )p(\tau )$ and this is really $\propto p(\stackrel{~}{{y}_{2}}|\tau )p(\tau |{y}_{1})$

I've gotten for the $p(\tau |{y}_{1})$ from Bayes' theorem:

$p(\tau |{y}_{1})=\frac{p(\tau )p({y}_{1}|\tau )}{p({y}_{1})}=\frac{\lambda \theta}{\lambda \theta +\eta (1-\lambda )}$

How could I then find $p(\stackrel{~}{{y}_{2}}|\tau )$? Is this the correct approach?
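With a binary disease status, the integral in the predictive formula becomes a two-term sum over "has HIV" and "does not have HIV". The sketch below assumes the conditional independence stated above and uses placeholder values for $\lambda $, $\theta $ and $\eta $; the function names are illustrative.

```python
def posterior_hiv(lam, theta, eta):
    """P(HIV | first test positive) by Bayes' theorem."""
    return lam * theta / (lam * theta + eta * (1 - lam))

def prob_second_negative(lam, theta, eta):
    """P(second test negative | first test positive), tests conditionally independent."""
    post = posterior_hiv(lam, theta, eta)
    # Sum over the two disease states, weighted by the posterior
    return (1 - theta) * post + (1 - eta) * (1 - post)

# Hypothetical prevalence, true-positive rate and false-positive rate
lam, theta, eta = 0.01, 0.95, 0.02
print(posterior_hiv(lam, theta, eta))
print(prob_second_negative(lam, theta, eta))
```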

Population Data
Answered

bruinhemd3ji
2022-11-14

I have a map with many cities. I know light power is inversely proportional to the square of the distance. I don't know about sky, air diffraction, or cloud reflections.

Start from the simplest model: a flat terrain map with N light sources, where each one, at position X(n), Y(n), has a specific $\text{"total light power"}=C(n)\times X(n)$

At a specific point with coordinates (x,y), what is the light power, i.e. the sum of all the cities' light?

I tried to calculate and plot it, but the result seems weird (too far from real satellite night shots) and it is too slow to calculate.
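Under the simplest model described here, the light at a point is the sum of each source's power divided by the squared distance to it. A minimal sketch follows; the source list, the function name and the small distance floor (added to avoid dividing by zero at a city's own location) are all assumptions for illustration.

```python
def light_at(x, y, sources, min_dist_sq=1e-6):
    """Sum of inverse-square contributions from point sources at (x, y).

    sources: iterable of (sx, sy, power); min_dist_sq avoids division by zero
    when the query point coincides with a source.
    """
    total = 0.0
    for sx, sy, power in sources:
        d_sq = (x - sx) ** 2 + (y - sy) ** 2
        total += power / max(d_sq, min_dist_sq)
    return total

# Hypothetical cities: (X, Y, total light power)
cities = [(0.0, 0.0, 100.0), (10.0, 0.0, 50.0), (5.0, 8.0, 20.0)]
print(light_at(5.0, 0.0, cities))
```

Evaluating this on a full grid costs N operations per grid point, which may explain the slowness; ignoring sources beyond some cutoff distance is one possible way to reduce the work.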

Population Data
Answered

vidamuhae
2022-11-14

All of us are coping with the current COVID19 crisis. I hope that all of you stay safe and that this situation will end as soon as possible.

For this sad situation and for my unstoppable curiosity, I've started to read something about the SIR model. The variables of such model are s (the fraction of people susceptible to infection), y (the fraction of infected people) and r (the fraction of recovered people + the sad statistics of deaths). The model reads as:

$\begin{cases}\dot{s}=-\beta sy\\ \dot{y}=\beta sy-\gamma y\\ \dot{r}=\gamma y\end{cases},$

where $\beta $ and $\gamma $ are positive parameters. One strong hypothesis of this model is that the population size is constant over time (deaths are assumed to be recovered, births are neglected since, hopefully, they will be the part of the population which for sure will be protected from the disease). The initial conditions are set such that $s(0)+y(0)+r(0)=1$ and $s(0)\ge 0$, $y(0)\ge 0$ and $r(0)\ge 0$. Under this assumption, it can be proven that $s(t)+y(t)+r(t)=1\quad \forall t>0$.

The news often talks about the coefficient:

${R}_{0}=\frac{\beta}{\gamma},$

which rules the behavior of the system (for ${R}_{0}<1$ the disease will be wiped out, for ${R}_{0}>1$ it will spread out).

The same news also talks about the estimation of this parameter. Well, given the time series of s, y and r, it is rather easy to estimate the parameters $\beta $ and $\gamma $, and hence ${R}_{0}$. My main concern is about the time series. For each country we know the daily count of infected people (let's say Y(t)) and of recovered (or dead) people (let's say R(t)).

Anyway, there are several infected people who are not recorded (let's say Y′(t)), and many of them recover without knowing that they have been infected (let's say R′(t))! Moreover, day after day, the number of tests on people is increasing.

If we indicate with N the (constant) size of population, we get that:

$y(t)=\frac{Y(t)+{Y}^{\prime}(t)}{N},\quad r(t)=\frac{R(t)+{R}^{\prime}(t)}{N}\quad \text{and}\quad s(t)=1-y(t)-r(t).$

Here is the question(s). How can we perform the estimation of $\beta $ and $\gamma $ if we don't know the unobserved variables Y′(t) and R′(t)? How do the experts of the field estimate $\beta $ and $\gamma $ even though the available data are not complete? Do they use some data adjustment?
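For readers who want to experiment with the model itself, here is a minimal forward-Euler sketch of the SIR equations above. It does not address the unobserved Y′(t) and R′(t); the parameter values, step size and function name are assumptions chosen only for illustration.

```python
def simulate_sir(beta, gamma, s0, y0, r0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations; returns daily (s, y, r)."""
    s, y, r = s0, y0, r0
    steps_per_day = round(1 / dt)
    history = [(s, y, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * y * dt      # flow from s to y
            new_rec = gamma * y * dt         # flow from y to r
            s, y, r = s - new_inf, y + new_inf - new_rec, r + new_rec
        history.append((s, y, r))
    return history

beta, gamma = 0.3, 0.1                # hypothetical parameters, so R0 = 3
traj = simulate_sir(beta, gamma, 0.999, 0.001, 0.0, days=100)
print(beta / gamma, traj[-1])
```

Because every update moves mass between compartments without creating or removing any, the sum s + y + r stays at 1 up to rounding, matching the conservation property stated above.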