Fast-Track Population Data Learning: Tips and More!

Baardegem3Gw 2022-11-24

How do I know if a Binomial model is appropriate?
I have a question about the number of weeks out of 5 in which an event occurs. I have a frequency table from a sample of 40, with $x = 0, 1, 2, 3, 4, 5$ and frequencies 2, 7, 11, 12, 6, 2.
I have worked out the unbiased estimates of the population mean and variance, but I'm not sure whether a binomial model is what I need. I have to decide if a binomial model is appropriate.
I can see that the data is discrete, but it's not binary like "event happens" or "event does not happen". It seems relatively symmetrical, and almost normally distributed. I'm not really sure how to work this out. Is a binomial model right or not?
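One rough way to approach the question above is to compare the observed frequencies with those expected under Binomial(5, p), where p is estimated as the sample mean divided by 5. The sketch below assumes the five weeks behave like independent trials with a common probability, which is exactly the assumption being checked; all variable names are illustrative.

```python
from math import comb

values = [0, 1, 2, 3, 4, 5]
freqs = [2, 7, 11, 12, 6, 2]
n_weeks = 5
total = sum(freqs)  # sample size, 40

# Sample mean and unbiased sample variance from the frequency table
mean = sum(x * f for x, f in zip(values, freqs)) / total
var_unbiased = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (total - 1)

# Estimate p from the mean, then compute what a binomial model would predict
p_hat = mean / n_weeks
binom_var = n_weeks * p_hat * (1 - p_hat)
expected = [
    total * comb(n_weeks, x) * p_hat**x * (1 - p_hat) ** (n_weeks - x)
    for x in values
]

print("mean:", mean, "p_hat:", p_hat)
print("sample variance:", var_unbiased, "binomial variance:", binom_var)
for x, obs, exp_count in zip(values, freqs, expected):
    print(x, obs, round(exp_count, 2))
```

If the expected counts are close to the observed ones and the sample variance is close to the binomial variance, a binomial model is plausible; a chi-square goodness-of-fit test would make the comparison formal.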

nazismes2w7 2022-11-23

A quick question regarding hypothesis testing (decision variable)
If we have two normally distributed populations $P_{1}$ and $P_{2}$ , then to test the hypothesis $H_{0} : μ_{1} = μ_{2}$ against an alternative hypothesis, we choose the decision variable according to three cases:
Case 1: If $σ_{1}$ and $σ_{2}$ are known, then regardless of whether $n_{1}$ and $n_{2}$ are large or small, we consider the standard normal statistic:
$Z = \frac{{\bar{X}}_{1} - {\bar{X}}_{2}}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}}$
Case 2: If the $σ$ 's are unknown, but $n_{1}$ and $n_{2}$ are large, then we consider:
$Z = \frac{{\bar{X}}_{1} - {\bar{X}}_{2}}{\sqrt{\frac{S_{X_{1}}^{2}}{n_{1}} + \frac{S_{X_{2}}^{2}}{n_{2}}}}$
Case 3: If the $σ$ 's are unknown and $n_{1}, n_{2}$ are small, then we consider the Student's t statistic:
$T = \frac{{\bar{X}}_{1} - {\bar{X}}_{2}}{\hat{σ} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}$
Where ${\hat{σ}}^{2} := \frac{(n_{1} - 1) S_{X_{1}}^{2} + (n_{2} - 1) S_{X_{2}}^{2}}{n_{1} + n_{2} - 2}$
Is this correct?
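As a minimal sketch of the Case 3 statistic above, the following computes the pooled estimate $\hat{σ}$ and the resulting T from two small samples. The function name and the sample values are hypothetical; note that the pooled form assumes the two populations share a common variance.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(x1), len(x2)
    # variance() is the unbiased sample variance S^2 (divides by n - 1)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))

# Hypothetical samples, for illustration only
t_stat = pooled_t([2, 4, 6], [1, 3, 5])
print(t_stat)
```

Under $H_{0}$ this statistic is compared with a t distribution on $n_{1} + n_{2} - 2$ degrees of freedom.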

NormmodulxEE 2022-11-22

Is the posterior always a compromise between the prior and the data?
Suppose that we are interested in learning the proportion of the population $θ$ with a particular property (for instance, the fraction of the population who are male). Suppose that we randomly sample n members of this population (with replacement, to make things easier) and observe that y of them have the property (so the fraction of the sample with the property is y/n). We start with a continuous prior p( $θ$ ) with full support [0,1] and update this using Bayes rule.
Question: does the expected value of the posterior always lie between the prior expectation and the sample fraction y/n?
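For the special case of a conjugate Beta prior, the posterior mean can be written explicitly as a weighted average of the prior mean and the sample fraction. This is only a sketch for $θ \sim \text{Beta}(α, β)$ (the hyperparameters α and β are assumptions introduced here, not part of the question), and it does not by itself settle the question for arbitrary continuous priors.

```latex
\theta \mid y \sim \mathrm{Beta}(\alpha + y,\; \beta + n - y), \qquad
E[\theta \mid y] = \frac{\alpha + y}{\alpha + \beta + n}
= \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \frac{\alpha}{\alpha + \beta}
+ \frac{n}{\alpha + \beta + n} \cdot \frac{y}{n}
```

Both weights are positive and sum to 1, so in this Beta case the posterior mean always lies between the prior mean $α / (α + β)$ and $y / n$.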

SevcamXnr 2022-11-22

Given observations $X_{1}, \ldots, X_{n}$ from a population with density
$f (x) = \begin{cases} \frac{1}{b - a} & \text{if } x \in (a, b) \\ 0 & \text{otherwise} \end{cases}$
where the interval limits a and b are unknown. Determine the maximum likelihood estimates for a and b.
This is an example task from an exam and I would like to know how to solve it correctly.
If I understand the maximum likelihood formula correctly, the likelihood is $L (a, b) = (b - a)^{- n}$ when every observation lies in (a,b), and 0 otherwise.
This is maximized when $(b - a)$ is as small as possible, but it is also important that (a,b) includes all the data. For this reason we have
$a = \min (x_{1}, \ldots, x_{n})$
and
$b = \max (x_{1}, \ldots, x_{n})$
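The reasoning above can be checked numerically. This sketch evaluates the uniform likelihood on a small hypothetical sample and confirms that widening the interval lowers the likelihood, while shrinking it past a data point drops it to zero. It treats the interval as closed so that the maximum is attained; the function names are illustrative.

```python
def uniform_likelihood(xs, a, b):
    """Likelihood of Uniform(a, b) for the sample xs (closed interval assumed)."""
    if a < b and a <= min(xs) and max(xs) <= b:
        return (b - a) ** (-len(xs))
    return 0.0  # some observation falls outside the interval

def uniform_mle(xs):
    """Maximum likelihood estimates: the sample minimum and maximum."""
    return min(xs), max(xs)

# Hypothetical observations, for illustration only
xs = [2.1, 3.7, 2.9, 4.4]
a_hat, b_hat = uniform_mle(xs)

print(a_hat, b_hat)
print(uniform_likelihood(xs, a_hat, b_hat))              # largest value
print(uniform_likelihood(xs, a_hat - 0.5, b_hat + 0.5))  # wider interval, smaller
print(uniform_likelihood(xs, 2.5, b_hat))                # excludes 2.1, zero
```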

Uroskopieulm 2022-11-20

Finding a population Function
I have been given the population of the USA from 1790 to 1980 (in intervals of 10 years) and I am asked to solve this differential equation.
Using t as time in years, and P as size of population at any time t.
The equation given is $d P / d t = (b - d) P$ .
I assume b and d are birth and death rates per 1000.
I have substituted $b - d = 13 - 8$ .
I'm rather puzzled and don't know what to do. I have made a table and a graph in Excel with the data, but I'm clueless. Any ideas, guys and gals?
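The equation in this question is separable, so it can be solved directly. The derivation below is a sketch; $P_{0}$ (the population at $t = 0$, e.g. the 1790 figure from the table) and the integration constant $C$ are symbols introduced here.

```latex
\frac{dP}{dt} = (b - d) P
\;\Longrightarrow\; \int \frac{dP}{P} = \int (b - d)\, dt
\;\Longrightarrow\; \ln P = (b - d)\, t + C
\;\Longrightarrow\; P(t) = P_0 \, e^{(b - d) t}
```

If b and d really are rates per 1000 people per year, as assumed in the question, then $b - d = (13 - 8) / 1000 = 0.005$ per year rather than 5, and the fitted curve can be compared with the census table.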

Jenny Roberson 2022-11-20

Estimating standard deviation from weighted sample
The standard deviation is given by $\sqrt{\frac{\sum (x_{i} - \bar{x})^{2}}{n}}$ ; however, when we estimate the standard deviation from a sample, the best estimate is $\sqrt{\frac{\sum (x_{i} - \bar{x})^{2}}{n - 1}}$
How do I have to adjust the standard deviation if I want to weight my samples?
I.e. the standard deviation would be $\sqrt{\frac{\sum w (x_{i}) (x_{i} - \bar{x})^{2}}{\sum w (x_{i})}}$ , if I had the entire data set. What is the correct estimate of the standard deviation if I'm only given a subsample of the population?
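One commonly used correction, sketched below, replaces the denominator $\sum w$ with $V_{1} - V_{2} / V_{1}$, where $V_{1} = \sum w_{i}$ and $V_{2} = \sum w_{i}^{2}$. This assumes the weights are reliability (non-frequency) weights; if the weights are counts of repeated observations, the usual denominator is $\sum w - 1$ instead. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import stdev

def weighted_std(xs, ws):
    """Weighted sample standard deviation, assuming reliability weights."""
    v1 = sum(ws)
    v2 = sum(w * w for w in ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / v1
    ss = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws))
    # With equal weights this denominator reduces to n - 1
    return sqrt(ss / (v1 - v2 / v1))

# Hypothetical data: equal weights should reproduce the ordinary n - 1 formula
data = [2.0, 4.0, 4.0, 5.0, 7.0]
equal = weighted_std(data, [1.0] * len(data))
print(equal, stdev(data))
```

The correction is invariant to rescaling all weights by a constant, which is the behaviour one would expect from reliability weights.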

Annie French 2022-11-20

Generalized likelihood ratio statistic for two binomial distributions
This question develops hypothesis tests for the difference between two population proportions. Let X ∼ Binomial(n, p1) and Y ∼ Binomial(m, p2) and suppose X and Y are independent. The hypotheses to be tested are: $H_{0} : p_{1} = p_{2}, \quad H_{A} : p_{1} < p_{2} \text{ or } p_{1} > p_{2}$
(a) Find the generalized likelihood ratio statistic Λ for testing H0 vs. HA based on the data X and Y.
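A sketch of the statistic asked for in part (a): the likelihood is maximized under $H_{0}$ with a common proportion and under the full model with separate proportions, and the binomial coefficients cancel in the ratio. The hatted symbols are the usual maximum likelihood estimates, introduced here for the derivation.

```latex
\hat{p} = \frac{X + Y}{n + m}, \qquad \hat{p}_1 = \frac{X}{n}, \qquad \hat{p}_2 = \frac{Y}{m}

\Lambda = \frac{\hat{p}^{\,X + Y} (1 - \hat{p})^{\,n + m - X - Y}}
{\hat{p}_1^{\,X} (1 - \hat{p}_1)^{\,n - X} \; \hat{p}_2^{\,Y} (1 - \hat{p}_2)^{\,m - Y}}
```

Small values of Λ are evidence against $H_{0}$; for large n and m, $- 2 \ln Λ$ is approximately chi-square distributed with 1 degree of freedom under $H_{0}$.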

Filloltarninsv9p 2022-11-19

Degree of freedom and corrected standard deviation
It is often said that degrees of freedom are the reason the standard deviation formula needs to be corrected. When explaining degrees of freedom, it is often said that when one knows the mean, only $n - 1$ data points are actually needed, as the last data point can be determined using the mean and the other $n - 1$ data points. However, I see the same thing occurring in the population, not just in the sample. So what's going on here, and how does this justification really work?
For example, in the simple linear regression model, the variance of the error terms is often estimated as the sum of squared residuals divided by $n - 2$ . This is justified as said above. But if this justification is also true for the population, not just the sample, how does it really work?

django0a6 2022-11-19

Getting P-value While Using Variance
Suppose we observe a random sample of five measurements: 10, 13, 15, 15, 17, from a normal distribution with unknown mean $μ_{1}$ and unknown variance $σ_{1}^{2}$ . A second random sample from another normal population with unknown mean $μ_{2}$ and unknown variance $σ_{2}^{2}$ yields the measurements: 13, 7, 9, 11.
a) Test for evidence that $σ_{1} > 1.0$ . Compute the P-value for this test as accurately as possible. Draw a conclusion at $α = 0.05$ .
Here's what I've done so far:
Step 1: Calculate $σ_{1}$
$σ_{1} = \sqrt{\frac{(10 - 14)^{2} + (13 - 14)^{2} + (15 - 14)^{2} + (15 - 14)^{2} + (17 - 14)^{2}}{5}} = \sqrt{\frac{28}{5}} = 2.366$
Step 2: Set up Hypothesis Test
$H_{0} : σ_{1} = 2.366$
$H_{a} : σ_{1} > 1.0$
How do I proceed from here? Thanks.
EDIT:
I also have this question, and would appreciate some insight.
b) Use the pivotal method (and a pivotal statistic with an F distribution) to derive a 95% confidence interval for $\frac{σ_{2}}{σ_{1}}$ . Work it out for these data. And test the null hypothesis that $σ_{2} = σ_{1}$ at the 5% level of significance.
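For part (a), a standard approach is the chi-square test for a variance: the null hypothesis uses the claimed boundary value, $H_{0} : σ_{1} = 1.0$ against $H_{a} : σ_{1} > 1.0$, and the statistic is $(n - 1) S^{2} / σ_{0}^{2}$ with $n - 1$ degrees of freedom. The sketch below computes it with the standard library only; the helper `chi2_sf_even_df` is a hypothetical name and uses the closed form of the chi-square tail that holds only when the degrees of freedom are even.

```python
from math import exp, factorial

sample = [10, 13, 15, 15, 17]
sigma0 = 1.0  # value of sigma_1 under the null hypothesis

n = len(sample)
xbar = sum(sample) / n
sum_sq = sum((x - xbar) ** 2 for x in sample)  # equals (n - 1) * S^2

stat = sum_sq / sigma0**2
df = n - 1

def chi2_sf_even_df(x, df):
    """P(chi-square_df > x); closed form valid only for even df."""
    k = df // 2
    return exp(-x / 2) * sum((x / 2) ** j / factorial(j) for j in range(k))

p_value = chi2_sf_even_df(stat, df)
print(stat, df, p_value)
```

A P-value below 0.05 would lead to rejecting $H_{0}$ in favour of $σ_{1} > 1.0$. Part (b) needs the F distribution, which this sketch does not cover.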

Clara Dennis 2022-11-19

We have data that come from a normally distributed population with standard deviation 2.6. What sample size is needed to make sure that, with 99% probability, the mean of the sample will be in error by at most 0.25?
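With a known standard deviation, the usual approach is to require $z_{0.995} \, σ / \sqrt{n} \leq 0.25$ and solve for n. The sketch below does this with the standard library; the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

sigma = 2.6        # known population standard deviation
margin = 0.25      # maximum allowed error of the sample mean
confidence = 0.99

# Two-sided critical value: 0.5% in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Smallest integer n with z * sigma / sqrt(n) <= margin
n = ceil((z * sigma / margin) ** 2)
print(z, n)
```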

Aleah Avery 2022-11-18

According to frequentists, why can't probabilistic statements be made about population parameters?
The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."[1] Note that this does not refer to repeated measurement of the same sample, but repeated sampling.
And:
The confidence interval can be expressed in terms of a single sample: "There is a 90% probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter." Note this is a probability statement about the confidence interval, not the population parameter.
And:
A 95% confidence interval does not mean that for a given realised interval calculated from sample data there is a 95% probability the population parameter lies within the interval, nor that there is a 95% probability that the interval covers the population parameter.[11] Once an experiment is done and an interval calculated, this interval either covers the parameter value or it does not; it is no longer a matter of probability.

Filloltarninsv9p 2022-11-18

Binomial distribution sample vs. population mean
I'm a little confused at this question posed by my prof. He asked us to generate a binomial distribution in R and input whatever variables we wanted.
x = rbinom(50, 10, 0.83)
Then he asks us to compute the sample mean, sample variance, population mean and population variance of the distribution.
sample mean: mean(x)
sample var: var(x)
But I have no idea what he intends we do to get the population mean and variance. Don't you need a larger set of data to be the population and a smaller set to be the sample? I only (seem to) have one set here.
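One reading of this question is that the "population" is the theoretical Binomial(size = 10, prob = 0.83) distribution itself, whose mean and variance are size × prob and size × prob × (1 − prob), while the 50 simulated draws are the sample. The sketch below illustrates that reading in Python rather than R; the simulation is a stand-in for `rbinom` and the seed is arbitrary.

```python
import random
from statistics import mean, variance

size, prob, n_draws = 10, 0.83, 50

# Population (theoretical) moments of Binomial(size, prob)
pop_mean = size * prob
pop_var = size * prob * (1 - prob)

# Simulate 50 binomial draws, each a count of successes in 10 Bernoulli trials
random.seed(0)
x = [sum(random.random() < prob for _ in range(size)) for _ in range(n_draws)]

print("population mean/var:", pop_mean, pop_var)
print("sample mean/var:", mean(x), variance(x))
```

The sample values fluctuate around the population values and get closer as the number of draws grows.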

Sophie Marks 2022-11-18

In order to keep its population increasing, a country accepts A people a year from other countries, and the number x(t) of people in the country changes according to the equation
$x^{'} (t) = - 0.03 x (t) + A$
where t is time in years. Given $x (0) = 16 m$ and $x (- 10) = 15 m$ , find A.
This is what I did:
From the equation I know that each year the population of the country decreases by 0.03.
I tried to use the data, since t is in years:
$x (t_{0}) * 1.03 = x (t_{0} + 1)$
Since we want the total population to increase by one million people over 10 years, therefore:
$x (t_{0}) * 1.03 = x (t_{0} + 1) + 100{,}000$
I get the feeling my solution is wrong, but I can't find another way to approach this.
Any ideas? I'll be glad if someone could tell me why my solution fails.
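A different route is to solve the linear differential equation exactly. Its general solution is $x (t) = c + (x (0) - c) e^{- 0.03 t}$ with $c = A / 0.03$, and the condition at $t = - 10$ then fixes c. The sketch below works in millions of people; the variable names are illustrative.

```python
from math import exp

k = 0.03          # decay rate per year from the equation
x0 = 16.0         # x(0), in millions
x_past = 15.0     # x(-10), in millions
t_past = -10.0

# General solution: x(t) = c + (x0 - c) * exp(-k t), where c = A / k.
# Imposing x(t_past) = x_past and solving for c:
g = exp(-k * t_past)
c = (x_past - x0 * g) / (1 - g)
A = k * c         # millions of people per year

def x(t):
    return c + (x0 - c) * exp(-k * t)

print("A =", A)
print("check x(-10) =", x(t_past))
```

The year-by-year approach in the question treats the 3% loss as a discrete yearly step with the wrong sign (multiplying by 1.03 describes growth) and ignores that the loss and the inflow A act continuously at the same time, which is why it does not reproduce the equation.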

klesstilne1 2022-11-18

Estimating Gaussian parameters of a set of data points
I have a set of data points. When I draw a histogram of them, plotting frequency of occurrence against value, I get a curve that looks like a normal curve. I am also able to perform a test on the data set to check whether it follows a normal distribution, or more precisely whether the population it comes from follows a normal probability distribution. I am using the Shapiro-Wilk test for this.
However, how can I know what the equation of that normal curve will be? Moreover, is there a way I can test whether other standard distributions fit the points more accurately, and estimate their parameters?
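For the normal case, the curve is determined by two numbers: the density is $f (x) = \frac{1}{σ \sqrt{2 π}} e^{- (x - μ)^{2} / (2 σ^{2})}$, with μ and σ estimated by the sample mean and sample standard deviation. A minimal sketch with the standard library follows; the data values are hypothetical placeholders.

```python
from statistics import NormalDist, stdev

# Hypothetical data points, for illustration only
data = [4.8, 5.1, 5.0, 4.9, 5.2]

# from_samples estimates mu with the sample mean and sigma with the sample stdev
fit = NormalDist.from_samples(data)

print("mu:", fit.mean, "sigma:", fit.stdev)
print("density at 5.0:", fit.pdf(5.0))
```

To overlay the curve on a histogram of counts, the density is multiplied by the number of points times the bin width. Comparing other candidate distributions usually means fitting each by maximum likelihood and comparing the fits, which goes beyond this sketch.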

Aron Heath 2022-11-17

Is a t distribution for a certain degree of freedom equivalent to the sample mean distribution for the corresponding sample size?
This may seem like a weird question, but hear me out. I'm essentially struggling to see the connection between a t-value from a t-table and a t-value that is calculated.
The following formula is used to calculate the value of a t-score:
$t = \frac{\bar{X} - μ}{\frac{S}{\sqrt{n}}}$
It requires a sample mean, a hypothesized population mean, and the standard deviation of the distribution of sample means (standard error).
According to the Central Limit Theorem, the distribution of sample means of a population is approximately normal and the sample distribution mean is equivalent to the population mean.
So the t-score formula is essentially calculating the magnitude of difference between the sample mean in question and the hypothesized population mean, relative to the variation in the sample data. Or in other words, how many standard errors the difference between sample mean and population mean comprise of. For example: If t was calculated to be 2, then the sample mean in question would be 2 standard errors away from the mean of the sample distribution.
1.) Phew, ok. So question 1: Let's just say a t-score of 1 was calculated for a sample mean. Since a distribution of sample means is normal according to the CLT, does that mean that the sample mean in question is part of the 68% (because of the 68-95 rule) of all sample means that are within 1 standard error of the mean of the sample mean distribution?
2.) Let's say we have a distribution of sample means of sample size 15. Is this distribution equivalent to a t-distribution with 14 degrees of freedom? Or more importantly: Is the t-value from a t-table for 14 degrees of freedom and 95% confidence EQUIVALENT to a calculated t-value using a sample mean that is 2 standard errors away from the mean of a distribution of sample means with sample size 15?

Ty Moore 2022-11-16

In a certain village, 20% of the population has some disease. A test is administered which has the property that if a person is sick, the test will be positive 90% of the time and if the person is not sick, then the test will still be positive 30% of the time. All people tested positive are prescribed a drug which always cures the disease but produces a rash 25% of the time. Given that a random person has the rash, what is the probability that this person had the disease to start with?
I am looking for P(S|R) given that a person tested positive, where S denotes a sick person and R denotes a person with a rash, given that they tested positive. If + denotes a person who tested positive and I use Bayes' formula and the data to calculate $P (S | +)$ , would $P (S | R) = \frac{P (R | +) P (S | +)}{P (R | +)} = P (S | +)$ ? Or would the answer be $P (S | R) = P (R) P (S | +)$ ? Or are both of these answers wrong? Also, I cannot tell if in the problem statement $P (R) = .25$ or $P (R | +) = .25$
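The calculation can be laid out with exact fractions. This sketch reads the problem as saying that the rash comes only from the drug, that only people who test positive take the drug, and that the 25% rash rate applies equally to sick and healthy people who take it (so $P (R | +) = .25$); these are interpretive assumptions, not facts stated beyond doubt in the problem.

```python
from fractions import Fraction

p_sick = Fraction(1, 5)               # prevalence, 20%
p_pos_given_sick = Fraction(9, 10)    # positive test if sick, 90%
p_pos_given_healthy = Fraction(3, 10) # positive test if not sick, 30%
p_rash_given_drug = Fraction(1, 4)    # rash if given the drug, 25%

# Joint probabilities of (sick, rash) and (healthy, rash)
p_sick_and_rash = p_sick * p_pos_given_sick * p_rash_given_drug
p_healthy_and_rash = (1 - p_sick) * p_pos_given_healthy * p_rash_given_drug

p_sick_given_rash = p_sick_and_rash / (p_sick_and_rash + p_healthy_and_rash)
p_sick_given_pos = (p_sick * p_pos_given_sick) / (
    p_sick * p_pos_given_sick + (1 - p_sick) * p_pos_given_healthy
)

print(p_sick_given_rash, p_sick_given_pos)
```

Under these assumptions the rash factor cancels, so $P (S | R)$ equals $P (S | +)$.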

Aryanna Fisher 2022-11-15

A coach claims that his players have a larger lung capacity than the average for the population of the same age, which is 3.4. (Assume a normal distribution.)
The measurements yield the following data: 3.4, 3.6, 3.8, 3.3, 3.4, 3.5, 3.7, 3.6, 3.7, 3.4 and 3.6.
$n = 11$
$\bar{X} = 3.545$
$S = 0.157$
Find the required sample size, i.e. the number of players whose lung capacity should be measured, so that the coach can make his statement with 99% confidence. (Assume $σ^{2} = 0.09$ .)
I don't even know how I should start. My initial thought was to use the statistic $U = \frac{\bar{X} - μ}{σ} \sqrt{n} \sim N (0, 1)$ . But I don't know U.

Frankie Burnett 2022-11-15

A man tests for HIV. What is the predictive probability that his second test is negative?
In a population, HIV prevalence is estimated to be $λ$ . For a new test for HIV:
- $θ$ is the probability of an HIV positive person to test positive
- $η$ is the probability an HIV negative person tests positive in this test.
A man takes the test to check whether he has HIV, and he tests positive.
What is the predictive probability he tests negative on the second test?
Assumption: Repeat tests on the same person are conditionally independent.
From my notes predictive probability is given as:
$P (\tilde{Y} = \tilde{y} | Y = y) = \int p (\tilde{y} | τ) p (τ | y) \, d τ$ where $\tilde{Y}$ is the unknown observable, y is the observed data and τ the unknown.
I am interested in the probability that the second test is negative, given that the first test is positive, without knowing if the man really has HIV or not.
To facilitate this I define:
$y_{1}$ as the event of the first test being positive and
$\tilde{y_{2}}$ as the second test being negative
Would this adaptation of the formula given above be the correct/best approach to this problem?
$p (\tilde{y_{2}}, y_{1} | τ) = p (\tilde{y_{2}} | τ) p (y_{1} | τ) p (τ)$ and this is really $\propto p (\tilde{y_{2}} | τ) p (τ | y_{1})$
I've gotten for the $p (τ | y_{1})$ from Bayes' theorem:
$p (τ | y_{1}) = \frac{p (τ) p (y_{1} | τ)}{p (y_{1})} = \frac{λ θ}{λ θ + η (1 - λ)}$
How could I then find $p (\tilde{y_{2}} | τ)$ ? Is this the correct approach?
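Because the unknown here is binary (the man either has HIV or not), the integral in the predictive formula becomes a two-term sum: the posterior probability of each status after the first positive test, times the probability of a negative second test given that status. The sketch below assumes conditional independence of repeat tests, as stated in the question; the function name and the numeric values are hypothetical.

```python
def predictive_negative(lam, theta, eta):
    """P(second test negative | first test positive)."""
    # Posterior probability of being HIV positive after one positive test
    post_pos = lam * theta / (lam * theta + (1 - lam) * eta)
    # Negative second test: (1 - theta) if positive, (1 - eta) if negative
    return (1 - theta) * post_pos + (1 - eta) * (1 - post_pos)

# Hypothetical values for prevalence, sensitivity and false positive rate
lam, theta, eta = 0.01, 0.95, 0.02
result = predictive_negative(lam, theta, eta)
print(result)
```

So $p (\tilde{y_{2}} | τ)$ is simply $1 - θ$ when τ means "has HIV" and $1 - η$ when τ means "does not have HIV".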

bruinhemd3ji 2022-11-14

I am fond of astronomy and the environment. I want to try to make a "light pollution map", but I don't have my own satellites, so I use the cities' population as an approximation of light pollution. Let's say each city has C citizens, and each one uses an average of X watts of electricity for lighting (I have these data). Skipping the units (I just need a rough dimensionless "light power"): $\text{city light power} = C \times X$
I have a map with many cities. I know light power is inversely proportional to the square of the distance. I don't know about sky, air diffraction, or cloud reflections.
Start from the simplest model: a flat terrain map. There are N light sources; each one, at position X(n), Y(n), has a specific $\text{total light power} = C (n) \times X (n)$
At a specific point with coordinates (x,y), what is the light power, i.e. the sum of all the cities' light?
I tried to calculate and plot it, but the result looks weird (too far from a real satellite night shot) and is too slow to calculate.
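Under the flat-terrain, inverse-square model described above, the light at a point is the sum over cities of power divided by squared distance. The sketch below implements that sum; the `min_dist` clamp is an assumption added here to avoid division by zero at a city's own location, and the city data are hypothetical.

```python
def light_at(x, y, sources, min_dist=1.0):
    """Sum of inverse-square contributions from all light sources.

    sources: list of (sx, sy, power) tuples, where power = C * X for that city.
    min_dist: assumed lower bound on distance, to avoid dividing by zero.
    """
    total = 0.0
    for sx, sy, power in sources:
        d2 = max((x - sx) ** 2 + (y - sy) ** 2, min_dist**2)
        total += power / d2
    return total

# Hypothetical cities: (x position, y position, light power)
cities = [(0.0, 0.0, 100.0), (10.0, 0.0, 400.0)]
midpoint = light_at(5.0, 0.0, cities)
print(midpoint)
```

The raw values span many orders of magnitude near a city, so plotting the logarithm of the sum may look closer to a night-time image. For speed, one option is to ignore cities beyond some cutoff radius of each grid point.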

vidamuhae 2022-11-14

COVID19 data statistical adjustment for SIR model and estimation
All of us are coping with the current COVID19 crisis. I hope that all of you stay safe and that this situation will end as soon as possible.
For this sad situation and for my unstoppable curiosity, I've started to read something about the SIR model. The variables of such model are s (the fraction of people susceptible to infection), y (the fraction of infected people) and r (the fraction of recovered people + the sad statistics of deaths). The model reads as:
$\begin{cases} \dot{s} = - β s y \\ \dot{y} = β s y - γ y \\ \dot{r} = γ y \end{cases},$
where $β$ and $γ$ are positive parameters. One strong hypothesis of this model is that the population size is constant over time (deaths are assumed to be recovered, births are neglected since, hopefully, they will be the part of the population which for sure will be protected from the disease). The initial conditions are set such that $s (0) + y (0) + r (0) = 1$ and $s (0) \geq 0$ , $y (0) \geq 0$ and $r (0) \geq 0$ . Under this assumption, it can be proven that $s (t) + y (t) + r (t) = 1 \forall t > 0$ .
The news often talks about the coefficient:
$R_{0} = \frac{β}{γ},$
which rules the behavior of the system (for $R_{0} < 1$ the disease will be wiped out, for $R_{0} > 1$ it will spread out).
The same news also talks about the estimation of this parameter. Well, given the time series of s, y and r, it is rather easy to estimate the parameters $β$ and $γ$ , and hence $R_{0}$ . My main concern is about the time series. For each country we know the daily count of infected people (let's say Y(t)) and of recovered (or dead) people (let's say R(t)).
However, there are many infected people who are not recorded (let's say Y′(t)), and many of them recover without knowing that they have been infected (let's say R′(t))! Moreover, day after day, the number of tests on people is increasing.
If we indicate with N the (constant) size of population, we get that:
$y (t) = \frac{Y (t) + Y^{'} (t)}{N}, \quad r (t) = \frac{R (t) + R^{'} (t)}{N} \quad \text{and} \quad s (t) = 1 - y (t) - r (t) .$
Here is the question(s). How can we perform the estimation of $β$ and $γ$ if we don't know the unobserved variables Y′(t) and R′(t)? How do the experts of the field estimate $β$ and $γ$ even though the available data are not complete? Do they use some data adjustment?
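To make the model concrete, here is a minimal forward-Euler simulation of the SIR system as written in this question. It does not address the under-reporting problem itself; it only shows how s, y and r evolve for given β and γ, which is the forward model any estimation procedure would be fitted against. The parameter values, step size and function name are hypothetical.

```python
def simulate_sir(beta, gamma, s0, y0, r0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations (fractions of population)."""
    s, y, r = s0, y0, r0
    history = [(s, y, r)]
    steps = int(round(days / dt))
    for _ in range(steps):
        new_infections = beta * s * y * dt
        new_recoveries = gamma * y * dt
        s -= new_infections
        y += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, y, r))
    return history

# Hypothetical parameters with R0 = beta / gamma = 3
history = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, y0=0.01, r0=0.0, days=100)
print(history[-1])
```

Each Euler step moves mass between compartments without creating or destroying it, so $s + y + r = 1$ is preserved up to rounding, matching the property stated in the question.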
