Filloltarninsv9p 2022-11-18 Answered
Binomial distribution sample vs. population mean
I'm a little confused at this question posed by my prof. He asked us to generate a binomial distribution in R and input whatever variables we wanted.
x = rbinom(50, 10, 0.83)
Then he asks us to compute the sample mean, sample variance, population mean and population variance of the distribution.
sample mean: mean(x)
sample var: var(x)
But I have no idea what he intends we do to get the population mean and variance. Don't you need a larger set of data to be the population and a smaller set to be the sample? I only (seem to) have one set here.

Answers (1)

mainzollbtt
Answered 2022-11-19 Author has 13 answers
Explanation:
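In your example, the population is the Binomial(10, 0.83) distribution that rbinom draws from, and x is a sample of 50 values from it. The population mean is $np = 10 \times 0.83 = 8.3$ and the population variance is $np(1 - p) = 10 \times 0.83 \times (1 - 0.83) = 1.411$.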
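As a rough sketch of how the four quantities could be computed side by side in R (the seed and the variable names other than x are my own choices, not part of the assignment):

```r
set.seed(1)                       # arbitrary seed, only so the run is reproducible
size <- 10                        # number of trials per draw (2nd argument of rbinom)
prob <- 0.83                      # success probability (3rd argument of rbinom)
x <- rbinom(50, size, prob)       # 50 draws from Binomial(10, 0.83): the sample

sample_mean <- mean(x)            # estimate computed from the 50 draws
sample_var  <- var(x)             # var() divides by n - 1

pop_mean <- size * prob               # theoretical mean n * p
pop_var  <- size * prob * (1 - prob)  # theoretical variance n * p * (1 - p)

print(c(sample_mean = sample_mean, pop_mean = pop_mean))
print(c(sample_var = sample_var, pop_var = pop_var))
```

The sample values change from run to run (or with a different seed) and should land near, but not exactly on, the population values, which are fixed by the parameters.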
You might be interested in

asked 2022-11-19
Degree of freedom and corrected standard deviation
It is often said that degrees of freedom are the reason the standard deviation formula needs to be corrected. When explaining degrees of freedom, it is often said that when one knows the mean, only $n - 1$ data are actually needed, as the last one can be determined from the mean and the other $n - 1$ data. However, I see the same thing occurring in a population, not just in a sample. So what's going on here, and how does this justification really work?
For example, in the simple linear regression model, the variance of the error terms is often the sum of the variances of each data point divided by $n - 2$. This is justified as said above. But if this justification also holds for a population, not just a sample, how does it really work?
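One way to probe the question numerically is a small simulation. The sketch below assumes normally distributed data with a true mean of 0 and a true variance of 4 (both arbitrary choices of mine) and compares three estimators: dividing by $n$ around the sample mean, dividing by $n - 1$ around the sample mean, and dividing by $n$ around the known true mean.

```r
set.seed(7)                       # arbitrary seed for reproducibility
n <- 5                            # small sample size, so the bias is visible
sigma2 <- 4                       # assumed true variance
true_mu <- 0                      # assumed true mean

sims <- replicate(20000, {
  x <- rnorm(n, mean = true_mu, sd = sqrt(sigma2))
  c(div_n    = sum((x - mean(x))^2) / n,   # sample mean, divide by n
    div_n1   = var(x),                     # sample mean, divide by n - 1
    known_mu = sum((x - true_mu)^2) / n)   # true mean, divide by n
})

avg <- rowMeans(sims)             # average of each estimator over all runs
print(avg)
```

If the sketch behaves as expected, div_n averages near $\sigma^2 (n - 1)/n$, while div_n1 and known_mu both average near $\sigma^2$. That suggests the correction comes from replacing the true mean with the sample mean, not from the data being a sample as such.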
asked 2022-10-31
Separating populations and estimating line-fit parameters
Given a dataset containing two populations, each of which can be described by a linear relationship between two variables in each sample with high $R^2$, how does one separate the two populations (and incidentally compute the line-fit)?
This is fairly easy to do graphically - just create a scatterplot and the two lines are pretty apparent. But how does one do this algorithmically?
More generally, given a dataset containing an unknown number $n$ of populations, each of which can be fit to a line with some lower bound on $R^2$ (e.g., .95), how does one separate the data into the minimum number of populations satisfying the $R^2$ criterion?
asked 2022-11-04
Finding the $(X - \bar{X})^2$ of first 5 data of dataset given mean and population variance
Mean and population variance of the dataset $x_1, x_2, \dots, x_{10}$ are 19 and 49 respectively. If the value $\sum_{i=6}^{10} x_i^2 = 1900$, what is the value of $\sum_{i=1}^{5} x_i^2$?
I've solved it as following and it is wrong:
Population variance:
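$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
$$49 = \frac{\sum_{i=1}^{5} x_i^2}{10} + \frac{1900}{10}$$
$$49 = \frac{\sum_{i=1}^{5} x_i^2}{10} + 190$$
$$-190 + 49 = \frac{\sum_{i=1}^{5} x_i^2}{10}$$
$$\sum_{i=1}^{5} x_i^2 = -141 \cdot 10 = -1410$$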
And this solution is wrong. How to solve this problem?
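One possible route, offered as a sketch and not as the book's method: the attempt above appears to drop the $\bar{x}$ term. Expanding the square gives the identity $S^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$, so $\sum_{i=1}^{10} x_i^2 = n(S^2 + \bar{x}^2)$. The variable names below are mine.

```r
n <- 10                 # number of data points
xbar <- 19              # given mean
pop_var <- 49           # given population variance
sum_sq_last5 <- 1900    # given: sum of x_i^2 for i = 6..10

# From S^2 = (1/n) * sum(x_i^2) - xbar^2:
sum_sq_all <- n * (pop_var + xbar^2)          # 10 * (49 + 361) = 4100
sum_sq_first5 <- sum_sq_all - sum_sq_last5    # 4100 - 1900 = 2200

print(sum_sq_first5)
```

Under that identity the requested sum would be 2200, which is at least non-negative, as a sum of squares must be.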
asked 2022-10-23
The exercise statement (roughly): Assume there is a terrorist prevention system that has a 99% chance of correctly identifying a future terrorist and 99.9% chance of correctly identifying someone that is not a future terrorist. If there are 1000 future terrorists among the 300 million people population, and one individual is chosen randomly from the population, then processed by the system and deemed a terrorist. What is the chance that the individual is a future terrorist?
Attempted exercise solution:
I use the following event labels:
A -> The person is a future terrorist
B -> The person is identified as a terrorist
Then, some other data:
$P(A) = \frac{10^3}{3 \cdot 10^8} = \frac{1}{3 \cdot 10^5}$
$P(\bar{A}) = 1 - P(A)$
$P(B \mid A) = 0.99$
$P(\bar{B} \mid A) = 1 - P(B \mid A)$
$P(\bar{B} \mid \bar{A}) = 0.999$
$P(B \mid \bar{A}) = 1 - P(\bar{B} \mid \bar{A})$
What I need to find is the chance that someone identified as a terrorist, is actually a terrorist. I express that through P(A | B) and use Bayes Theorem to find its value.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid \bar{A}) P(\bar{A})}$$
The answer I get after plugging-in all the values is: $3.29 \cdot 10^{-3}$, the book's answer is $3.29 \cdot 10^{-4}$.
Can someone help me identify what I'm doing wrong? Also, in either case, I find that it is very unintuitive that the probability of success is so small. If someone could explain it to me in more intuitive terms I'd be very grateful.
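As a quick check of the arithmetic, here is a sketch in R that plugs the numbers exactly as stated in the exercise into Bayes' theorem, and also restates them as expected head counts, which may help with the intuition. The variable names are mine.

```r
pop_size <- 300e6                 # population size from the exercise
n_terrorists <- 1000              # future terrorists in the population

p_a <- n_terrorists / pop_size    # P(A)
p_b_given_a <- 0.99               # P(B | A): true positive rate
p_notb_given_nota <- 0.999        # P(not B | not A): true negative rate
p_b_given_nota <- 1 - p_notb_given_nota   # P(B | not A): false positive rate

# Bayes' theorem
p_a_given_b <- (p_b_given_a * p_a) /
  (p_b_given_a * p_a + p_b_given_nota * (1 - p_a))

# Same result via expected counts if everyone were screened
true_pos  <- p_b_given_a * n_terrorists                   # about 990 people
false_pos <- p_b_given_nota * (pop_size - n_terrorists)   # about 300,000 people

print(p_a_given_b)
print(true_pos / (true_pos + false_pos))
```

With these inputs the result comes out near $3.29 \cdot 10^{-3}$, so the setup above looks consistent with the numbers as quoted here. The counts show why it is so small: roughly 990 correctly flagged people are swamped by roughly 300,000 falsely flagged ones.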
asked 2022-11-14
COVID19 data statistical adjustment for SIR model and estimation
All of us are coping with the current COVID19 crisis. I hope that all of you stay safe and that this situation will end as soon as possible.
For this sad situation and for my unstoppable curiosity, I've started to read something about the SIR model. The variables of such model are s (the fraction of people susceptible to infection), y (the fraction of infected people) and r (the fraction of recovered people + the sad statistics of deaths). The model reads as:
$$\begin{cases} \dot{s} = -\beta s y \\ \dot{y} = \beta s y - \gamma y \\ \dot{r} = \gamma y, \end{cases}$$
where $\beta$ and $\gamma$ are positive parameters. One strong hypothesis of this model is that the population size is constant over time (deaths are assumed to be recovered, births are neglected since, hopefully, they will be the part of the population which for sure will be protected from the disease). The initial conditions are set such that $s(0) + y(0) + r(0) = 1$ and $s(0) \geq 0$, $y(0) \geq 0$ and $r(0) \geq 0$. Under this assumption, it can be proven that $s(t) + y(t) + r(t) = 1 \ \forall t > 0$.
The news often talk about the coefficient:
$$R_0 = \frac{\beta}{\gamma},$$
which rules the behavior of the system (for $R_0 < 1$ the disease will be wiped out, for $R_0 > 1$ it will spread out).
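To make the role of $R_0$ concrete, here is a minimal sketch in base R that integrates the system with a plain forward Euler step. The function name simulate_sir, the step size, the initial infected fraction and the parameter values are all my own illustrative choices, not estimates for any real epidemic.

```r
# Forward Euler integration of the SIR system (illustrative sketch only)
simulate_sir <- function(beta, gamma, y0 = 0.01, dt = 0.01, t_end = 100) {
  n_steps <- round(t_end / dt)
  s <- 1 - y0                     # initial susceptible fraction
  y <- y0                         # initial infected fraction
  r <- 0                          # initial recovered fraction
  y_peak <- y                     # largest infected fraction seen so far
  for (i in seq_len(n_steps)) {
    ds <- -beta * s * y
    dy <- beta * s * y - gamma * y
    dr <- gamma * y
    s <- s + dt * ds
    y <- y + dt * dy
    r <- r + dt * dr
    y_peak <- max(y_peak, y)
  }
  c(s = s, y = y, r = r, total = s + y + r, y_peak = y_peak)
}

out_low  <- simulate_sir(beta = 0.1, gamma = 0.2)   # R0 = 0.5
out_high <- simulate_sir(beta = 0.5, gamma = 0.2)   # R0 = 2.5

print(out_low)
print(out_high)
```

In both runs the total stays at 1 up to rounding, since the three derivatives sum to zero. With $R_0 = 0.5$ the infected fraction never rises above its initial value, while with $R_0 = 2.5$ it grows to a peak before declining.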
The same news also talk about the estimation of such parameter. Well, given the time series of s, y and r, it is rather easy to estimate the parameters $\beta$ and $\gamma$, and hence $R_0$. My main concern is about the time series. For each country we know the daily count of infected people (let's say Y(t)), of recovered (or dead) people (let's say R(t)).
Anyway, there are several infected people which are not recorded (let's say Y′(t)), and many of them get recovered without knowing that they have been infected (let's say R′(t))! Moreover, day after day, the number of tests on people is increasing.
If we indicate with N the (constant) size of population, we get that:
$$y(t) = \frac{Y(t) + Y'(t)}{N}, \quad r(t) = \frac{R(t) + R'(t)}{N} \quad \text{and} \quad s(t) = 1 - y(t) - r(t).$$
Here is the question(s). How can we perform the estimation of β and γ if we don't know the unobserved variables Y′(t) and R′(t)? How do the experts of the field estimate β and γ even though the available data are not complete? Do they use some data adjustment?
asked 2022-10-29
Question regarding the definition of confidence interval
Someone calculates the 95% confidence interval for the average weight of teenagers by collecting data from a simple random sample of 100 teenagers. Circle the correct interpretation(s) of the confidence interval (there may be more than one correct answer):
1) There is a 95% chance that the average weight of all teenagers falls in this range.
2) There is a 95% chance that the interval created includes the average weight of all teenagers.
Solution given says that
1) Almost correct, but no: the average weight of all teenagers is not random
2) Correct
Now, I've been reading and rereading the two statements and I can't see how they differ.
To me, the second statement basically restates the same scenario as the first.
What I know about a confidence interval is that it's for the population. The interval gives a range, along with how confident we are that the range covers the TRUE value.
So in the first case, it sounds exactly like what the definition suggests?
Is the solution given correct, or did I miss some nuance between the two statements?
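A simulation may help show what the 95% refers to in statement 2. The sketch below assumes a hypothetical population of weights that is normal with mean 60 and standard deviation 10 (values I made up), draws many samples of 100, builds a t-interval from each, and counts how often the interval contains the fixed true mean.

```r
set.seed(42)                      # arbitrary seed for reproducibility
true_mean <- 60                   # hypothetical population mean weight (assumption)
true_sd <- 10                     # hypothetical population sd (assumption)
n <- 100                          # sample size from the question
n_rep <- 2000                     # number of repeated samples

covered <- replicate(n_rep, {
  w <- rnorm(n, mean = true_mean, sd = true_sd)   # one random sample
  ci <- t.test(w, conf.level = 0.95)$conf.int     # its 95% interval
  ci[1] <= true_mean && true_mean <= ci[2]        # does it cover the truth?
})

coverage <- mean(covered)         # fraction of intervals that cover true_mean
print(coverage)
```

The true mean never moves between repetitions; only the interval does. The coverage fraction should come out close to 0.95, which is the sense in which the randomness belongs to the interval and not to the population average.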
asked 2022-11-05
How to refine population statistic when more data is available
Suppose I have two pieces of data about two populations. The first piece of data is the national accident rate, denoted A. The second piece of data is the national safety rate, a related, but not exactly inverse, piece of data, denoted S.
Now if I were to be given an additional piece of data, say a particular city's safety rating, and asked what the best guess of that city's accident rating is, how would I approach this problem?
Also, I am not sure what this kind of situation/problem is called, if someone could point out the branch of statistics this falls under, that would also be helpful.
