Master Statistics and Probability Problems with Expert Help

Recent questions in Statistics and Probability
College Statistics | Answered question
atgnybo4fq, 2022-11-04

Determining sample size of a set of boolean data where the probability is not 50%
I'll lay out the problem as a simplified puzzle of what I am attempting to calculate. I imagine some of this may seem fairly straightforward to many but I'm starting to get a bit lost in my head while trying to think through the problem.
Let's say I roll a 1000-sided die until it lands on the number 1. Let's say it took me 700 rolls to get there. I want to prove that the first 699 rolls were not number 1 and obviously the only way to deterministically do this is to include the first 699 failures as part of the result to show they were in fact "not 1".
However, that's a lot of data: I would have to include all 700 rolls. Therefore, I want to demonstrate probabilistically that I rolled 699 "not 1s" prior to rolling a 1. To do this, I decide I will randomly sample my "not 1" rolls to reduce the set to a statistically significant, yet more manageable number. It will be good enough to demonstrate that I very probably did not roll a 1 prior to roll 700.
Here are my current assumptions about the state of this problem:
- My initial experiment of rolling until success follows a geometric distribution.
- However, my goal for this problem is to demonstrate to a third party that I am not lying. The skeptical third party is therefore not concerned with the geometric distribution and would view this simply as a binomial distribution problem.
A lot of sample size calculators exist on the web. They are all based around binomial distribution from what I can tell. So here's the formula I am considering:
$$n = \frac{N \times X}{X + N - 1}$$
$$X = \frac{Z_{\alpha/2}^2 \times p \times (1 - p)}{\mathrm{MOE}^2}$$
$n$ is sample size
$N$ is population size
$Z_{\alpha/2}$ is critical value ($\alpha$ is $1 -$ confidence level as probability)
$p$ is sample proportion
$\mathrm{MOE}$ is margin of error
As an aside, the website where I got this formula says it implements "finite population correction". Is this desirable for my requirements?
Here is the math executed on my above numbers. I will use $Z_{\alpha/2} = 2.58$ for $\alpha = 0.01$, $p = 0.001$ and $\mathrm{MOE} = 0.005$. As stated above, $N = 699$ on account of there being 699 failure cases that I would like to sample with a certain level of confidence.
Based on my understanding, what this math will do is recommend a sample size that will show, with 99% confidence, that the sample result is within 0.5 percentage points of reality.
Doing the math, $X = 265.989744$ and $n = 192.8722086653 \approx 193$, implying that I can have a sample size of 193 to fulfill this confidence level and interval.
My main question is whether my assumption about $p = \frac{1}{1000}$ is valid. If it's not, and I use the conservative $p = 0.5$, then my sample size shoots up to 692. So I would like to know if my assumptions about what sample proportion actually is are correct.
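The arithmetic above can be reproduced with a short script. This is only a sketch of the quoted formula; the function name `sample_size` and its parameter names are made up for illustration and do not come from the calculator website.

```python
import math

def sample_size(N, p, z, moe):
    """Sample size from the formula quoted above (hypothetical helper).

    N   -- population size
    p   -- assumed sample proportion
    z   -- critical value Z_{alpha/2}
    moe -- margin of error
    """
    # X: sample size before the finite population correction
    x = z**2 * p * (1 - p) / moe**2
    # n: sample size after the finite population correction
    n = N * x / (x + N - 1)
    return x, n

# Values used in the question: N = 699, Z = 2.58, MOE = 0.005
x_small, n_small = sample_size(N=699, p=0.001, z=2.58, moe=0.005)
x_half, n_half = sample_size(N=699, p=0.5, z=2.58, moe=0.005)

print(x_small, n_small, math.ceil(n_small))
print(x_half, n_half, math.ceil(n_half))
```

Rounding up gives 193 for $p = 0.001$ and 692 for $p = 0.5$, matching the figures in the question.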
More broadly, am I on the right track at all with this? From my attempt at demonstrating this probabilistically to my current thought process, is any of this accurate at all?

College Statistics | Answered question
kituoti126, 2022-11-02

Solve PDE using method of characteristics with non-local boundary conditions.
Given the population model described by the following linear first-order PDE in $u(a,t)$ with constants $b$ and $\mu$:
$$u_a + u_t = -\mu t u, \quad a, t > 0$$
$$u(a, 0) = u_0(a), \quad a \ge 0$$
$$u(0, t) = F(t) = b \int_0^\infty u(a, t)\, da$$
We can split the integral in two with our non-local boundary data:
$$F(t) = b \int_0^t u(a, t)\, da + b \int_t^\infty u(a, t)\, da$$
Choosing the characteristic coordinates ( ξ , τ ) and re-arranging the expression to form the normal to the solution surface we have the following equation with initial conditions:
$$(u_a, u_t, -1) \cdot (1, 1, -\mu t u) = 0$$
$$a(0) = \xi, \quad t(0) = 0, \quad u(0) = u_0(\xi)$$
Characteristic equations:
$$\frac{da}{d\tau} = 1, \quad \frac{dt}{d\tau} = 1, \quad \frac{du}{d\tau} = -\mu t u$$
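Solving each of these ODEs in $\tau$ gives the following:
$$(1)\ da = d\tau \qquad (2)\ dt = d\tau \qquad (3)\ du = -\mu t u \, d\tau$$
$$a = \tau + F(\xi) \qquad t = \tau + F(\xi)$$
$$a = \tau + \xi \qquad t = \tau$$
$$du = -\mu \tau u \, d\tau$$
$$\frac{1}{u}\, du = -\mu \tau \, d\tau$$
$$\ln u = -\frac{1}{2} \mu \tau^2 + F(\xi)$$
$$u = G(\xi)\, e^{-\frac{1}{2} \mu \tau^2}$$
$$u = u_0(\xi)\, e^{-\frac{1}{2} \mu \tau^2}$$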
Substituting back the original coordinates we can re-write this expression with a coordinate change:
$$\xi = a - t, \quad \tau = t$$
$$u(a, t) = u_0(a - t)\, e^{-\frac{1}{2} \mu t^2}$$
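As a sanity check, the characteristic solution $u(a,t) = u_0(a - t)\, e^{-\frac{1}{2}\mu t^2}$ can be tested numerically against the PDE, assuming it reads $u_a + u_t = -\mu t u$. The sketch below uses central finite differences in the region $a > t$. The value of `MU` and the initial profile `u0` are hypothetical choices made only for this check; they are not given in the problem.

```python
import math

MU = 0.3  # hypothetical value of the constant mu, chosen only for this check

def u0(x):
    # Hypothetical smooth initial age profile (not from the problem statement)
    return math.exp(-x * x)

def u(a, t):
    # Candidate solution obtained from the characteristics, valid for a > t
    return u0(a - t) * math.exp(-0.5 * MU * t * t)

def pde_residual(a, t, h=1e-5):
    # Central-difference approximations of u_a and u_t
    u_a = (u(a + h, t) - u(a - h, t)) / (2 * h)
    u_t = (u(a, t + h) - u(a, t - h)) / (2 * h)
    # Residual of u_a + u_t + mu * t * u = 0
    return u_a + u_t + MU * t * u(a, t)

# Evaluate at a few points with a > t
residual = max(
    abs(pde_residual(a, t))
    for a in (1.5, 2.0, 3.0)
    for t in (0.2, 0.5, 1.0)
)
print(residual)
```

A residual close to zero only confirms the formula where $a > t$. For $a < t$ the characteristics start on the boundary $a = 0$ rather than on the initial line $t = 0$, which is where the boundary data $F(t)$ would have to come in.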
This is where I get stuck: how do I use the boundary data to come up with a well-posed solution?
$$u(0, t) = u_0(-t)\, e^{-\frac{1}{2} \mu t^2} = b \int_0^t u(a, t)\, da + b \int_t^\infty u(a, t)\, da$$

College Statistics | Answered question
Emmanuel Giles, 2022-11-02

Using a "population" consisting of probabilities to predict accuracy of sample
Since I'm not sure if the title explains my question well enough I've come up with an example myself:
Let's say I live in a country where every citizen goes to work every day and every citizen has the choice to go by bus or by train (every citizen makes this choice again every day - there are almost no citizens who always go by train and never by bus, and vice versa).
I've done a lot of sampling and I have data on one million citizens about their behaviour in the past 1000 days. So, I calculate the "probability" per citizen of going by train on a single day. I can also calculate the average of those calculated probabilities of all citizens, let's say the average probability of a citizen going by train is 0.27. I figured that most citizens have tendencies around this number (most citizens have an individual probability between 0.22 and 0.32 of going by train for example).
Now, I started sampling an unknown person (but known to be living in the same country), and after asking him on 10 consecutive days whether he went by train or by bus, I know that this person went to work by train 4 times and by bus 6 times.
My final question: how can I use my (accurate) data on one million citizens to approximate this person's probability of going by train?
I know that if I do the calculation the other way around, that is, calculate the probability of this event occurring given that this person's REAL probability is 0.4, the result is $0.4^4 \times 0.6^6 \times \binom{10}{4} \approx 25\%$. I could calculate this probability for all possible probabilities between 0.00 and 1.00 (so, 0% to 100% in whole percentages, without any numbers in between) and sum them all, which comes to about 910%. I could set this to 100% (dividing by 9.1) and scale all other percentages accordingly (dividing everything by 9.1, so our 25% becomes ~2.75%) and come up with a weighted sum: $2.75\% \times 0.4 + X\% \times 0.41 + \dots$, but this must be wrong since I'm not taking my accurate samples of the population into account.
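The grid calculation described above can be written out as a short script. This is a sketch only; the function `grid_weights` and its optional `prior` argument are hypothetical names introduced for illustration. With `prior=None` every grid value is weighted equally, which reproduces the calculation in the question.

```python
from math import comb

def grid_weights(successes, trials, prior=None, steps=100):
    """Binomial likelihood on a grid of probabilities, normalised to sum to 1.

    prior -- optional list of weights, one per grid value (hypothetical
             parameter); None means every grid value is weighted equally.
    """
    grid = [k / steps for k in range(steps + 1)]  # 0.00, 0.01, ..., 1.00
    if prior is None:
        prior = [1.0] * len(grid)
    # Probability of the observed outcome for each candidate probability p
    like = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    raw = [l * w for l, w in zip(like, prior)]
    total = sum(raw)
    weights = [r / total for r in raw]
    return grid, like, weights

# 4 train days out of 10, as in the question
grid, like, weights = grid_weights(successes=4, trials=10)

total_like = sum(like)  # the "about 910%" sum, as a fraction
estimate = sum(p * w for p, w in zip(grid, weights))  # the weighted sum

print(like[40], total_like, weights[40], estimate)
```

One possible way to bring the population data into this weighting would be to pass the share of citizens observed at each grid value as `prior`. That choice is an assumption of this sketch, not something stated in the question.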
