Expert Assistance for Population Data: Comprehensive Resources and Practice Problems

Recent questions in Population Data
College Statistics · Answered question
Clara Dennis · 2022-11-12

I am not a mathematician, so go easy on me. I'm a programmer.
I have a database that I got from the Internet (USDA National Nutrient Database for Standard Reference), detailing the amount of each nutrient in each of a few thousand foodstuffs. I wanted to write a program that would be able to create a maximally nutritious meal based on this data.
For each nutrient, I have a target and two penalties - one for going over and one for going under the target (since, for example, it's a lot worse to get too much saturated fat than not enough). The goal is to minimize the sum of the penalties.
The meal can select from all the thousands of foodstuffs, but can only contain five or six.
I wrote the program in Java, implemented a genetic algorithm, specified my requirements, and let it run. It produced recommendations that were pure poison, and didn't seem to improve with time.
Maybe I just don't get genetic algorithms? Let's see what I did...
1) Create a population of randomly generated meals.
2) Normalize each one so it has 2000 calories, by multiplying the amount of each foodstuff proportionally.
3) Select the best 10% of meals to be parents.
4) Create a new generation - a few random ones to avoid local minima, the rest created by combining the foodstuffs and amounts from the parents.
5) GOTO 2.
What other algorithm can I try? Someone advised me to use the simplex algorithm, but I can't seem to explain to it (the implementation in Apache Commons Math) what my fitness function is. He claimed it would be a natural fit, though, and I have even heard of someone who used simplex for exactly this.
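There is a reason simplex keeps coming up: if you temporarily drop the "only five or six foodstuffs" restriction, the whole problem is a linear program. Each nutrient's over/under shortfall can be captured with a pair of non-negative slack variables, and the penalty sum becomes a linear objective. Below is a minimal sketch against the Apache Commons Math 3 linear optimizer; the two-food, two-nutrient numbers are made up purely for illustration.

```java
import org.apache.commons.math3.optim.PointValuePair;
import org.apache.commons.math3.optim.linear.*;
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;

import java.util.ArrayList;
import java.util.List;

public class MealLp {
    public static void main(String[] args) {
        // Toy data (made up): nutrientPerUnit[j][i] = amount of nutrient j
        // in one unit of food i. Real values would come from the USDA database.
        double[][] nutrientPerUnit = {
            {3.0, 1.0},   // nutrient 0, e.g. protein
            {0.5, 2.0}    // nutrient 1, e.g. saturated fat
        };
        double[] target       = {50.0, 20.0};
        double[] overPenalty  = {1.0, 10.0};  // going over sat. fat hurts more
        double[] underPenalty = {5.0,  1.0};

        int nFoods = 2, nNutrients = 2;
        // Variables: x_0..x_{nFoods-1} (food amounts), then over_j, then under_j.
        int nVars = nFoods + 2 * nNutrients;

        // Objective: minimize sum_j overPenalty[j]*over_j + underPenalty[j]*under_j.
        double[] obj = new double[nVars];
        for (int j = 0; j < nNutrients; j++) {
            obj[nFoods + j]              = overPenalty[j];
            obj[nFoods + nNutrients + j] = underPenalty[j];
        }
        LinearObjectiveFunction f = new LinearObjectiveFunction(obj, 0);

        // For each nutrient j: sum_i a[j][i]*x_i - over_j + under_j = target_j.
        // (A fixed-calorie row, e.g. sum_i cal_i*x_i = 2000, slots in the same way.)
        List<LinearConstraint> constraints = new ArrayList<>();
        for (int j = 0; j < nNutrients; j++) {
            double[] c = new double[nVars];
            for (int i = 0; i < nFoods; i++) c[i] = nutrientPerUnit[j][i];
            c[nFoods + j]              = -1;  // over_j soaks up any excess
            c[nFoods + nNutrients + j] =  1;  // under_j soaks up any shortfall
            constraints.add(new LinearConstraint(c, Relationship.EQ, target[j]));
        }

        PointValuePair sol = new SimplexSolver().optimize(
                f, new LinearConstraintSet(constraints),
                GoalType.MINIMIZE, new NonNegativeConstraint(true));

        System.out.println("Total penalty: " + sol.getValue());
        System.out.println("Food amounts:  " + java.util.Arrays.toString(
                java.util.Arrays.copyOf(sol.getPoint(), nFoods)));
    }
}
```

The catch is the cardinality limit: "at most five or six foods" is an integer constraint that plain simplex cannot express, so in practice one either solves the LP over a shortlisted subset of foods or reaches for a mixed-integer solver.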

College Statistics · Answered question
atgnybo4fq · 2022-11-04

Determining sample size of a set of boolean data where the probability is not 50%
I'll lay out the problem as a simplified puzzle of what I am attempting to calculate. I imagine some of this may seem fairly straightforward to many but I'm starting to get a bit lost in my head while trying to think through the problem.
Let's say I roll a 1000-sided die until it lands on the number 1. Let's say it took me 700 rolls to get there. I want to prove that the first 699 rolls were not number 1 and obviously the only way to deterministically do this is to include the first 699 failures as part of the result to show they were in fact "not 1".
However, that's a lot of data I would need to prove this. I would have to include all 700 rolls, which is a lot. Therefore, I want to probabilistically demonstrate the fact that I rolled 699 "not 1s" prior to rolling a 1. To do this, I decide I will randomly sample my "not 1" rolls to reduce the set to a statistically significant, yet more wieldy number. It will be good enough to demonstrate that I very probably did not roll a 1 prior to roll 700.
Here are my current assumptions about the state of this problem:
- My initial experiment of rolling until success is one of geometric distribution.
- However my goal for this problem is to demonstrate to a third party that I am not lying, therefore the skeptical third party is not concerned with geometric distribution but would view this simply as a binomial distribution problem.
A lot of sample size calculators exist on the web. They are all based around binomial distribution from what I can tell. So here's the formula I am considering:
$$n = \frac{N \times X}{X + N - 1}$$
$$X = \frac{Z_{\alpha/2}^{2} \times p \times (1 - p)}{\mathrm{MOE}^{2}}$$
n is sample size
N is population size
$Z$ is the critical value ($\alpha$ is $1 - \text{confidence level}$, expressed as a probability)
p is sample proportion
MOE is margin of error
As an aside, the website where I got this formula says it implements "finite population correction"; is this desirable for my requirements?
Here is the math executed on my numbers above. I will use $Z_{\alpha/2} = 2.58$ for $\alpha = 0.01$, $p = 0.001$, and $\mathrm{MOE} = 0.005$. As stated above, $N = 699$, on account of there being 699 failure cases that I would like to sample with a certain level of confidence.
Based on my understanding, what this math will do is recommend a sample size that will show, with 99% confidence, that the sample result is within 0.5 percentage points of reality.
Doing the math, $X = 265.989744$ and $n = 192.8722086653 \approx 193$, implying that I can have a sample size of 193 to fulfill this confidence level and interval.
My main question is whether my assumption of $p = \frac{1}{1000}$ is valid. If it's not, and I use the conservative $p = 0.5$, then my sample size shoots up to 692. So I would like to know if my assumptions about what the sample proportion actually is are correct.
More broadly, am I on the right track at all with this? From my attempt at demonstrating this probabilistically to my current thought process, is any of this accurate at all?
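As a sanity check on the arithmetic, here is a small sketch of the same calculation (the class and method names are mine, not from any library); it reproduces both figures from the question:

```java
public class SampleSize {
    /**
     * Sample size via the binomial formula with finite population correction:
     * X = Z^2 * p * (1-p) / MOE^2,  n = N*X / (X + N - 1).
     */
    static double sampleSize(double z, double p, double moe, double populationN) {
        double x = (z * z * p * (1 - p)) / (moe * moe);
        return (populationN * x) / (x + populationN - 1);
    }

    public static void main(String[] args) {
        // Values from the question: 99% confidence (Z = 2.58), p = 1/1000,
        // MOE = 0.5%, N = 699 failure rolls.
        System.out.println(sampleSize(2.58, 0.001, 0.005, 699)); // ~192.87 -> 193
        // Conservative worst case p = 0.5 for comparison:
        System.out.println(sampleSize(2.58, 0.5, 0.005, 699));   // ~691.76 -> 692
    }
}
```

The finite population correction is what keeps the conservative case at 692 rather than the roughly 66,564 the uncorrected formula would demand, so for a population of only 699 it is very much desirable.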

College Statistics · Answered question
kituoti126 · 2022-11-02

Solve PDE using method of characteristics with non-local boundary conditions.
Given a population model described by the following linear first-order PDE in $u(a,t)$, with constants $b$ and $\mu$:
$$u_a + u_t = -\mu t u, \qquad a, t > 0$$
$$u(a, 0) = u_0(a), \qquad a \geq 0$$
$$u(0, t) = F(t) = b \int_0^\infty u(a, t)\, da$$
We can split the integral in two with our non-local boundary data:
$$F(t) = b \int_0^t u(a, t)\, da + b \int_t^\infty u(a, t)\, da$$
Choosing the characteristic coordinates ( ξ , τ ) and re-arranging the expression to form the normal to the solution surface we have the following equation with initial conditions:
$$(u_a, u_t, -1) \cdot (1, 1, -\mu t u) = 0$$
$$a(0) = \xi, \qquad t(0) = 0, \qquad u(0) = u_0(\xi)$$
Characteristic equations:
$$\frac{da}{d\tau} = 1, \qquad \frac{dt}{d\tau} = 1, \qquad \frac{du}{d\tau} = -\mu t u$$
Solving each of these ODEs in $\tau$ gives the following (writing $c_1, c_2, c_3$ for constants of integration, to avoid clashing with the boundary term $F(t)$):
$$(1)\ da = d\tau \qquad (2)\ dt = d\tau \qquad (3)\ du = -\mu t u\, d\tau$$
From (1) and (2):
$$a = \tau + c_1(\xi), \qquad t = \tau + c_2(\xi)$$
Applying the initial conditions:
$$a = \tau + \xi, \qquad t = \tau$$
Substituting $t = \tau$ into (3) and separating variables:
$$du = -\mu \tau u\, d\tau$$
$$\frac{1}{u}\, du = -\mu \tau\, d\tau$$
$$\ln u = -\frac{1}{2}\mu \tau^2 + c_3(\xi)$$
$$u = G(\xi)\, e^{-\frac{1}{2}\mu \tau^2}$$
$$u = u_0(\xi)\, e^{-\frac{1}{2}\mu \tau^2}$$
Substituting back the original coordinates, we can rewrite this expression with the coordinate change:
$$\xi = a - t, \qquad \tau = t$$
$$u(a,t) = u_0(a - t)\, e^{-\frac{1}{2}\mu t^2}$$
Now this is where I get stuck: how do I use the boundary data to come up with a well-posed solution?
$$u(0,t) = u_0(-t)\, e^{-\frac{1}{2}\mu t^2} = b \int_0^t u(a,t)\, da + b \int_t^\infty u(a,t)\, da$$
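A sketch of the usual way out, hedged on the signs reconstructed above: the formula $u(a,t) = u_0(a-t)e^{-\frac{1}{2}\mu t^2}$ is only valid for $a > t$, where characteristics trace back to the initial line $t = 0$; writing $u_0(-t)$ for $a < t$ is exactly where the attempt above breaks down. In that region the characteristics instead trace back to the boundary $a = 0$ at time $t - a$, where $u = F(t-a)$, and the non-local boundary condition becomes an equation for $F$ itself:

```latex
% Region a > t: characteristics reach the initial line t = 0.
u(a,t) = u_0(a-t)\, e^{-\frac{1}{2}\mu t^2}, \qquad a > t
% Region a < t: characteristics reach the boundary a = 0 at time t - a;
% integrating du/d\tau = -\mu(\tau + t - a)\,u along them gives
u(a,t) = F(t-a)\, e^{-\frac{1}{2}\mu\left(t^2 - (t-a)^2\right)}, \qquad a < t
% Substituting both pieces into F(t) = b\int_0^\infty u(a,t)\,da yields a
% Volterra integral equation that, together with u_0, determines F(t):
F(t) = b \int_0^t F(t-a)\, e^{-\frac{1}{2}\mu\left(t^2 - (t-a)^2\right)} da
     \; + \; b\, e^{-\frac{1}{2}\mu t^2} \int_t^\infty u_0(a-t)\, da
```

Such renewal-type Volterra equations are typically solved by stepping forward in $t$, since the right-hand side only uses values of $F$ at earlier times; that is what makes the problem well-posed.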

College Statistics · Answered question
Emmanuel Giles · 2022-11-02

Using a "population" consisting of probabilities to predict accuracy of sample
Since I'm not sure if the title explains my question well enough I've come up with an example myself:
Let's say I live in a country where every citizen goes to work every day, and every citizen chooses between going by bus or by train (every citizen makes this choice anew each day; there are almost no citizens who always go by train and never by bus, or vice versa).
I've done a lot of sampling and I have data on one million citizens about their behaviour over the past 1000 days. So, I calculate the "probability" per citizen of going by train on a single day. I can also calculate the average of those calculated probabilities over all citizens; let's say the average probability of a citizen going by train is 0.27. I figured that most citizens have tendencies around this number (most citizens have an individual probability between 0.22 and 0.32 of going by train, for example).
Now, I started sampling an unknown person (known, though, to be living in the same country), and after asking him on 10 consecutive days whether he went by train or by bus, I know that this person went to work by train 4 times and by bus 6 times.
My final question: how can I use my (accurate) data on one million citizens to approximate this person's probability of going by train?
I know that if I do the calculation the other way around, i.e., calculate the probability of this event occurring given that I know this person's REAL probability is 0.4, the result is $\binom{10}{4} \times 0.4^4 \times 0.6^6 \approx 25\%$. I could calculate this probability for all possible probabilities between 0.00 and 1.00 (so 0%, 1%, ..., 100%, without any numbers in between), and these sum to about 910%. I could rescale this to 100% (dividing by 9.1) and scale all the other percentages accordingly (so our 25% becomes ~2.75%) and come up with a weighted sum: $2.75\% \times 0.4 + X\% \times 0.41 + \ldots$, but this must be wrong, since I'm not taking my accurate samples of the population into account.
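The weighted-sum idea is in fact one step away from a Bayesian update: the missing ingredient is to weight each candidate probability by how common it is in the population (a prior built from the million-citizen data) rather than implicitly treating all 101 values as equally likely. A minimal sketch of that grid computation, where `prior[]` stands in for a histogram of the per-citizen probabilities; the bell-shaped curve used to fake that histogram here is purely illustrative:

```java
public class TrainProbability {
    public static void main(String[] args) {
        int trainDays = 4, observedDays = 10; // 4 train days out of 10 asked

        // Prior over p on a 1% grid. In practice this histogram would be
        // built from the per-citizen probabilities in the million-person
        // data set; here it is faked as a bell shape centred on 0.27 with
        // most mass between 0.22 and 0.32 (illustrative assumption only).
        int n = 101;
        double[] prior = new double[n];
        double priorSum = 0;
        for (int i = 0; i < n; i++) {
            double p = i / 100.0;
            prior[i] = Math.exp(-0.5 * Math.pow((p - 0.27) / 0.05, 2));
            priorSum += prior[i];
        }
        for (int i = 0; i < n; i++) prior[i] /= priorSum;

        // Posterior over p is proportional to prior(p) times the binomial
        // likelihood of seeing 4 train days in 10: C(10,4) * p^4 * (1-p)^6.
        double[] posterior = new double[n];
        double norm = 0;
        for (int i = 0; i < n; i++) {
            double p = i / 100.0;
            double likelihood = 210 * Math.pow(p, trainDays)
                                    * Math.pow(1 - p, observedDays - trainDays);
            posterior[i] = prior[i] * likelihood;
            norm += posterior[i];
        }

        // Posterior mean = this person's estimated train probability.
        double mean = 0;
        for (int i = 0; i < n; i++) mean += (i / 100.0) * posterior[i] / norm;
        System.out.printf("Estimated P(train) for this person: %.3f%n", mean);
    }
}
```

The estimate lands between the population average 0.27 and the observed frequency 0.4, closer to 0.27, because ten observations carry little weight against a tightly concentrated population prior.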

Studying population data is an important tool for understanding the world and its inhabitants. World population data shows how populations change over time, and equations applied to it can answer questions about growth rates, birth and death rates, and life expectancy. Population data also indicates what resources a given population needs, and it reveals the economic and social trends shaping that population. With its help, we can gain insight into the current and future state of our world.