
Lena Bell 2022-07-07 Answered
Shouldn't the probability of sampling a point from a continuous distribution be 0?
Hey, I was reading about the Gaussian EM algorithm, in which you first calculate the likelihood of the data points being sampled from a Gaussian and then adjust the mean and variance to maximize it. To calculate the probability of a point being sampled from the distribution, we use
$P(x_i \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
But how can you sample a point from a continuous distribution? Shouldn't this be zero? Moreover, I have read many times about sampling a point/data from some continuous distribution, but I can't understand how you could do that, since for a continuous random variable X we have $P(X = x_1) = 0$. So how could you sample data point(s) from a continuous distribution?
Or does this sampling have some other meaning? I've seen many questions on this platform, but I couldn't find my answer.
Answers (1)

Karissa Macdonald
Answered 2022-07-08 Author has 12 answers
Decide on the level of accuracy you are working to. For example, if you are modelling masses, you might have the masses measured to the nearest 5 g. That means a measured value of, say, 45 g is in fact a value between 42.5 g and 47.5 g. You can then work out the probability that a value lies between 42.5 g and 47.5 g; that interval probability, obtained by integrating the density, is nonzero even though P(X = 45) itself is exactly 0.
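The interval idea can be made concrete with a short numerical check. The sketch below (standard library only) uses an assumed N(45, 5²) model for the masses; the mean and standard deviation are invented for illustration. The Gaussian formula in the question returns a density, which is a rate, not a probability; probabilities come from differences of the CDF:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x -- a rate, not a probability."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 45.0, 5.0   # assumed model for the measured masses, in grams

# The density at 45 g is positive, but it is not P(X = 45), which is 0.
print(normal_pdf(45, mu, sigma))   # about 0.08 per gram

# Probability that the true value lies in the 42.5-47.5 g measurement window:
p = normal_cdf(47.5, mu, sigma) - normal_cdf(42.5, mu, sigma)
print(p)   # about 0.38
```

Sampling works the same way: a sampler returns values whose probability of landing in any interval matches the CDF, even though each individual value has probability 0.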

You might be interested in

asked 2021-02-23
Interpreting z-scores: Complete the following statements using your knowledge about z-scores.
a. If the data is weight, the z-score for someone who is overweight would be
-positive
-negative
-zero
b. If the data is IQ test scores, an individual with a negative z-score would have a
-high IQ
-low IQ
-average IQ
c. If the data is time spent watching TV, an individual with a z-score of zero would
-watch very little TV
-watch a lot of TV
-watch the average amount of TV
d. If the data is annual salary in the U.S and the population is all legally employed people in the U.S., the z-scores of people who make minimum wage would be
-positive
-negative
-zero
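Each of these statements follows directly from the definition $z = (x - \mu)/\sigma$: the sign of z simply records whether x is above or below the mean. A minimal sketch with made-up numbers:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Invented example: weights with mean 70 kg and standard deviation 10 kg.
print(z_score(95, 70, 10))   # overweight -> positive z (2.5)
print(z_score(55, 70, 10))   # below the mean -> negative z (-1.5)
print(z_score(70, 70, 10))   # exactly average -> zero
```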
asked 2022-07-06
What does x% colder mean?
I see people using percentage increases to talk about temperature; for example
"Two weather predictions were presented with somewhat conflicting data, with the Energy Information Administration (EIA) predicting the 2007-08 winter to be 4 percent colder than 2006-07, but still 2 percent warmer than the 30-year average."
Is there any meaningful way to interpret this? I don't know what it means for one year to be 4 percent colder than another.
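One way to make "x% colder" meaningful is to apply the percentage on an absolute scale (kelvin), where zero is physically meaningful, rather than on Celsius or Fahrenheit, whose zero points are arbitrary. The sketch below (invented temperatures) shows how much the choice of scale matters:

```python
# "4 percent colder" applied to the same temperature on two scales.
t_celsius = 5.0                      # an invented winter average, in deg C
t_kelvin = t_celsius + 273.15

colder_c = 0.96 * t_celsius          # 4% less on the Celsius scale: 4.8 C
colder_k = 0.96 * t_kelvin - 273.15  # 4% less on the kelvin scale: about -6.1 C

print(colder_c, colder_k)
```

In practice such forecasts often refer to a 4% change in heating degree days rather than in temperature itself, but that is a guess about the EIA's intent, not something the quote states.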
asked 2020-10-28
Make a box-and-whisker plot for each data set.
Area (in 1,000 mi²) of 13 western states.
122, 164, 71, 98, 84, 147, 114, 111, 98, 85, 104, 71, 77
median: __________
lower quartile: ________
upper quartile: _________
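The five-number summary behind a box-and-whisker plot can be computed directly. Quartile conventions vary; the sketch below uses the common textbook rule of taking the median of each half, excluding the overall median for an odd-length list:

```python
from statistics import median

data = sorted([122, 164, 71, 98, 84, 147, 114, 111, 98, 85, 104, 71, 77])

n = len(data)                    # 13 values
med = median(data)               # middle value: 98
lower_half = data[: n // 2]      # the 6 values below the median
upper_half = data[n // 2 + 1 :]  # the 6 values above the median

q1 = median(lower_half)          # lower quartile: 80.5
q3 = median(upper_half)          # upper quartile: 118
print(med, q1, q3)
```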
asked 2022-06-20
how to interpret the SVD of a data matrix A
So I understand the proofs behind Singular Value Decomposition but I'm having trouble interpreting it in the context of a real world problem.
Specifically, if I'm given an $m \times n$ data matrix A, where we have m training examples and n features collected for each example, I'm having trouble understanding the meaning behind $A v_j = \sigma_j u_j$, where $L_A$ (left multiplication by A) is our linear transformation, $\beta = \{v_1, v_2, \dots, v_n\}$ is an orthonormal basis for $F^n$, and $\gamma = \{u_1, u_2, \dots, u_m\}$ is an orthonormal basis for $F^m$.
From reading various posts and articles, the idea seems to be that a larger $\sigma_j$ indicates more variation in the data along the direction $v_j$, while a smaller $\sigma_j$ indicates less variation in that direction.
However, when we are looking at $\sigma_j$ I'm not sure why we care about what A is doing. After all, this relationship would be great if I wanted to see what A does when it acts on an orthonormal basis $\beta$, but A is just a data matrix, so I'm not sure how to interpret the range of a data matrix or what kind of transformation A performs when left-multiplying a vector x.
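One way to ground the interpretation: the right singular vectors $v_j$ are the eigenvectors of $A^T A$, and $\sigma_j^2$ measures the spread of the data points along $v_j$ (about the origin; for a true variance reading the columns of A would be mean-centered first). A tiny hand-rolled 2x2 sketch with invented data lying on the line y = x, so all spread is along (1,1)/√2:

```python
import math

# Toy data matrix: 4 samples, 2 features, all on the line y = x.
A = [[1, 1], [2, 2], [-1, -1], [3, 3]]

# Form the 2x2 Gram matrix A^T A; its eigenvectors are the right singular
# vectors v_j and its eigenvalues are sigma_j^2.
a = sum(row[0] * row[0] for row in A)
b = sum(row[0] * row[1] for row in A)
c = sum(row[1] * row[1] for row in A)

mean = (a + c) / 2
disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = mean + disc, mean - disc            # eigenvalues, lam1 >= lam2
sigma1, sigma2 = math.sqrt(lam1), math.sqrt(max(lam2, 0.0))

# Eigenvector for lam1 (b != 0 here; b == 0 would need a special case).
v1 = (b, lam1 - a)
norm = math.hypot(*v1)
v1 = (v1[0] / norm, v1[1] / norm)

# ||A v1|| equals sigma1: the data spreads most along v1 = (1,1)/sqrt(2),
# and sigma2 = 0 because there is no spread perpendicular to it.
Av1 = [row[0] * v1[0] + row[1] * v1[1] for row in A]
print(sigma1, sigma2, math.hypot(*Av1), v1)
```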
asked 2022-06-04
Why are zeros/roots (real) solutions to an equation of an n-degree polynomial?
I can't really put a proper title on this one, but I seem to be missing one crucial point. Why do the roots of a function like $f(x) = ax^2 + bx + c$ provide the solutions when $f(x) = 0$? What does y = 0 mean for the solutions: the intercept at the x-axis? Why aren't the solutions at $f(x) = 403045$ or some other arbitrary n?
What makes the x-intercept special?
asked 2022-06-12
How to compute Bias and Variance for the given scenarios?
I'm currently studying the "Learning from Data" course by Professor Yaser Abu-Mostafa, and I do not get the "bias-variance tradeoff" part of it. Actually, the concepts are fine; the math is the problem.
In the lecture 08, he defined bias and variance as follows:
$\text{Bias} = \mathbb{E}_x[(\bar{g}(x) - f(x))^2]$, where $\bar{g}(x) = \mathbb{E}_D[g^{(D)}(x)]$
$\text{Var} = \mathbb{E}_x[\mathbb{E}_D[(g^{(D)}(x) - \bar{g}(x))^2]]$
To clarify the notation:
D means the data set $(x_1, y_1), \dots, (x_n, y_n)$.
g is the function that approximates f; i.e., I'm estimating f by using g. In this case, g is chosen by an algorithm A in the hypothesis set H .
After that, he proposed an example that was stated in the following manner:
Example: Let $f(x) = \sin(\pi x)$ and take a data set D of size N = 2. We sample x uniformly in [−1, 1] to generate $(x_1, y_1)$ and $(x_2, y_2)$. Now, suppose that I have two models, $H_0$ and $H_1$.
$H_0: h(x) = b$
$H_1: h(x) = ax + b$
For $H_0$, let $b = \frac{y_1 + y_2}{2}$. For $H_1$, choose the line that passes through $(x_1, y_1)$ and $(x_2, y_2)$.
Simulating the process as described, he states that:
For $H_0$: Bias ≈ 0.50 and Var ≈ 0.25.
For $H_1$: Bias ≈ 0.21 and Var ≈ 1.69.
Here is my main question: How can one get these results analytically?
I've tried to solve the integrals that came from the $\mathbb{E}[\cdot]$ (it didn't work), but I'm not sure I'm interpreting correctly which distribution is which. For example, how do I evaluate $\mathbb{E}_D[g^{(D)}(x)]$ (is it the same as evaluating $\mathbb{E}_D[b]$ or $\mathbb{E}_D[ax + b]$, for $H_0$ and $H_1$, respectively)? The random variable with the uniform distribution over [−1, 1] is x, right? Thus $\mathbb{E}_x[\cdot]$ is evaluated with respect to a random variable that follows a $U[-1, 1]$ distribution, right?
If anyone could help me understand at least one of the two scenarios by deriving the provided numbers for Bias and Var, it would be extremely helpful.
Thanks in advance,
André
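Before attempting the integrals, the quoted numbers can at least be reproduced by simulation. The sketch below (standard library only; the grid size and sample count are arbitrary choices) estimates $\bar{g}(x)$ and the two expectations by Monte Carlo over many datasets D of size 2:

```python
import math
import random

random.seed(0)

def f(x):
    return math.sin(math.pi * x)

GRID = [-1 + 2 * i / 100 for i in range(101)]    # x-grid for the E_x average
N = 20000                                        # number of datasets D

# Running sums of g^(D)(x) and its square at every grid point, per model.
s0 = [0.0] * len(GRID); ss0 = [0.0] * len(GRID)  # H0: h(x) = b
s1 = [0.0] * len(GRID); ss1 = [0.0] * len(GRID)  # H1: h(x) = a*x + b

for _ in range(N):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    y1, y2 = f(x1), f(x2)
    b0 = (y1 + y2) / 2               # H0: constant at the midpoint
    a1 = (y2 - y1) / (x2 - x1)       # H1: line through the two points
    b1 = y1 - a1 * x1
    for i, x in enumerate(GRID):
        g0 = b0
        g1 = a1 * x + b1
        s0[i] += g0; ss0[i] += g0 * g0
        s1[i] += g1; ss1[i] += g1 * g1

def bias_var(s, ss):
    gbar = [si / N for si in s]      # Monte Carlo estimate of E_D[g^(D)(x)]
    bias = sum((gb - f(x)) ** 2 for gb, x in zip(gbar, GRID)) / len(GRID)
    var = sum(ssi / N - gb * gb for ssi, gb in zip(ss, gbar)) / len(GRID)
    return bias, var

bias0, var0 = bias_var(s0, ss0)
bias1, var1 = bias_var(s1, ss1)
print(bias0, var0)   # roughly 0.50 and 0.25
print(bias1, var1)   # roughly 0.21 and 1.69
```

Note that $\mathbb{E}_D[b]$ for $H_0$ can also be done analytically: since $\mathbb{E}[\sin(\pi x)] = 0$ over [−1, 1], $\bar{g}(x) = 0$ and Bias $= \mathbb{E}_x[\sin^2(\pi x)] = 1/2$.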
asked 2021-02-12
Michelle is studying the relationship between the hours worked (per week) and time spent reading (per day) and has collected the data shown in the table. The line of best fit for the data is y = −0.79x + 98.8.
Hours Worked (per week): 30, 40, 50, 60
Minutes Reading (per day): 75, 68, 58, 52
(a) According to the line of best fit, the predicted number of minutes spent reading for a person who works 27 hours (per week) is 77.47.
(b) Is it reasonable to use this line of best fit to make the above prediction?
Select the correct answer below:
1.The estimate, a predicted time of 77.47 minutes, is unreliable but reasonable.
2.The estimate, a predicted time of 77.47 minutes, is reliable but unreasonable.
3.The estimate, a predicted time of 77.47 minutes, is both unreliable and unreasonable.
4.The estimate, a predicted time of 77.47 minutes, is both reliable and reasonable.
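The line of best fit and the 77.47 prediction can be checked by ordinary least squares (standard library only); the slope comes out negative, matching the downward trend in the data:

```python
xs = [30, 40, 50, 60]   # hours worked per week
ys = [75, 68, 58, 52]   # minutes reading per day

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept.
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

pred = slope * 27 + intercept
print(round(slope, 2), round(intercept, 2), round(pred, 2))   # -0.79 98.8 77.47
```

Note that 27 hours lies just outside the observed 30 to 60 range, which is what part (b) is probing.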