Janet Forbes

Answered

2022-07-09

Is hypothesis space a random variable?

I am reading a passage which states:

The previous statement is true for a given training set D. However, remember that D itself is drawn from $P^n$, and is therefore a random variable. Further, $h_D$ is a function of D, and is therefore also a random variable. And we can of course compute its expectation:

The definition of hypothesis space is given in the section Loss Functions.

There are typically two steps involved in learning a hypothesis function h(). First, we select the type of machine learning algorithm that we think is appropriate for this particular learning problem. This defines the hypothesis class H, i.e. the set of functions we can possibly learn. The second step is to find the best function within this class, $h\in \mathcal{H}$. This second step is the actual learning process and often, but not always, involves an optimization problem. Essentially, we try to find a function h within the hypothesis class that makes the fewest mistakes within our training data. (If there is no single best function, we typically choose the "simplest" by some notion of simplicity - but we will cover this in more detail in a later class.) How can we find the best function? For this we need some way to evaluate what it means for one function to be better than another. This is where the loss function (aka risk function) comes in. A loss function evaluates a hypothesis $h\in \mathcal{H}$ on our training data and tells us how bad it is. The higher the loss, the worse it is - a loss of zero means it makes perfect predictions. It is common practice to normalize the loss by the total number of training samples, n, so that the output can be interpreted as the average loss per sample (and is independent of n).
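To make the two steps concrete, here is a minimal sketch, assuming a toy hypothesis class of constant predictors $h_c(x) = c$ and a squared loss normalized by n (all names and the data-generating choices are illustrative, not from the quoted text):

```python
import numpy as np

# Hypothetical toy setup: the hypothesis class H is the set of constant
# predictors h_c(x) = c, for c on a small grid.
rng = np.random.default_rng(0)
X = rng.normal(size=20)                # inputs (unused by constant predictors)
Y = 2.0 + 0.1 * rng.normal(size=20)    # targets clustered near 2.0

def average_loss(c, Y):
    """Squared loss of the constant predictor h_c, normalized by n."""
    return np.mean((Y - c) ** 2)

# Step 1 defined H (constants on a grid); step 2 searches within H
# for the function with the smallest average loss on the training data.
candidates = np.linspace(0.0, 4.0, 81)
losses = [average_loss(c, Y) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best)  # close to the sample mean of Y
```

Because the loss is divided by n, its value can be read as average loss per sample, regardless of how many rows the training set has.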

This is how I would rewrite the document:

We have a table whose last column contains the targets (the values to predict). The rows can be represented via a multivariate random variable (random vector) X and a target random variable Y, with joint data distribution P(X, Y). When we draw n rows independently from this distribution, i.e. sample from $P^n$, we get our concrete dataset D.

But I am not sure about h. It appears in the "Loss Functions" paragraph, where to me it represents the function produced by the machine learning algorithm, one of infinitely many. Furthermore, perhaps the hypothesis class is a random variable H and h is just a realization of it.

1. How can the dataset D be a random variable? Here, when they refer to a dataset, they use D.

2. How can $h_D$ be a random variable if it is a function associated with a concrete dataset, and most likely denotes one candidate function (one of many) that maps inputs to outputs?


Answer & Explanation

Savion Stanton

Expert

2022-07-10 (Added 10 answers)

I will summarise for you what a learning task is. There is a set of features X which you can observe and a set of features Y which you have to guess. There is also a probability distribution P(X, Y) which is fixed but unknown. We can make some assumptions about this distribution, but that is not important now.

You are looking for a function $h:X\to Y$ which guesses the feature y based only on the feature x. There are many such functions, and you are looking for the function h with minimal loss. You can either search over all possible functions or restrict yourself to functions in a so-called hypothesis class H.

If your function h guesses y from x wrongly, then the bigger P(x, y) is, the bigger the impact on the loss. To calculate the true loss you would actually need P(X, Y), which you don't have; you can only estimate it from the dataset. Consequently, the loss you calculate is itself only an estimate.

The pairs (x, y) drawn independently from P(X, Y) are random variables. So a list of n such pairs, which is what we call the dataset D, is also a random variable. The functions in the hypothesis class H are not random variables. However, their estimated losses on the dataset D are random variables, and the minimum of a collection of random variables is again a random variable.
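You can see this directly by simulation. Below is a sketch, under an assumed P(X, Y) of my choosing, where one fixed hypothesis h is evaluated on many freshly drawn datasets D: h never changes, yet its estimated loss on D fluctuates from draw to draw, exactly as a random variable should.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # rows per dataset

def draw_dataset(n, rng):
    """Draw one dataset D of n i.i.d. pairs from an assumed P(X, Y)."""
    x = rng.uniform(-1, 1, size=n)
    y = 3 * x + rng.normal(scale=0.5, size=n)  # true relation plus noise
    return x, y

h = lambda x: 3 * x  # one fixed hypothesis; NOT random

# Estimate h's loss on 1000 independent draws of D.
losses = []
for _ in range(1000):
    x, y = draw_dataset(n, rng)
    losses.append(np.mean((h(x) - y) ** 2))

# The estimated loss scatters around the true expected loss (0.25 here,
# the noise variance), with a strictly positive spread across datasets.
print(np.mean(losses), np.std(losses))
```

The spread across datasets is what the quoted text means by the estimated loss being a random variable even though h itself is a fixed function.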

And the argument of the minimum, the function $h_D$ you choose, may not be the same function you would choose if you knew the distribution P(X, Y). So it is actually also a random variable.

Is it clearer now? If not, can you please tell me which paragraph you don't understand?
