Is hypothesis space a random variable?
I am reading where it states:
The previous statement is true for a given training set $D$. However, remember that $D$ itself is drawn from $P^n$, and is therefore a random variable. Further, $h_D$ is a function of $D$, and is therefore also a random variable. And we can of course compute its expectation:
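To make the quoted claim concrete, here is a minimal simulation sketch. Everything in it is my own assumption for illustration and not from the notes: the distribution (x uniform on [0, 1], y = 2x plus Gaussian noise), the learner (a least-squares line), and the names `draw_dataset` and `fit_line`. It draws many datasets D, fits h_D on each, and looks at how the prediction at one fixed input varies from dataset to dataset.

```python
import random
import statistics


def draw_dataset(n, rng):
    """Draw one dataset D: n i.i.d. pairs (x, y) from an assumed P(X, Y)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.5)  # assumed relationship: y = 2x + noise
        data.append((x, y))
    return data


def fit_line(data):
    """Fit a least-squares line to D and return the learned function h_D."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20
x_test = 0.5

# Each iteration draws a fresh D and therefore produces a different h_D.
predictions = [fit_line(draw_dataset(n, rng))(x_test) for _ in range(2000)]

h_bar_at_x = statistics.fmean(predictions)  # Monte Carlo estimate of E_D[h_D(x_test)]
spread = statistics.stdev(predictions)      # how much h_D(x_test) varies with D

print("estimated expected prediction:", h_bar_at_x)
print("standard deviation across datasets:", spread)
```

Under these assumptions, the nonzero spread is what "h_D is a random variable" means in practice: before D is drawn, the learned function is not yet determined, and the average over many draws approximates its expectation.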
The definition of hypothesis space is given in the section Loss Functions.
There are typically two steps involved in learning a hypothesis function $h()$. First, we select the type of machine learning algorithm that we think is appropriate for this particular learning problem. This defines the hypothesis class $\mathcal{H}$, i.e. the set of functions we can possibly learn. The second step is to find the best function within this class, $h \in \mathcal{H}$. This second step is the actual learning process and often, but not always, involves an optimization problem. Essentially, we try to find a function $h$ within the hypothesis class that makes the fewest mistakes within our training data. (If there is not a single function we typically try to choose the "simplest" by some notion of simplicity - but we will cover this in more detail in a later class.) How can we find the best function? For this we need some way to evaluate what it means for one function to be better than another. This is where the loss function (aka risk function) comes in. A loss function evaluates a hypothesis on our training data and tells us how bad it is. The higher the loss, the worse it is - a loss of zero means it makes perfect predictions. It is common practice to normalize the loss by the total number of training samples, $n$, so that the output can be interpreted as the average loss per sample (and is independent of $n$).
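As a small illustration of the normalized loss described in that passage, here is a sketch of an average zero-one loss. The toy data and the two candidate hypotheses `h_a` and `h_b` are hypothetical, made up only to show how a loss ranks two functions from the same hypothesis class.

```python
def zero_one_loss(h, data):
    """Average zero-one loss: fraction of pairs (x, y) that h gets wrong."""
    mistakes = sum(1 for x, y in data if h(x) != y)
    return mistakes / len(data)  # normalize by n to get the loss per sample


# Hypothetical training data: pairs (x, label).
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.45, 1)]


def h_a(x):
    """Candidate hypothesis: threshold at 0.5."""
    return 1 if x > 0.5 else 0


def h_b(x):
    """Candidate hypothesis: always predict 1."""
    return 1


loss_a = zero_one_loss(h_a, data)  # wrong only on (0.45, 1)
loss_b = zero_one_loss(h_b, data)  # wrong on (0.1, 0) and (0.4, 0)

# The learning step would prefer the hypothesis with the lower loss.
best = min([h_a, h_b], key=lambda h: zero_one_loss(h, data))
print(loss_a, loss_b, best.__name__)
```

Here the loss is computed for fixed functions on a fixed dataset; the learned $h$ is whichever candidate the learning step selects for that particular dataset.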
This is how I would rewrite the document:
We have a table whose last column holds the targets (the values to predict). The remaining columns can be represented as a multivariate random variable (random vector) $X$, and the last column as a target random variable $Y$. The data distribution behind the random variable $D$, which represents our dataset, is $P(X, Y)$. When a concrete dataset $D$ is drawn from $P^n$, we get a dataset with $n$ rows.
But I am not sure about $h$. It appears in the "Loss Functions" section, where to me it actually denotes the function produced by the machine learning algorithm, one of infinitely many candidates. Further, the hypothesis class may be a random variable $H$, with $h$ just one realization of it.
1. How can the dataset $D$ be a random variable? Here, when they refer to a dataset, they use $D$.
2. How can $h_D$ be a random variable if it is a function associated with one concrete dataset, most likely meaning one candidate function (one of many) that maps the input to the output?