What is a distribution over functions?
I am reading the textbook Gaussian Processes for Machine Learning by C.E. Rasmussen and C.K.I. Williams, and I am having trouble understanding what a distribution over functions means. The textbook suggests imagining a function as a very long vector (in fact, should it be infinitely long?). So I imagine a distribution over functions as a probability distribution drawn "above" the values of such a vector. Would it then be the probability that a function takes a particular value? Or the probability that a function takes a value within a given range? Or is a distribution over functions a probability assigned to a whole function?
I made a (perhaps somewhat naive) pictogram to try to visualize this for myself. However, I am not sure whether the explanation I came up with is correct.
Passages from the textbook:
A Gaussian process is a generalization of the Gaussian probability
distribution. Whereas a probability distribution describes random
variables which are scalars or vectors (for multivariate
distributions), a stochastic process governs the properties of
functions. Leaving mathematical sophistication aside, one can loosely
think of a function as a very long vector, each entry in the vector
specifying the function value f(x) at a particular input x. It turns
out, that although this idea is a little naive, it is surprisingly
close what we need. Indeed, the question of how we deal
computationally with these infinite dimensional objects has the most
pleasant resolution imaginable: if you ask only for the properties of
the function at a finite number of points, then inference in the
Gaussian process will give you the same answer if you ignore the
infinitely many other points, as if you would have taken them all into
account!
Chapter 1: Introduction, page 2
There are several ways to interpret Gaussian process (GP) regression
models. One can think of a Gaussian process as defining a distribution
over functions, and inference taking place directly in the space of
functions, the function-space view.
Chapter 2: Regression, page 7
distributions gaussian-process
edited 58 mins ago
asked 2 hours ago


camillejr
Could you please cite the page/section/subsection of the assertion in the book? The book is pretty large.
– chRrr
1 hour ago
Sure, I added the quotes from the textbook.
– camillejr
1 hour ago
1 Answer
The concept is a bit more abstract than a usual distribution. The problem is that we are used to the concept of a distribution over $\mathbb{R}$, typically shown as a line, which we then extend to a surface $\mathbb{R}^2$, and so on to distributions over $\mathbb{R}^n$. But the space of functions cannot be represented as a square or a line or a vector. It's not a crime to think of it that way, as you do, but the theory that works in $\mathbb{R}^n$, having to do with distance, neighborhoods and such (this is known as the topology of the space), is not the same in the space of functions. So drawing it as a square can give you wrong intuitions about that space.
You can simply think of the space of functions as a big collection of functions, a bag of things if you will. The distribution then gives you the probability of drawing from a subset of those things. The distribution will say: the probability that your next draw (of a function) is in this subset is, for example, 10%. In the case of a Gaussian process on functions in two dimensions, you might ask: given an $x$-coordinate and an interval of $y$-values (together, a small vertical line segment), what is the probability that a (random) function will pass through this small segment? That's going to be a positive probability. So the Gaussian process specifies a (probability) distribution over a space of functions. In this example, the subset of the space of functions is the set of functions that pass through the line segment.
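To make the line-segment question concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions that are not in the text above: a zero-mean Gaussian process, a squared-exponential kernel with unit length scale (`rbf_kernel`), and a small grid of inputs `xs`. All function names are made up for this example. It draws many random functions (evaluated on the grid), counts how many pass through the segment at `x0`, and compares that with the closed-form value, which only needs the normal distribution of $f(x_0)$.

```python
import math
import random

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def draw_function(lower, rng):
    """One random function from the zero-mean GP, evaluated on the grid."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(row[k] * z[k] for k in range(i + 1)) for i, row in enumerate(lower)]

def normal_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def segment_probability(x0, y_low, y_high):
    """Exact P(y_low <= f(x0) <= y_high) for the zero-mean GP above."""
    std = math.sqrt(rbf_kernel(x0, x0))
    return normal_cdf(y_high / std) - normal_cdf(y_low / std)

rng = random.Random(0)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]   # finite set of inputs standing in for the function
index_of_x0 = 2                  # the segment sits at x0 = xs[2] = 1.0
y_low, y_high = 0.0, 0.5         # vertical extent of the segment

# Covariance of the function values on the grid; a tiny jitter on the
# diagonal keeps the factorization numerically stable.
jitter = 1e-6
cov = [[rbf_kernel(a, b) + (jitter if i == j else 0.0)
        for j, b in enumerate(xs)] for i, a in enumerate(xs)]
lower = cholesky(cov)

n_draws = 20000
hits = sum(
    1 for _ in range(n_draws)
    if y_low <= draw_function(lower, rng)[index_of_x0] <= y_high
)

monte_carlo = hits / n_draws
exact = segment_probability(xs[index_of_x0], y_low, y_high)
print(f"Monte Carlo estimate: {monte_carlo:.3f}")
print(f"Closed form:          {exact:.3f}")
```

The "subset of functions" here is the set of draws whose value at `x0` lands in the interval. The fraction of draws in that subset approximates the probability the distribution assigns to it.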
Another confusing naming convention here is that a distribution is commonly specified by a density function, such as the bell shape of the normal distribution. There, the area under the density function tells you how probable an interval is. This doesn't work for all distributions, however, and in particular it doesn't work at all for functions (as opposed to $\mathbb{R}$, as with the normal distribution). That means you won't be able to write this distribution (as specified by the Gaussian process) as a density function.
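Although there is no density over the whole function space, the values of the function at any finite set of inputs do have an ordinary multivariate normal density, and that is what is used in practice. The sketch below assumes a zero-mean Gaussian process with a squared-exponential kernel, and all names in it are hypothetical. It shows two things: the covariance of two function values is the same whether or not other inputs were also considered (the consistency property described in the passage from Chapter 1), and the density of those two values can be evaluated directly.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def covariance_matrix(xs):
    """Covariance of the function values at the inputs xs."""
    return [[rbf_kernel(a, b) for b in xs] for a in xs]

def bivariate_normal_density(values, cov):
    """Density of a zero-mean bivariate normal at `values` (2x2 covariance)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    y1, y2 = values
    # Quadratic form y^T cov^{-1} y, written out for the 2x2 case.
    quad = (d * y1 * y1 - (b + c) * y1 * y2 + a * y2 * y2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Covariance on a finer grid of five inputs ...
fine_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
cov_fine = covariance_matrix(fine_grid)

# ... marginalized down to the first and last input by keeping only the
# matching rows and columns (this is how Gaussian marginals work).
keep = [0, 4]
cov_marginal = [[cov_fine[i][j] for j in keep] for i in keep]

# Covariance built from those two inputs alone, ignoring all the others.
cov_direct = covariance_matrix([fine_grid[i] for i in keep])

same = all(
    math.isclose(cov_marginal[i][j], cov_direct[i][j])
    for i in range(2) for j in range(2)
)
print("Marginal matches direct construction:", same)

# A proper density exists for this finite set of function values.
density = bivariate_normal_density([0.3, -0.1], cov_direct)
print("Joint density of (f(0.0), f(1.0)) at (0.3, -0.1):", density)
```

The agreement of the two covariance matrices is guaranteed by construction, since each entry depends only on its own pair of inputs. That is precisely why the other, infinitely many, inputs can be ignored.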
answered 57 mins ago
Gijs
Thanks. So to clarify, this is not a distribution over one function's values, but instead a distribution over a collection of functions, right? One more question: you said that this would be the probability that a random function passes through a certain interval. So in the example of GPR, would it be a random function from a specific "family" of functions given by the covariance kernel?
– camillejr
45 mins ago
Yes, it is a distribution over a collection of functions. The example of passing through an interval applies if you have a Gaussian process. The covariance kernel actually specifies a Gaussian process. So if you know the covariance kernel, you can calculate the probability of a random function passing through a specific interval.
– Gijs
41 mins ago