Is using both training and test sets for hyperparameter tuning overfitting?
You have a training set and a test set. You combine them and run something like GridSearch to decide the hyperparameters of the model. Then you fit a model on the training set using those hyperparameters, and you use the test set to evaluate it.
Is this overfitting? Ultimately, the model was not fitted on the test set, but the test set was considered when deciding the hyperparameters.
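Concretely, the procedure I mean looks roughly like this (a sketch assuming scikit-learn's GridSearchCV; the synthetic dataset, the SVC classifier, and the parameter grid are just placeholders, not part of my actual setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: combine training and test data, then grid-search hyperparameters on it.
X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, y_test])
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_all, y_all)  # the test data influences which hyperparameters win

# Step 2: refit on the training set only, using the chosen hyperparameters.
model = SVC(**search.best_params_).fit(X_train, y_train)

# Step 3: evaluate on the test set, which already shaped the hyperparameter choice.
print(model.score(X_test, y_test))
```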
machine-learning cross-validation overfitting
asked 6 hours ago, edited 5 hours ago
FranGoitia
4 Answers
It is an "in-sample" forecast, since you eventually make the forecast on observations that are already part of your training set. Why not use n-fold cross-validation instead? By doing that, each fold gives an "out-of-sample" forecast, in which the test portion and the training portion are kept separate (see the sketch below).
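For instance, selecting hyperparameters with cross-validation on the training data alone might look like this (a sketch assuming scikit-learn; the classifier and the candidate values of C are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X_train, y_train = make_classification(n_samples=400, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for C in [0.1, 1, 10]:
    scores = cross_val_score(SVC(C=C), X_train, y_train, cv=cv)
    # Each fold's score is an out-of-sample forecast: the scored observations
    # were never part of that fold's training data.
    print(C, scores.mean())
```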
answered 3 hours ago by Ray Yang (new contributor)
Yes, you are overfitting. The test set should be used only for testing, not for parameter tuning. Searching for hyperparameters on the test set lets the search pick up patterns that are specific to the test set, so you eventually overfit to it.
answered 1 hour ago by user2974951
The idea behind holdout and cross-validation is to estimate the generalization performance of a learning algorithm--that is, the expected performance on future data drawn from the same distribution as the training data. This estimate can be used to tune hyperparameters or to report the final performance. Its validity depends on the independence of the data used for training and the data used for estimating performance. If this independence is violated, the performance estimate will be overoptimistically biased. The most egregious way this can happen is by estimating performance on data that has already been used for training or hyperparameter tuning, but there are many more subtle and insidious ways too.
The procedure you asked about goes wrong in multiple ways. First, the same data is used for both training and hyperparameter tuning. The goal of hyperparameter tuning is to select hyperparameters that will give good generalization performance. Typically, this works by estimating the generalization performance for different choices of hyperparameters (e.g. using a validation set), and then choosing the best. But, as above, this estimate will be overoptimistic if the same data has been used for training. The consequence is that sub-optimal hyperparameters will be chosen; in particular, there will be a bias toward high-capacity models that will overfit.
Second, data that has already been used to tune hyperparameters is being re-used to estimate performance. This gives a deceptive estimate, as above. This isn't overfitting itself, but it means that, if overfitting is happening (and it probably is, as above), you won't know it.
The remedy is to use three separate datasets: a training set for training, a validation set for hyperparameter tuning, and a test set for estimating the final performance. Or, use nested cross-validation, which gives better estimates and is necessary if there isn't enough data (a sketch follows below).
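For instance, nested cross-validation can be sketched as follows (assuming scikit-learn; the classifier and parameter grid are placeholders): the inner loop tunes the hyperparameters, while the outer loop estimates the performance of the whole tune-then-fit procedure.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Inner loop: hyperparameter tuning via grid search with 5-fold CV.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)

# Outer loop: scores the tuned model on folds that neither training
# nor hyperparameter tuning ever saw.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```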
answered 1 hour ago by user20160
I would say you are not necessarily overfitting, because overfitting is a term that is normally used to indicate that your model does not generalise well. For example, if you were doing linear regression on something like MNIST images, you would probably still be underfitting (the model does not generalise enough) even when training on both training and test data.
What you are doing, however, is still not a good thing. The test set is normally the part of the data that you use to check how well the final, trained model performs on data it has never seen before. If you use this data to choose hyperparameters, you give the model a chance to "see" the test data and to develop a bias towards it. As a result, you lose the possibility of finding out how good your model would actually be on unseen data (because it has already seen the test data).
It might be that you do not really care how well your model performs, but then you would not need a test set either. Because in most scenarios you do want an idea of how good a model is, it is best to lock the test data away before you start doing anything with the data. Something as small as using test data during pre-processing will probably lead to a biased model.
Now you might be asking yourself: "How should I find hyperparameters then?" The easiest way is to split the available data (assuming that you have already safely put away some data for testing) into a training set and a so-called validation set (see the sketch below). If you have little data to work with, it probably makes more sense to take a look at cross-validation.
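A rough sketch of that split (assuming scikit-learn; the classifier, the candidate values of C, and the split ratios are placeholders) could look like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Lock the test set away before touching anything else.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Carve a validation set out of what remains, for choosing hyperparameters.
X_tr, X_val, y_tr, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_C, best_score = None, -1.0
for C in [0.1, 1, 10]:
    score = SVC(C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if score > best_score:
        best_C, best_score = C, score

# Refit on training + validation data, then report on the untouched test set.
final = SVC(C=best_C).fit(X_rest, y_rest)
print(final.score(X_test, y_test))
```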
answered 1 hour ago by Mr Tsjolder