Is using both training and test sets for hyperparameter tuning overfitting?

You have a training set and a test set. You combine them and run something like GridSearch on the combined data to decide the hyperparameters of the model. Then you fit a model on the training set using these hyperparameters and use the test set to evaluate it.

Is this overfitting? Ultimately, the model was not fitted on the test set, but the test set was considered when deciding the hyperparameters.
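For concreteness, here is a minimal sketch of the procedure described above, assuming scikit-learn; the data, estimator (SVC) and parameter grid are illustrative placeholders, not part of the original question.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data standing in for the real training and test sets.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 5)), rng.integers(0, 2, size=80)
X_test, y_test = rng.normal(size=(20, 5)), rng.integers(0, 2, size=20)

# The step in question: hyperparameters are chosen on the *combined* data.
X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, y_test])
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_all, y_all)

# The final model is fit on the training set only and scored on the test set,
# but the chosen C was already influenced by the test data.
final_model = SVC(**search.best_params_).fit(X_train, y_train)
print(final_model.score(X_test, y_test))
```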










Tags: machine-learning, cross-validation, overfitting






4 Answers






It is an "in-sample" forecast, since you eventually make the forecast on observations that are already part of your training set. Why not use n-fold cross-validation? By doing that, each time you are making an "out-of-sample" forecast, in which the test set and the training set are separate.
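A minimal sketch of what that looks like, assuming scikit-learn (the dataset, estimator and candidate values of C are placeholders): every fold is scored on observations that were held out of that fold's fit.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_train, y_train = load_iris(return_X_y=True)  # stands in for the training data

# 5-fold CV: each score comes from observations held out of that fold's fit.
for C in (0.1, 1, 10):
    scores = cross_val_score(SVC(C=C), X_train, y_train, cv=5)
    print(C, scores.mean())
```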






Yes, you are overfitting. The test set should be used only for testing, not for hyperparameter tuning. Searching for parameters on the test set will learn the rules present in that particular test set and eventually overfit it.
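A hedged sketch of the corrected workflow, assuming scikit-learn (the dataset, model and grid are illustrative placeholders): the hyperparameter search only ever sees the training portion, and the test set is scored exactly once.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tuning happens via internal cross-validation on the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(model, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# The test set is touched exactly once, for the final performance estimate.
print(search.best_params_, search.score(X_test, y_test))
```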






The idea behind holdout and cross validation is to estimate the generalization performance of a learning algorithm, that is, the expected performance on future data drawn from the same distribution as the training data. This estimate can be used to tune hyperparameters or to report the final performance. Its validity depends on the independence of the data used for training from the data used for estimating performance. If this independence is violated, the performance estimate will be overoptimistically biased. The most egregious way this can happen is by estimating performance on data that has already been used for training or hyperparameter tuning, but there are many more subtle and insidious ways too.

The procedure you asked about goes wrong in multiple ways. First, the same data is used for both training and hyperparameter tuning. The goal of hyperparameter tuning is to select hyperparameters that will give good generalization performance. Typically, this works by estimating the generalization performance for different choices of hyperparameters (e.g. using a validation set) and then choosing the best. But, as above, this estimate will be overoptimistic if the same data has been used for training. The consequence is that sub-optimal hyperparameters will be chosen. In particular, there will be a bias toward high-capacity models that will overfit.

Second, data that has already been used to tune hyperparameters is being re-used to estimate performance. This will give a deceptive estimate, as above. This isn't overfitting in itself, but it means that, if overfitting is happening (and it probably is, as above), you won't know it.

The remedy is to use three separate datasets: a training set for training, a validation set for hyperparameter tuning, and a test set for estimating the final performance. Or, use nested cross validation, which will give better estimates and is necessary if there isn't enough data.
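A minimal nested cross-validation sketch, assuming scikit-learn (the dataset, estimator and grid are placeholders): the inner search tunes the hyperparameters, and the outer loop estimates performance on folds the search never saw.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: hyperparameter tuning. Outer loop: performance estimation.
inner_search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(outer_scores.mean(), outer_scores.std())
```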






I would say you are not necessarily overfitting, because overfitting is a term that is normally used to indicate that your model does not generalise well. For example, if you do linear regression on something like MNIST images, you are probably still underfitting (the model does not generalise enough) even when training on both the training and test data.

What you are doing, however, is still not a good thing. The test set is normally the part of the data you use to check how well the final, trained model performs on data it has never seen before. If you use this data to choose hyperparameters, you give the model a chance to "see" the test data and to develop a bias towards it. You therefore lose the possibility of finding out how good your model would actually be on unseen data (because it has already seen the test data).

It might be that you do not really care how well your model performs, but then you would not need a test set either. Because in most scenarios you do want an idea of how good a model is, it is best to lock the test data away before you start doing anything with the data. Something as small as using the test data during pre-processing will probably lead to a biased model.

Now you might be asking yourself: "How should I find hyperparameters then?" The easiest way is to split the available data (assuming you have already safely put away some data for testing) into a training set and a so-called validation set, as sketched below. If you have little data to work with, it probably makes more sense to take a look at cross validation.
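A minimal sketch of such a three-way split, assuming scikit-learn; the dataset and the 60/20/20 proportions are arbitrary illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)  # placeholder dataset

# Lock the test set away first, then carve a validation set out of the rest.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% gives roughly a 60/20/20 train/validation/test split.
```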





