Is it possible to combine predictions to improve overall prediction quality?

This is a binary classification problem. The metric being minimised is the log loss (cross entropy). I also track accuracy, just for my own information. It is a large, very balanced data set. Very naive prediction techniques get about 50% accuracy and 0.693 log loss. The best I've been able to scrape out is 52.5% accuracy and 0.6915 log loss. Since we are trying to minimise the log loss, we always get a set of probabilities (the predict_proba functions in sklearn and Keras). That's all background; now the question.



Let's say I can use two different techniques to create two sets of predictions with comparable accuracy and log loss. For example, I can use two different groups of the input features to produce two sets of predictions that are each about 52% accurate with log loss below 0.692. The point is that both sets of predictions show some predictive power. As another example, I could use logistic regression to produce one set of predictions and a neural net to produce the other.



Here are the first 10 for each set, for example:



p1 = [0.49121362 0.52067905 0.50230295 0.49511673 0.52009695 0.49394751 0.48676686 0.50084939 0.48693237 0.49564188 ...]
p2 = [0.4833959 0.49700296 0.50484381 0.49122147 0.52754993 0.51766402 0.48326918 0.50432501 0.48721228 0.48949306 ...]


I'm thinking there should be a way to combine the two sets of predictions into one to increase the overall predictive power. Is there?



I have started trying some things. For example, I treat the absolute distance of a prediction from 0.5 (abs(p - 0.5)) as a signal strength, and for each example I use whichever of p1 and p2 has the stronger signal. This slightly accomplished what I wanted, but only by a slim margin, and in another instance it didn't seem to help at all. Interestingly, it didn't seem to destroy the predictive power either.
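To make this concrete, here is a minimal sketch of what I mean, using the 10 example values above; the plain average and the log-odds average are just the other obvious combinations, not something I know to be better:

import numpy as np

# The first 10 predictions from each set, as printed above.
p1 = np.array([0.49121362, 0.52067905, 0.50230295, 0.49511673, 0.52009695,
               0.49394751, 0.48676686, 0.50084939, 0.48693237, 0.49564188])
p2 = np.array([0.4833959,  0.49700296, 0.50484381, 0.49122147, 0.52754993,
               0.51766402, 0.48326918, 0.50432501, 0.48721228, 0.48949306])

# "Pick the stronger signal": keep whichever prediction is further from 0.5.
pick = np.where(np.abs(p1 - 0.5) >= np.abs(p2 - 0.5), p1, p2)

# Two simple alternatives: an average of the probabilities, and an average
# of the log-odds, mapped back to a probability.
avg = (p1 + p2) / 2
log_odds = np.log(p1 / (1 - p1)) + np.log(p2 / (1 - p2))
log_odds_avg = 1 / (1 + np.exp(-log_odds / 2))

print(pick)
print(avg)
print(log_odds_avg)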










Tags: machine-learning, prediction, boosting

asked 1 hour ago by jeffery_the_wind
2 Answers
          Short answer: Yes.



Long answer: This is one of many examples of a technique known as "stacking". While you can, of course, decide on some manual way to combine the two predictions, it is even better to train a third model on the outputs of the first two models (or of even more models). This will usually improve the accuracy further. To avoid re-using the data, a different part of the data set is often used for training the first-level models than for training the model that combines their outputs.



          See e.g. here for an example.
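For illustration, here is a minimal sketch of that kind of stacking in scikit-learn. The synthetic data, the two base models, and the 50/50 split are placeholders for whatever you are already using, not a recommended setup:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

# Placeholder data; substitute your own feature matrix X and labels y.
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 20))
y = (X[:, 0] + rng.normal(size=10000) > 0).astype(int)

# Hold out part of the data for the combining model, so it is not trained
# on predictions for rows the base models have already seen.
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

# Two base models (these could also be one model type on two feature groups).
m1 = LogisticRegression(max_iter=1000).fit(X_base, y_base)
m2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_base, y_base)

# Their predicted probabilities become the features of the second-level model.
Z = np.column_stack([m1.predict_proba(X_meta)[:, 1],
                     m2.predict_proba(X_meta)[:, 1]])
meta = LogisticRegression().fit(Z, y_meta)

# In practice, evaluate the stacked model on yet another held-out set;
# this line only shows how the combined probabilities are produced.
p_stacked = meta.predict_proba(Z)[:, 1]
print("stacked log loss (on the meta training part):", log_loss(y_meta, p_stacked))

scikit-learn also ships a StackingClassifier in sklearn.ensemble that wraps this pattern, including cross-validated generation of the first-level predictions.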






answered 56 mins ago by LiKao (accepted answer)




















• This is exactly what I was talking about. – jeffery_the_wind, 46 mins ago






























          Yes.

The method you are talking about is called stacking, which is a type of ensembling. In the first stage, multiple models are trained and their predictions are stored as features, which are then used to train a second-stage model. A lot of Kagglers use this method. Generally, you should use more than two models for the first stage when stacking (I generally use at least 4-5). There are also simpler ways to combine the predictions, such as plain averaging or majority voting. Here is a link to a Kaggle kernel which implements stacking on the famous Titanic dataset, which is also a binary classification problem.
          Kaggle Kernel Intro to Stacking using Titanic Dataset
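As a rough sketch of that two-stage setup with out-of-fold first-stage predictions (the particular base models and data here are arbitrary placeholders, not a recommendation):

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss

# Placeholder data; substitute your own X and y.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 20))
y = (X[:, 1] + rng.normal(size=5000) > 0).astype(int)

# First stage: several different base models.
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    KNeighborsClassifier(n_neighbors=50),
]

# Out-of-fold probabilities become the second-stage features, so no model
# contributes predictions for rows it was trained on.
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Second stage: a simple combiner trained on the stacked features.
stacker = LogisticRegression().fit(Z, y)
print("stacked log loss:", log_loss(y, stacker.predict_proba(Z)[:, 1]))

# Plain averaging is the "no second-stage training" version of the same idea.
print("averaged log loss:", log_loss(y, Z.mean(axis=1)))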






answered 33 mins ago by frank



















