Is it possible to combine predictions to improve overall prediction quality?
This is a binary classification problem. The metric being minimised is the log loss (cross entropy). I also track accuracy, just for my own information. It is a large, very balanced data set. Very naive prediction techniques get about 50% accuracy and 0.693 log loss. The best I've been able to scrape out is 52.5% accuracy and 0.6915 log loss. Since we are trying to minimise the log loss, we always work with predicted probabilities (the predict_proba functions in sklearn and keras). That's all background; now the question.
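For reference, the numbers above come from something like the following (a minimal sketch; clf, X_val and y_val are stand-ins for a fitted sklearn classifier and a held-out validation set):

from sklearn.metrics import log_loss, accuracy_score

# clf, X_val, y_val are hypothetical placeholders for a fitted classifier
# and a held-out validation set.
proba = clf.predict_proba(X_val)[:, 1]          # probability of the positive class
print("log loss:", log_loss(y_val, proba))
print("accuracy:", accuracy_score(y_val, (proba > 0.5).astype(int)))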
Let's say I can use 2 different techniques to create 2 different sets of predictions with comparable accuracy and log loss. For example, I can use 2 different groups of the input features to produce 2 sets of predictions that are both about 52% accurate with < 0.692 log loss. The point is that both sets of predictions have some predictive power. Another example: I could use logistic regression to produce one set of predictions and a neural net to produce the other.
Here are the first 10 for each set, for example:
p1 = [0.49121362 0.52067905 0.50230295 0.49511673 0.52009695 0.49394751 0.48676686 0.50084939 0.48693237 0.49564188 ...]
p2 = [0.4833959 0.49700296 0.50484381 0.49122147 0.52754993 0.51766402 0.48326918 0.50432501 0.48721228 0.48949306 ...]
I'm thinking that there should be a way to combine the 2 sets of predictions into one, to increase the overall predictive power. Is there?
I had started trying some things. For example, I treated the absolute value of the prediction minus 0.5 ( abs(p - 0.5) ) as a signal strength, and whichever of p1 and p2 had the greater signal, I used that value. This accomplished what I wanted, but only by a slim margin, and in another instance it didn't seem to help at all. Interestingly, it didn't seem to destroy the predictive power either.
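Concretely, the selection rule I tried looks roughly like this (a sketch using the p1 and p2 arrays shown above):

import numpy as np

p1 = np.asarray(p1)
p2 = np.asarray(p2)

# Per example, keep whichever prediction is further from 0.5,
# i.e. the one with the stronger "signal".
use_p1 = np.abs(p1 - 0.5) >= np.abs(p2 - 0.5)
p_combined = np.where(use_p1, p1, p2)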
machine-learning prediction boosting
asked 1 hour ago by jeffery_the_wind
2 Answers
Accepted answer
Short answer: Yes.
Long answer: This is one of many examples of a technique known as "stacking". While you can, of course, decide on some manual way to combine both predictions, it is usually better to train a third model on the outputs of the first two models (or of even more models). This can further improve accuracy. To avoid re-using the data, one part of the data set is often used for training the first-level models and a different part for training the model that combines their predictions.
See e.g. here for an example.
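As an illustration of this scheme (a sketch, not the linked example; X, y and X_new are hypothetical arrays of features, labels and new data):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Split the data so the combiner is not trained on the same rows as the base models.
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

# Level 1: two different models (or the same model on different feature groups).
m1 = LogisticRegression(max_iter=1000).fit(X_base, y_base)
m2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_base, y_base)

# Level 2: a combining model trained on the level-1 probabilities of the other split.
meta_features = np.column_stack([m1.predict_proba(X_meta)[:, 1],
                                 m2.predict_proba(X_meta)[:, 1]])
combiner = LogisticRegression().fit(meta_features, y_meta)

# For new data, build the meta features the same way and predict with the combiner.
new_features = np.column_stack([m1.predict_proba(X_new)[:, 1],
                                m2.predict_proba(X_new)[:, 1]])
p_stacked = combiner.predict_proba(new_features)[:, 1]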
answered 56 mins ago by LiKao
This is exactly what I was talking about. – jeffery_the_wind 46 mins ago
Yes.
The method you are talking about is called stacking. It is a type of ensembling. In the first stage, multiple models are trained and their predictions are stored as features, which are then used to train the second-stage model. A lot of Kagglers use this method. Generally, you should use more than 2 models for the first stage when stacking (I generally use at least 4-5 models). The combining itself can also be done in many ways, such as simple averaging, majority voting, etc. Here is a link to a Kaggle kernel which implements stacking on the famous Titanic dataset, which is also a binary classification problem.
Kaggle Kernel Intro to Stacking using Titanic Dataset
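As a quick baseline before full stacking, simple averaging of the two probability vectors is worth trying directly (a sketch; p1, p2 and the true labels y are stand-ins for the arrays in the question):

import numpy as np
from sklearn.metrics import log_loss

# p1, p2: the two probability vectors; y: the true labels (hypothetical names).
p1, p2 = np.asarray(p1), np.asarray(p2)
p_avg = (p1 + p2) / 2.0
print("averaged log loss:", log_loss(y, p_avg))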
answered 33 mins ago by frank