Is it possible to combine predictions to improve overall prediction quality?

This is a binary classification problem. The metric being minimised is the log loss (cross entropy). I also track accuracy, just for my own information. It is a large, very balanced data set. Very naive prediction techniques get about 50% accuracy and 0.693 log loss. The best I've been able to scrape out is 52.5% accuracy and 0.6915 log loss. Since we are trying to minimise the log loss, we always get a set of probabilities (the predict_proba functions in sklearn and Keras). That's all background; now the question.



Let's say I can use two different techniques to create two sets of predictions with comparable accuracy and log loss. For example, I can use two different groups of the input features to produce two sets of predictions that are each about 52% accurate with log loss below 0.692. The point is that both sets of predictions show some predictive power. As another example, I could use logistic regression to produce one set of predictions and a neural net to produce the other.



Here are the first 10 for each set, for example:



p1 = [0.49121362 0.52067905 0.50230295 0.49511673 0.52009695 0.49394751 0.48676686 0.50084939 0.48693237 0.49564188 ...]
p2 = [0.4833959 0.49700296 0.50484381 0.49122147 0.52754993 0.51766402 0.48326918 0.50432501 0.48721228 0.48949306 ...]


I'm thinking there should be a way to combine the two sets of predictions into one to increase the overall predictive power. Is there?



I have started trying some things. For example, I treat the absolute distance of a prediction from 0.5 (abs(p - 0.5)) as a signal strength, and for each example I use whichever of p1 and p2 has the stronger signal. This slightly accomplished what I wanted, but only by a slim margin, and in another instance it didn't seem to help at all. Interestingly, it didn't seem to destroy the predictive power either.
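To make this concrete, here is a minimal sketch of what I mean, using the 10 example values above; the plain average and the log-odds average are just the other obvious combinations, not something I know to be better:

import numpy as np

# The first 10 predictions from each set, as printed above.
p1 = np.array([0.49121362, 0.52067905, 0.50230295, 0.49511673, 0.52009695,
               0.49394751, 0.48676686, 0.50084939, 0.48693237, 0.49564188])
p2 = np.array([0.4833959,  0.49700296, 0.50484381, 0.49122147, 0.52754993,
               0.51766402, 0.48326918, 0.50432501, 0.48721228, 0.48949306])

# "Pick the stronger signal": keep whichever prediction is further from 0.5.
pick = np.where(np.abs(p1 - 0.5) >= np.abs(p2 - 0.5), p1, p2)

# Two simple alternatives: an average of the probabilities, and an average
# of the log-odds, mapped back to a probability.
avg = (p1 + p2) / 2
log_odds = np.log(p1 / (1 - p1)) + np.log(p2 / (1 - p2))
log_odds_avg = 1 / (1 + np.exp(-log_odds / 2))

print(pick)
print(avg)
print(log_odds_avg)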










Tags: machine-learning, prediction, boosting

asked 1 hour ago by jeffery_the_wind
2 Answers
          Short answer: Yes.



Long answer: This is one of many examples of a technique known as "stacking". While you can, of course, decide on some manual way to combine the two predictions, it is even better to train a third model on the outputs of the first two models (or of even more models). This will usually improve the accuracy further. To avoid re-using the data, a different part of the data set is often used for training the first-level models than for training the model that combines their outputs.



          See e.g. here for an example.
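For illustration, here is a minimal sketch of that kind of stacking in scikit-learn. The synthetic data, the two base models, and the 50/50 split are placeholders for whatever you are already using, not a recommended setup:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

# Placeholder data; substitute your own feature matrix X and labels y.
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 20))
y = (X[:, 0] + rng.normal(size=10000) > 0).astype(int)

# Hold out part of the data for the combining model, so it is not trained
# on predictions for rows the base models have already seen.
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

# Two base models (these could also be one model type on two feature groups).
m1 = LogisticRegression(max_iter=1000).fit(X_base, y_base)
m2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_base, y_base)

# Their predicted probabilities become the features of the second-level model.
Z = np.column_stack([m1.predict_proba(X_meta)[:, 1],
                     m2.predict_proba(X_meta)[:, 1]])
meta = LogisticRegression().fit(Z, y_meta)

# In practice, evaluate the stacked model on yet another held-out set;
# this line only shows how the combined probabilities are produced.
p_stacked = meta.predict_proba(Z)[:, 1]
print("stacked log loss (on the meta training part):", log_loss(y_meta, p_stacked))

scikit-learn also ships a StackingClassifier in sklearn.ensemble that wraps this pattern, including cross-validated generation of the first-level predictions.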






answered 56 mins ago by LiKao (accepted answer)




















• This is exactly what I was talking about. – jeffery_the_wind, 46 mins ago






























          Yes.

The method you are talking about is called stacking, which is a type of ensembling. In the first stage, multiple models are trained and their predictions are stored as features, which are then used to train a second-stage model. A lot of Kagglers use this method. Generally, you should use more than two models for the first stage when stacking (I generally use at least 4-5). There are also simpler ways to combine the predictions, such as plain averaging or majority voting. Here is a link to a Kaggle kernel which implements stacking on the famous Titanic dataset, which is also a binary classification problem.
          Kaggle Kernel Intro to Stacking using Titanic Dataset
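As a rough sketch of that two-stage setup with out-of-fold first-stage predictions (the particular base models and data here are arbitrary placeholders, not a recommendation):

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss

# Placeholder data; substitute your own X and y.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 20))
y = (X[:, 1] + rng.normal(size=5000) > 0).astype(int)

# First stage: several different base models.
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    KNeighborsClassifier(n_neighbors=50),
]

# Out-of-fold probabilities become the second-stage features, so no model
# contributes predictions for rows it was trained on.
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Second stage: a simple combiner trained on the stacked features.
stacker = LogisticRegression().fit(Z, y)
print("stacked log loss:", log_loss(y, stacker.predict_proba(Z)[:, 1]))

# Plain averaging is the "no second-stage training" version of the same idea.
print("averaged log loss:", log_loss(y, Z.mean(axis=1)))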






answered 33 mins ago by frank



















