Can overfitting be a good thing in some cases?
I know the goal of machine learning is to create generalizable models, so overfitting is usually undesirable.
However, I wonder if it could be desirable in some cases. For example, say I want to predict whether a student will drop out of a course, and I want to do this before the course ends by using a proxy label: their assignment submission status. In this scenario, I would not mind if the model trained on the proxy label overfits and does not generalize to unseen data, since I only care about a specific set of users.
I wonder if this is a valid way of thinking for this specific scenario. Any ideas?
Tags: machine-learning, overfitting, train
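For concreteness, a minimal sketch of the setup the question describes, deriving a proxy label from submission status (the column names and the "at risk" rule are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical activity log: one row per student per assignment.
log = pd.DataFrame({
    "student_id": [1, 1, 2, 2, 3, 3],
    "assignment": ["a1", "a2", "a1", "a2", "a1", "a2"],
    "submitted":  [True, True, True, False, False, False],
})

# Proxy label: a student whose most recent assignment was not submitted
# is treated as "at risk of dropping out" before the course ends.
proxy = (
    log.groupby("student_id")["submitted"]
       .agg(lambda s: not s.iloc[-1])   # last assignment missed -> at risk
       .rename("at_risk")
)
print(proxy)
```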
Do you mean you only want an efficient way of memorizing the data (in that case overfitting is indeed good)? Or do you actually want to predict new data from the same users (then only a certain type of overfitting is good - this may e.g. change your cross-validation strategy)?
– Björn
30 mins ago
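A sketch of the distinction this comment draws, using scikit-learn. The data here is synthetic and the names (`X`, `y`, `user_id`) are illustrative; the point is only how the cross-validation strategy changes:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # illustrative features
y = rng.integers(0, 2, size=200)       # illustrative labels
user_id = np.repeat(np.arange(40), 5)  # 40 users, 5 rows each

clf = RandomForestClassifier(random_state=0)

# Plain K-fold: rows from the same user can land in both the training
# and validation folds -- appropriate if future predictions target the
# SAME users the model was fit on.
same_users = cross_val_score(clf, X, y, cv=KFold(n_splits=5))

# Grouped K-fold: all rows of a user stay in one fold -- appropriate
# if the model must generalize to users it has never seen.
new_users = cross_val_score(clf, X, y, groups=user_id,
                            cv=GroupKFold(n_splits=5))

print(same_users.mean(), new_users.mean())
```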
2 Answers
Does your trained model perform well on both training and test data? If so, then it's not overfitting. If your model performs well on training data but worse on test data, then it is overfitting, which is undesirable for any application. If your training data and test data are the same, then overfitting wouldn't be an issue. But if you want the model to work on unseen test data, then overfitting is always a problem.
– Sanga
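A minimal sketch of the diagnostic this answer describes: compare the model's score on the data it was fit on against a held-out set (the dataset and model choice are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained decision tree can memorize the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("train:", tree.score(X_tr, y_tr))  # typically ~1.0
print("test: ", tree.score(X_te, y_te))  # noticeably lower -> overfitting
```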
Actually, the labels "generalization" and "overfitting" might be a bit misleading here.
What you want in your example is a good prediction of the dropout status.
So, technically: in training you need an unbiased sample of dropout and non-dropout students. It is extremely important to prepare not only the model, but even more so the data you use to evaluate it (train, validation, etc.).
There are textbook examples of overfitting where you plot a performance indicator (e.g. the mislabelling rate) on your training data and compare it with the same indicator on validation data. The training performance keeps improving, but at some point the validation performance starts to worsen. At that point it is clear that you should stop the learning process before it degrades performance further.
What is meant by "generalization" is actually very specific: you want your trained model to perform as well as possible once it encounters previously unseen data. You use validation data because there you know "the truth", unlike with real data.
So, as above, you want your model to predict the student's status:
- If your model is overfitted, it will report better indicator values for the students in your training set, but will perform worse on data it was not trained on.
- If your model generalizes well, it will perform roughly equally well on training data and non-training data.
If you talk about "specific data sets", then either these are the basis of your training and validation, or you are simply doing it wrong. And this has nothing to do with generalization or overfitting in machine learning.
– cherub
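A sketch of the textbook picture this answer describes, tracking the mislabelling rate on training and validation data as model capacity grows (boosting stages here; the model, dataset, and noise level are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# flip_y adds label noise so the overfitting regime is visible.
X, y = make_classification(n_samples=400, flip_y=0.2, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=300, random_state=0)
gb.fit(X_tr, y_tr)

# Mislabelling rate after each boosting stage, on both data sets.
train_err = [np.mean(p != y_tr) for p in gb.staged_predict(X_tr)]
valid_err = [np.mean(p != y_va) for p in gb.staged_predict(X_va)]

# Training error keeps falling; validation error bottoms out and then
# rises -- the place to stop learning is the validation minimum.
print("best number of stages:", int(np.argmin(valid_err)) + 1)
```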