Binary predictor with highly skewed distribution


I am running a linear regression model and have a binary predictor with a highly skewed distribution: one category accounts for 96% of the data, and the other 4% corresponds to 26 observations.

Should I keep or remove this binary predictor variable, and what is the rationale for doing so? Thank you in advance!

edited 2 hours ago

asked 2 hours ago by curiousmind

regression binary-data skewness predictor


1 Answer

In general, it's not an issue; you should keep it if it makes sense to be in the model, which presumably it does or it wouldn't be there to begin with.
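What the skew does cost is precision, not validity. In a model containing only an intercept and the dummy, the variance of the dummy's coefficient is $\sigma^2 (1/n_0 + 1/n_1)$, so it is driven mostly by the size of the rare group. As a rough sketch (Python with NumPy, assuming homoskedastic errors and $\sigma = 1$), the code below computes that standard error from $\sigma^2 (X^\top X)^{-1}$ for the question's setting, taking the total sample as roughly 650 (since 26 observations is about 4%), and compares it with a balanced split of the same size. The helper `dummy_coef_se` is a hypothetical name used only for illustration.

```python
import numpy as np


def dummy_coef_se(n_rare: int, n_common: int, sigma: float = 1.0) -> float:
    """Standard error of b1 in y = b0 + b1*d + e, taken from sigma^2 (X'X)^{-1}.

    Assumes homoskedastic errors with known standard deviation sigma.
    """
    d = np.concatenate([np.ones(n_rare), np.zeros(n_common)])
    X = np.column_stack([np.ones_like(d), d])  # intercept + dummy
    cov = sigma ** 2 * np.linalg.inv(X.T @ X)
    return float(np.sqrt(cov[1, 1]))


# Question's setting: 26 rare observations at about 4%, i.e. roughly 650 in total (assumed).
skewed = dummy_coef_se(26, 624)
balanced = dummy_coef_se(325, 325)
print(f"SE of dummy coefficient, 26 vs 624:  {skewed:.3f} * sigma")
print(f"SE of dummy coefficient, 325 vs 325: {balanced:.3f} * sigma")
```

In this two-parameter model the skewed split gives a standard error of about $0.20\sigma$ versus about $0.08\sigma$ for a 325/325 split: a wider interval, but the estimate is still unbiased, so the skew on its own is not a reason to drop the variable.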

Consider, for example, a model for weekly sales of chayote squash in the New Orleans area (see https://en.wikipedia.org/wiki/Chayote, in the "Americas" section). Such a model would likely need a dummy variable for Thanksgiving week to capture the very large increase in chayote sales at Thanksgiving (more than 5x "regular" sales). This dummy variable takes the value 1 once every 52 weeks and 0 the rest of the time, so the "not Thanksgiving week" category represents roughly 98% of the data. If we take the dummy variable out, our Thanksgiving forecasts will be terrible, and most of our other forecasts will likely be considerably worse too, because they are distorted by the Thanksgiving data points in various ways (e.g., trends look much steeper if a Thanksgiving falls near the end of the modeling horizon).
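A small simulation makes the trend distortion concrete. Everything in it is hypothetical: 100 weeks of sales with a baseline of 100, a trend of +0.2 per week, noise with standard deviation 5, and a Thanksgiving bump of 450 in weeks 47 and 99, so the second Thanksgiving sits at the very end of the sample. For simplicity the bump is additive here rather than a multiple of regular sales. The sketch (Python with NumPy) fits a linear trend with and without the Thanksgiving dummy and compares the estimated slope and the error on ordinary weeks.

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients for y ~ X (X includes the intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def rmse(resid: np.ndarray) -> float:
    return float(np.sqrt(np.mean(resid ** 2)))


rng = np.random.default_rng(0)
n_weeks = 100
t = np.arange(n_weeks, dtype=float)

# Hypothetical calendar: Thanksgiving in weeks 47 and 99 (the last week of the sample).
thanksgiving = np.isin(t, [47, 99]).astype(float)
# Hypothetical sales: baseline 100, trend +0.2/week, additive bump of 450, noise sd 5.
sales = 100 + 0.2 * t + 450 * thanksgiving + rng.normal(0, 5, n_weeks)

ones = np.ones(n_weeks)
X_with = np.column_stack([ones, t, thanksgiving])  # trend + Thanksgiving dummy
X_without = np.column_stack([ones, t])             # trend only
b_with = ols(X_with, sales)
b_without = ols(X_without, sales)

# Compare the fits on the ~98% of weeks that are not Thanksgiving.
regular = thanksgiving == 0
rmse_with = rmse((sales - X_with @ b_with)[regular])
rmse_without = rmse((sales - X_without @ b_without)[regular])

print(f"trend slope, with dummy:    {b_with[1]:.3f}")
print(f"trend slope, without dummy: {b_without[1]:.3f}")
print(f"RMSE on regular weeks, with dummy:    {rmse_with:.2f}")
print(f"RMSE on regular weeks, without dummy: {rmse_without:.2f}")
```

In this setup, dropping the dummy pushes the slope well above 0.2 (by roughly $450 \times 47 / 83325 \approx 0.25$ in expectation, since the late Thanksgiving has high leverage) and inflates the errors on the regular weeks, because the fitted line is dragged up toward the two spikes.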

It's important, however, to note the following caveat. @Henry's comment in response to the OP is of course correct: if you have only one observation in one of the two categories, the dummy's coefficient fits that observation exactly (its residual becomes zero), so including the dummy in effect removes that observation from the data set, and all your other parameter estimates will be the same as if you had simply deleted it.
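The caveat is easy to check numerically. In the sketch below (Python with NumPy; the data, the sample size and the index `k` of the lone "rare" observation are all made up), a simple regression is fitted twice: once on all the data with a dummy that equals 1 only for observation `k`, and once with observation `k` deleted and no dummy. The intercept and slope should agree to floating-point precision, and observation `k`'s residual in the first fit should be zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

k = 7  # hypothetical index of the single observation in the "rare" category
d = np.zeros(n)
d[k] = 1.0

ones = np.ones(n)
# Model 1: all n observations, including the one-observation dummy.
X_full = np.column_stack([ones, x, d])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Model 2: observation k deleted, no dummy.
keep = np.arange(n) != k
X_drop = np.column_stack([ones[keep], x[keep]])
b_drop, *_ = np.linalg.lstsq(X_drop, y[keep], rcond=None)

# The dummy absorbs observation k's residual entirely.
resid_k = y[k] - X_full[k] @ b_full
print("intercept, slope with dummy:  ", np.round(b_full[:2], 4))
print("intercept, slope after delete:", np.round(b_drop, 4))
print("residual of observation k:    ", float(resid_k))
```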

answered 2 hours ago by jbowman

Thanks for your answer. I have made some edits to my question; does your response still hold? It seems it does, but I just wanted to confirm with you. – curiousmind, 2 hours ago

Yes, it does. I'll leave the caveat in there so that the answer is more widely applicable than just to the case where you have several observations in the "rare" category. – jbowman, 2 hours ago

Thank you. This answer is helpful. – curiousmind, 1 hour ago
