Can you have interaction terms for both “sides” of a dummy variable in a single regression?

I'm really not sure how to phrase my question properly, so I apologize if this has been answered elsewhere. Let's say I'm interested in using a regression to predict wage using sex and an interaction term (height), where the sex variable is 0 if a person is female.



wage = sex + sex * height + constant + error



My understanding is that the omitted category here is a female person. What if I also wanted to investigate the effect of weight on being female as it impacts wage? Could I have a "reverse" sex term that is 1 if the person is female? Would something like this be valid:



wage = sex + sex * height + reverse_sex * weight + constant + error



Would the omitted category still be a female person? Can I capture both interaction effects in one regression? Thanks in advance for the help!










  • Note that there really aren't "omitted categories" with these dummy variables. Rather, the "constant" in your equation represents the value of "wage" when the values of predictor variables are 0, typically the reference value for a categorical variable. They might seem to be omitted because their names don't explicitly show up in displays of tables of regression coefficients, but they are there. The answer by @Penguin_Knight nicely shows how to proceed with the regression (including the important main effects for height, weight, etc) and significance testing.
    – EdM
    53 mins ago
















interaction categorical-encoding






edited 1 hour ago by Penguin_Knight






asked 1 hour ago by Mike




1 Answer
To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:

$$wage = \beta_0 + \beta_1 male + \beta_2 male \times height + \epsilon$$

you are implicitly stating that height does not matter for females at all. Usually, a full interaction model should also contain the main effects of the variables that compose the interaction:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$

That way, the males have:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$

And the females have:

$$wage = \beta_0 + \beta_2 height + \epsilon$$

In your version, the females will only have the constant (intercept), which is likely a misspecification.
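To make the contrast concrete, here is a worked substitution of the two dummy values into each specification. This is only a sketch; the coefficients are labeled as in each equation above, so $\beta_2$ multiplies $male \times height$ in the first specification and $height$ in the second. Without the height main effect:

$$
\begin{aligned}
male = 1:&\quad wage = (\beta_0 + \beta_1) + \beta_2 height + \epsilon \\
male = 0:&\quad wage = \beta_0 + \epsilon
\end{aligned}
$$

With the height main effect included:

$$
\begin{aligned}
male = 1:&\quad wage = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) height + \epsilon \\
male = 0:&\quad wage = \beta_0 + \beta_2 height + \epsilon
\end{aligned}
$$

In the second form both sexes get their own intercept and their own height slope, and $\beta_3$ is the difference between the two slopes.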




Back to the question about:




wage = sex + sex * height + reverse_sex * weight + constant + error




The actual interaction tests should then be:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 female + \beta_5 weight + \beta_6 female \times weight + \epsilon$$

A couple of points here. First, male and female are perfectly collinear (female = 1 - male), so one of them will be omitted:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

For males, these terms remain:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \epsilon$$

For females, these terms remain:

$$wage = \beta_0 + \beta_2 height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

So it's technically fine: $\beta_5$ is still the extra "effect" of weight for females.



Second, this is unnecessarily complicating everything because your proposed model:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

is essentially the same as:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 male \times weight + \epsilon$$

The two are reparameterizations of the same model, so the fitted values are identical. The $\beta_5$ will flip sign, but its magnitude is the same. It's basically the difference in slopes between males and females: if the males' slope is $a$ smaller than the females', the females' slope is $a$ bigger than the males'. You'll also find that the t-statistic flips sign, but the p-value is the same. There is no need to split hairs here.
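The sign flip can be seen by substituting $female = 1 - male$ into the weight terms of the first form. This is a short derivation using the first form's $\beta_4$ and $\beta_5$:

$$
\begin{aligned}
\beta_4 weight + \beta_5 female \times weight
&= \beta_4 weight + \beta_5 (1 - male) \times weight \\
&= (\beta_4 + \beta_5) weight - \beta_5 male \times weight
\end{aligned}
$$

So when the model is rewritten with $male \times weight$, the interaction coefficient becomes $-\beta_5$ and the weight main effect becomes $\beta_4 + \beta_5$, which is the females' slope.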





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



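set.seed(81226)

# Simulate 100 people; female is the complement of male
male <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
# True weight slope: -5 for females, -5 + 2.5 = -2.5 for males
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)

# Interaction model coded with male as the dummy
m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

# Same model coded with female as the dummy
m02 <- lm(wage ~ female + weight + female*weight)
summary(m02)

# Scatter plot (black = female, red = male) with the fitted line for each sex
plot(weight, wage, pch = 16, col = (male + 1))
lines(weight[female == 1], fitted(m01)[female == 1])
lines(weight[male == 1], fitted(m01)[male == 1], col = "red")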


The first regression using male is:



Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097    55.7790 448.118  < 2e-16 ***
male           83.2834    73.5968   1.132    0.261
weight         -5.0967     0.3627 -14.053  < 2e-16 ***
male:weight     2.0805     0.4723   4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   25078.8931    48.0124 522.342  < 2e-16 ***
female          -83.2834    73.5968  -1.132    0.261
weight           -3.0162     0.3026  -9.969  < 2e-16 ***
female:weight    -2.0805     0.4723  -4.405 2.75e-05 ***


Graphically, the relationship is:



[Figure: scatter plot of wage against weight with the fitted line for each sex]



Red is males and black is females. In the first model, females only get the coefficient -5.0967, which is the slope of the black line. The slope of the red line has an adjustment of 2.0805, giving (-5.0967 + 2.0805) = -3.0162. The 2.0805 is then the "difference in slopes," a.k.a. the interaction. If the two lines were parallel, the effect of weight on wage would be the same for both sexes.

Now, the second model uses female. The slope for males is -3.0162, which is just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up at -5.0967.

I hope this helps clarify that your question, "effect of weight on female," is the same as "absence of such effect of weight on male." Your proposed question sounds sensible, but to people who understand regression it is closer to a needless gesture: if males get a benefit, females of course suffer, relatively, a penalty of the same magnitude.
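If the goal is to read the weight slope for each sex directly, rather than as a reference slope plus a difference, one option is to re-express the same model with one weight column per sex. The block below is a minimal sketch of that idea and is not part of the code above; the names `weight_m`, `weight_f` and `m03` are illustrative assumptions, and the simulation lines are repeated only so that the block runs on its own.

```r
set.seed(81226)

# Re-create the simulated data used in the answer
male <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)

# Reference-plus-difference coding, as in the answer
m01 <- lm(wage ~ male + weight + male:weight)

# Illustrative alternative coding: one weight column per sex
# (weight_m is zero for females, weight_f is zero for males)
weight_m <- male * weight
weight_f <- female * weight
m03 <- lm(wage ~ male + weight_m + weight_f)

# Slopes read directly from m03
coef(m03)[c("weight_f", "weight_m")]

# The same slopes recovered from m01: reference, and reference + difference
c(female = unname(coef(m01)["weight"]),
  male   = unname(coef(m01)["weight"] + coef(m01)["male:weight"]))

# Both codings describe the same fitted model
all.equal(unname(fitted(m01)), unname(fitted(m03)))
```

Under this coding the `weight_f` row of `summary(m03)` tests whether the females' slope differs from zero, which is the same quantity as the `weight` row in the first table. What it no longer reports is the difference between the two slopes, so the interaction coding is still the one to use for testing whether the sexes differ.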






  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    36 mins ago










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    5 mins ago










Your Answer





StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);






Mike is a new contributor. Be nice, and check out our Code of Conduct.









 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f376020%2fcan-you-have-interaction-terms-for-both-sides-of-a-dummy-variable-in-a-single%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
3
down vote



accepted










To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = beta_0 + beta_1 male + beta_2 male times height + epsilon$$



you are implicitly stating that height does not matter for female at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



That way, the males have:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



And the females have:



$$wage = beta_0 + beta_2 height + epsilon$$



In your version, the female will only have the constant (intercept), which could likely be a wrong specification.




Back to the question about:




wage = sex+ sex* height + reverse_sex * weight + constant + error




The actual interaction tests should then be:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 female + beta_5 weight + beta_6 female times weight + epsilon$$



A couple points here. First, male and female are completely collinear so one of them will be omitted:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



For males, these terms remain:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + epsilon$$



For females, these terms remain:



$$wage = beta_0 + beta_2 height + beta_4 weight + beta_5 female times weight + epsilon$$



So, it's technically fine, the $beta_5$ is still the extra "effect" of weight for female.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



is essentially the same as:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 maletimes weight + epsilon$$



The $beta_5$ will likely flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females. If males' slope is $a$ smaller than females'; females' slope is $a$ bigger than males'. You'll also find the t-statistic will also flip sign, but p-values are the same. There is no need to split hair here.





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



set.seed(81226)

male <- sample(c(1,0), 100, replace=T)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
rnorm(100, 0, 100)

m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02<- lm(wage ~ female + weight + female*weight)
summary(m02)

plot(weight, wage, pch=16, col=(male+1))
lines(weight[female==1], m01$fitted[female==1])
lines(weight[male==1], m01$
fitted[male==1], col="red")


The first regression using male is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097 55.7790 448.118 < 2e-16 ***
male 83.2834 73.5968 1.132 0.261
weight -5.0967 0.3627 -14.053 < 2e-16 ***
male:weight 2.0805 0.4723 4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25078.8931 48.0124 522.342 < 2e-16 ***
female -83.2834 73.5968 -1.132 0.261
weight -3.0162 0.3026 -9.969 < 2e-16 ***
female:weight -2.0805 0.4723 -4.405 2.75e-05 ***


Graphically, the relationship is:



enter image description here



The red is males, and the black is female. In the first model, female only got the coefficient -5.0967, that is the slope of the black line. The slope of the red line has an adjustment of 2.0805, which is (-5.0967 + 2.0805). The 2.0805 is then the "difference in slopes," aka, the interaction. If both lines are parallel, effect of weight on wage is the same for both sex.



Now, the second mode uses female. The slope for males is -3.0162, which is actually just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up with -5.0967.



I hope this helps clarifying that your question "effect of weight on female" is the same as "absence of such effect of weight on male." Your proposed question sounds making sense, but to people who understand regression it is closer to a needless gesture: if males got a benefit, the females would of course relatively suffer from the same magnitude of penalty.






share|cite|improve this answer






















  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    36 mins ago










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    5 mins ago














up vote
3
down vote



accepted










To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = beta_0 + beta_1 male + beta_2 male times height + epsilon$$



you are implicitly stating that height does not matter for female at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



That way, the males have:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



And the females have:



$$wage = beta_0 + beta_2 height + epsilon$$



In your version, the female will only have the constant (intercept), which could likely be a wrong specification.




Back to the question about:




wage = sex+ sex* height + reverse_sex * weight + constant + error




The actual interaction tests should then be:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 female + beta_5 weight + beta_6 female times weight + epsilon$$



A couple points here. First, male and female are completely collinear so one of them will be omitted:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



For males, these terms remain:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + epsilon$$



For females, these terms remain:



$$wage = beta_0 + beta_2 height + beta_4 weight + beta_5 female times weight + epsilon$$



So, it's technically fine, the $beta_5$ is still the extra "effect" of weight for female.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



is essentially the same as:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 maletimes weight + epsilon$$



The $beta_5$ will likely flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females. If males' slope is $a$ smaller than females'; females' slope is $a$ bigger than males'. You'll also find the t-statistic will also flip sign, but p-values are the same. There is no need to split hair here.





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



set.seed(81226)

male <- sample(c(1,0), 100, replace=T)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
rnorm(100, 0, 100)

m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02<- lm(wage ~ female + weight + female*weight)
summary(m02)

plot(weight, wage, pch=16, col=(male+1))
lines(weight[female==1], m01$fitted[female==1])
lines(weight[male==1], m01$
fitted[male==1], col="red")


The first regression using male is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097 55.7790 448.118 < 2e-16 ***
male 83.2834 73.5968 1.132 0.261
weight -5.0967 0.3627 -14.053 < 2e-16 ***
male:weight 2.0805 0.4723 4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25078.8931 48.0124 522.342 < 2e-16 ***
female -83.2834 73.5968 -1.132 0.261
weight -3.0162 0.3026 -9.969 < 2e-16 ***
female:weight -2.0805 0.4723 -4.405 2.75e-05 ***


Graphically, the relationship is:



enter image description here



The red is males, and the black is female. In the first model, female only got the coefficient -5.0967, that is the slope of the black line. The slope of the red line has an adjustment of 2.0805, which is (-5.0967 + 2.0805). The 2.0805 is then the "difference in slopes," aka, the interaction. If both lines are parallel, effect of weight on wage is the same for both sex.



Now, the second mode uses female. The slope for males is -3.0162, which is actually just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up with -5.0967.



I hope this helps clarifying that your question "effect of weight on female" is the same as "absence of such effect of weight on male." Your proposed question sounds making sense, but to people who understand regression it is closer to a needless gesture: if males got a benefit, the females would of course relatively suffer from the same magnitude of penalty.






share|cite|improve this answer






















  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    36 mins ago










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    5 mins ago












up vote
3
down vote



accepted







up vote
3
down vote



accepted






To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = beta_0 + beta_1 male + beta_2 male times height + epsilon$$



you are implicitly stating that height does not matter for female at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



That way, the males have:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



And the females have:



$$wage = beta_0 + beta_2 height + epsilon$$



In your version, the female will only have the constant (intercept), which could likely be a wrong specification.




Back to the question about:




wage = sex+ sex* height + reverse_sex * weight + constant + error




The actual interaction tests should then be:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 female + beta_5 weight + beta_6 female times weight + epsilon$$



A couple points here. First, male and female are completely collinear so one of them will be omitted:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



For males, these terms remain:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + epsilon$$



For females, these terms remain:



$$wage = beta_0 + beta_2 height + beta_4 weight + beta_5 female times weight + epsilon$$



So, it's technically fine, the $beta_5$ is still the extra "effect" of weight for female.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



is essentially the same as:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 maletimes weight + epsilon$$



The $beta_5$ will likely flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females. If males' slope is $a$ smaller than females'; females' slope is $a$ bigger than males'. You'll also find the t-statistic will also flip sign, but p-values are the same. There is no need to split hair here.





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



set.seed(81226)

male <- sample(c(1,0), 100, replace=T)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
rnorm(100, 0, 100)

m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02<- lm(wage ~ female + weight + female*weight)
summary(m02)

plot(weight, wage, pch=16, col=(male+1))
lines(weight[female==1], m01$fitted[female==1])
lines(weight[male==1], m01$
fitted[male==1], col="red")


The first regression using male is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097 55.7790 448.118 < 2e-16 ***
male 83.2834 73.5968 1.132 0.261
weight -5.0967 0.3627 -14.053 < 2e-16 ***
male:weight 2.0805 0.4723 4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25078.8931 48.0124 522.342 < 2e-16 ***
female -83.2834 73.5968 -1.132 0.261
weight -3.0162 0.3026 -9.969 < 2e-16 ***
female:weight -2.0805 0.4723 -4.405 2.75e-05 ***


Graphically, the relationship is:



enter image description here



The red is males, and the black is female. In the first model, female only got the coefficient -5.0967, that is the slope of the black line. The slope of the red line has an adjustment of 2.0805, which is (-5.0967 + 2.0805). The 2.0805 is then the "difference in slopes," aka, the interaction. If both lines are parallel, effect of weight on wage is the same for both sex.



Now, the second mode uses female. The slope for males is -3.0162, which is actually just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up with -5.0967.



I hope this helps clarifying that your question "effect of weight on female" is the same as "absence of such effect of weight on male." Your proposed question sounds making sense, but to people who understand regression it is closer to a needless gesture: if males got a benefit, the females would of course relatively suffer from the same magnitude of penalty.






share|cite|improve this answer




































edited 46 secs ago

























answered 1 hour ago









Penguin_Knight












  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    36 mins ago










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    5 mins ago

























