Can you have interaction terms for both "sides" of a dummy variable in a single regression?
I'm really not sure how to phrase my question properly, so I apologize if this has been answered elsewhere. Let's say I'm interested in using a regression to predict wage using sex and an interaction term (height), where the sex variable is 0 if a person is female.
wage = sex + sex * height + constant + error
My understanding is that the omitted category here is a female person. What if I also wanted to investigate how weight affects wage for females? Could I have a "reverse" sex term that is 1 if the person is female? Would something like this be valid:
wage = sex + sex * height + reverse_sex * weight + constant + error
Would the omitted category still be a female person? Can I capture both interaction effects in one regression? Thanks in advance for the help!
interaction categorical-encoding
Note that there really aren't "omitted categories" with these dummy variables. Rather, the "constant" in your equation represents the value of "wage" when the values of predictor variables are 0, typically the reference value for a categorical variable. They might seem to be omitted because their names don't explicitly show up in displays of tables of regression coefficients, but they are there. The answer by @Penguin_Knight nicely shows how to proceed with the regression (including the important main effects for height, weight, etc.) and significance testing.
– EdM, 53 mins ago
asked 1 hour ago by Mike, edited 1 hour ago by Penguin_Knight
1 Answer (accepted, 3 votes)
To simplify the wording, let's just call the variables male and female.
The main question aside, this is not a typical test for interaction. By specifying:
$$wage = \beta_0 + \beta_1 male + \beta_2 male \times height + \epsilon$$
you are implicitly stating that height does not matter for females at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$
That way, the males have:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$
And the females have:
$$wage = \beta_0 + \beta_2 height + \epsilon$$
In your version, the females would only have the constant (intercept), which is likely a misspecification.
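To spell out what the full specification implies, here is a worked step that substitutes the two values of male into the equation that includes the height main effect (same coefficients as above, nothing new assumed):

$$male = 0: \quad wage = \beta_0 + \beta_2 height + \epsilon$$
$$male = 1: \quad wage = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) height + \epsilon$$

So $\beta_1$ is the difference in intercepts and $\beta_3$ is the difference in height slopes between males and females. Dropping the $\beta_2 height$ term forces the female slope to be exactly zero.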
Back to the question about:
wage = sex + sex * height + reverse_sex * weight + constant + error
The full interaction model would then be:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 female + \beta_5 weight + \beta_6 female \times weight + \epsilon$$
A couple of points here. First, male and female are completely collinear, so one of them will be omitted:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$
For males, these terms remain:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \epsilon$$
For females, these terms remain:
$$wage = \beta_0 + \beta_2 height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$
So it is technically fine: $\beta_5$ is still the extra "effect" of weight for females.
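The collinearity point can be checked directly. The following is a minimal sketch on made-up toy data (the numbers are hypothetical and only serve the illustration), showing how R's lm() handles a dummy and its exact complement in the same model:

```r
# Toy data (hypothetical values, for illustration only)
set.seed(1)
male <- rep(c(0, 1), each = 10)
female <- 1 - male                  # exact complement of male
wage <- 100 + 5 * male + rnorm(20)

# male + female equals the intercept column, so the design matrix is rank-deficient
fit <- lm(wage ~ male + female)
coef(fit)                           # the coefficient for female is reported as NA
```

lm() keeps the intercept and male and reports NA for female, which is the "one of them will be omitted" behaviour described above.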
Second, this complicates things unnecessarily, because your proposed model:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$
is essentially the same as:
$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 male \times weight + \epsilon$$
$\beta_5$ will flip sign, but its magnitude is the same. It is simply the difference in slopes between males and females: if the males' slope is $a$ smaller than the females', then the females' slope is $a$ larger than the males'. The t-statistic also flips sign, but the p-value is the same. There is no need to split hairs here.
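A short derivation of why the two codings describe the same model, assuming female = 1 - male as throughout this answer (the primed symbols are notation introduced here for the coefficients of the male-coded version):

$$\beta_4 weight + \beta_5 female \times weight = \beta_4 weight + \beta_5 (1 - male) \times weight$$
$$= (\beta_4 + \beta_5) weight - \beta_5 male \times weight$$

Matching terms with the male-coded equation gives $\beta_4' = \beta_4 + \beta_5$ and $\beta_5' = -\beta_5$. The interaction coefficient only changes sign, while the weight coefficient switches from the males' slope to the females' slope.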
"Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?"
So, let's just actually show it:
# Simulate 100 people; the true slope of wage on weight is -5 for females and -2.5 for males
set.seed(81226)
male <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)
# male * weight expands to male + weight + male:weight
m01 <- lm(wage ~ male * weight)
summary(m01)
m02 <- lm(wage ~ female * weight)
summary(m02)
# Black points are females (male = 0), red points are males (male = 1)
plot(weight, wage, pch = 16, col = male + 1)
lines(weight[female == 1], fitted(m01)[female == 1])
lines(weight[male == 1], fitted(m01)[male == 1], col = "red")
The first regression using male is:
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097    55.7790 448.118  < 2e-16 ***
male           83.2834    73.5968   1.132    0.261
weight         -5.0967     0.3627 -14.053  < 2e-16 ***
male:weight     2.0805     0.4723   4.405 2.75e-05 ***
The second regression using female is:
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   25078.8931    48.0124 522.342  < 2e-16 ***
female          -83.2834    73.5968  -1.132    0.261
weight           -3.0162     0.3026  -9.969  < 2e-16 ***
female:weight    -2.0805     0.4723  -4.405 2.75e-05 ***
Graphically, the relationship is:
Red is males and black is females. In the first model, the weight coefficient, -5.0967, is the slope for females (the black line). The slope of the red line adds the adjustment of 2.0805, giving -5.0967 + 2.0805 = -3.0162. The 2.0805 is the "difference in slopes," i.e. the interaction. If the two lines were parallel, the effect of weight on wage would be the same for both sexes.
Now, the second model uses female. The slope for males is -3.0162, which is just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up at -5.0967.
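The mirror-image relationship between the two codings can also be checked programmatically. This is a minimal sketch on freshly simulated data with an arbitrary seed (so the numbers will not match the tables above); the object names m_male, m_female and slopes are made up for the illustration:

```r
# Simulated data in the spirit of the example above (arbitrary seed)
set.seed(1)
n <- 100
male <- sample(c(1, 0), n, replace = TRUE)
female <- 1 - male
weight <- rnorm(n, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) + rnorm(n, 0, 100)

m_male <- lm(wage ~ male * weight)
m_female <- lm(wage ~ female * weight)
b_m <- coef(m_male)
b_f <- coef(m_female)

# Same fitted values: the two codings describe one and the same model
same_fit <- isTRUE(all.equal(fitted(m_male), fitted(m_female)))

# Group-specific slopes recovered from each coding
slopes <- rbind(
  male_coding = c(female = b_m[["weight"]],
                  male = b_m[["weight"]] + b_m[["male:weight"]]),
  female_coding = c(female = b_f[["weight"]] + b_f[["female:weight"]],
                    male = b_f[["weight"]])
)
same_fit
slopes
```

Both rows of slopes should agree up to numerical precision, and the two interaction coefficients should differ only in sign.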
I hope this helps clarify that your question about the "effect of weight on females" is the same as the "absence of such an effect of weight on males." The proposed question sounds sensible, but in regression terms the distinction is unnecessary: if males get a benefit, females necessarily incur a relative penalty of the same magnitude.
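On the follow-up about looking only at females, one way to see that a single equation already contains the females' slope is to compare it with a regression on the female subsample. This is a sketch on simulated data with an arbitrary seed, not a prescription; m_joint and m_female_only are names made up here:

```r
# Simulated data in the spirit of the example above (arbitrary seed)
set.seed(1)
n <- 100
male <- sample(c(1, 0), n, replace = TRUE)
female <- 1 - male
weight <- rnorm(n, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) + rnorm(n, 0, 100)

# One equation: with male as the dummy, the "weight" row is the slope for females
m_joint <- lm(wage ~ male * weight)
female_slope_joint <- coef(m_joint)[["weight"]]

# Separate regression on the female subsample only
m_female_only <- lm(wage ~ weight, subset = female == 1)
female_slope_split <- coef(m_female_only)[["weight"]]

# The point estimates agree; the standard errors can differ because the
# joint model pools the residual variance across both sexes
c(joint = female_slope_joint, split = female_slope_split)
summary(m_joint)$coefficients["weight", "Std. Error"]
summary(m_female_only)$coefficients["weight", "Std. Error"]
```

Because the joint model lets both the intercept and the slope differ by sex, its female slope coincides with the one from the subsample regression; the single equation additionally provides a direct test of whether the two slopes differ.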
Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
– Mike, 36 mins ago
@Mike, see the edits in the answer.
– Penguin_Knight, 5 mins ago
answered 1 hour ago by Penguin_Knight, edited 46 secs ago