Can you have interaction terms for both "sides" of a dummy variable in a single regression?


I'm really not sure how to phrase my question properly, so I apologize if this has been answered elsewhere. Let's say I'm interested in using a regression to predict wage using sex and an interaction between sex and height, where the sex variable is 0 if a person is female.

wage = sex + sex * height + constant + error

My understanding is that the omitted category here is a female person. What if I also wanted to investigate how weight affects wage for females? Could I have a "reverse" sex term that is 1 if the person is female? Would something like this be valid:

wage = sex + sex * height + reverse_sex * weight + constant + error

Would the omitted category still be a female person? Can I capture both interaction effects in one regression? Thanks in advance for the help!

asked 1 hour ago by Mike (new contributor), edited 1 hour ago by Penguin_Knight

Note that there really aren't "omitted categories" with these dummy variables. Rather, the "constant" in your equation represents the value of "wage" when the values of predictor variables are 0, typically the reference value for a categorical variable. They might seem to be omitted because their names don't explicitly show up in displays of tables of regression coefficients, but they are there. The answer by @Penguin_Knight nicely shows how to proceed with the regression (including the important main effects for height, weight, etc) and significance testing.
– EdM, 53 mins ago

interaction categorical-encoding

1 Answer

To simplify the wording let's just call the variables male and female.

The main question aside, this is not a typical test for interaction. By specifying:

$$wage = \beta_0 + \beta_1 male + \beta_2 male \times height + \epsilon$$

you are implicitly stating that height does not matter for females at all. Usually, a full interaction test should also contain, as main effects, the variables that make up the interaction:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$

That way, the males (male = 1) have:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$

And the females (male = 0, so every term involving male drops out) have:

$$wage = \beta_0 + \beta_2 height + \epsilon$$

In your version, females would only have the constant (intercept), so their predicted wage could not vary with height, which is likely a misspecification.
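As a rough illustration of why the main effect for height matters, here is a minimal sketch on simulated data. The data-generating numbers, the seed, and the object names (`restricted`, `full`, `women`) are all made up for this example and are not from the question.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n      <- 200
male   <- rbinom(n, 1, 0.5)      # 1 = male, 0 = female
height <- rnorm(n, 170, 10)      # hypothetical heights
# Hypothetical data-generating process: height matters for BOTH sexes
wage   <- 1000 + 50 * male + 20 * height + 5 * male * height +
  rnorm(n, 0, 100)

restricted <- lm(wage ~ male + male:height)  # no main effect for height
full       <- lm(wage ~ male * height)       # male + height + male:height

# Two women who differ only in height
women <- data.frame(male = 0, height = c(160, 180))
predict(restricted, women)  # identical predictions: slope forced to zero
predict(full, women)        # different predictions: women get their own slope
```

In the restricted model, every term involving height is multiplied by `male`, so for women the prediction reduces to the intercept regardless of height.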

Back to the question about:

wage = sex+ sex* height + reverse_sex * weight + constant + error

The full interaction model would then be:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 female + \beta_5 weight + \beta_6 female \times weight + \epsilon$$

A couple of points here. First, male and female are perfectly collinear (female = 1 - male), so one of them will be omitted. Dropping female and renumbering the remaining coefficients gives:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

For males, these terms remain:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \epsilon$$

For females, these terms remain:

$$wage = \beta_0 + \beta_2 height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

So it's technically fine: $\beta_5$ is still the extra "effect" of weight for females.
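To see the collinearity point in practice, here is a small sketch on simulated data (the seed, sample size, and coefficients are invented for illustration). When both dummies are supplied alongside an intercept, R's `lm` cannot estimate one of them and reports its coefficient as `NA`.

```r
set.seed(2)                      # arbitrary seed
n      <- 100
male   <- rbinom(n, 1, 0.5)
female <- 1 - male               # exact linear function of male
weight <- rnorm(n, 150, 35)
# Hypothetical data-generating process
wage   <- 25000 - 5 * weight + 100 * male + 2 * male * weight +
  rnorm(n, 0, 100)

fit <- lm(wage ~ male + female + weight)
coef(fit)   # the female coefficient is NA: it is aliased with intercept and male
```

Other packages may handle the redundancy differently (for example by dropping a term with a note), but the underlying issue is the same: intercept, male, and female carry only two columns' worth of information.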

Second, this is unnecessarily complicating everything because your proposed model:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

is essentially the same as:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 male \times weight + \epsilon$$

The $\beta_5$ will flip sign, but its magnitude is the same (the coefficient on weight also changes, because it now describes the other sex). $\beta_5$ is basically the difference in slopes between males and females. If the males' slope is $a$ smaller than the females', then the females' slope is $a$ bigger than the males'. You'll also find that the t-statistic flips sign, but the p-values are the same. There is no need to split hairs here.
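The sign flip can be checked with a short substitution. This is only a sketch of the algebra, using the fact that female = 1 - male and writing the weight terms of the first parameterization in terms of male:

$$\beta_4 weight + \beta_5 female \times weight = \beta_4 weight + \beta_5 (1 - male) \times weight = (\beta_4 + \beta_5) weight - \beta_5 male \times weight$$

So in the version that uses male, the coefficient on weight becomes $\beta_4 + \beta_5$ (the females' slope) and the coefficient on the interaction becomes $-\beta_5$: same magnitude, opposite sign, and the fitted values are unchanged.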

Regarding the follow-up question in the comments: "Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?"

So, let's just actually show it:

set.seed(81226)

# Simulate 100 people: sex dummies, weight, and a wage that depends on both
male <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)

# Same model, two codings; male * weight expands to male + weight + male:weight
m01 <- lm(wage ~ male * weight)
summary(m01)

m02 <- lm(wage ~ female * weight)
summary(m02)

# Scatterplot with the fitted line for females (black) and males (red)
plot(weight, wage, pch = 16, col = (male + 1))
lines(weight[female == 1], fitted(m01)[female == 1])
lines(weight[male == 1], fitted(m01)[male == 1], col = "red")

The first regression using male is:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097    55.7790 448.118  < 2e-16 ***
male           83.2834    73.5968   1.132    0.261
weight         -5.0967     0.3627 -14.053  < 2e-16 ***
male:weight     2.0805     0.4723   4.405 2.75e-05 ***

The second regression using female is:

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   25078.8931    48.0124 522.342  < 2e-16 ***
female          -83.2834    73.5968  -1.132    0.261
weight           -3.0162     0.3026  -9.969  < 2e-16 ***
female:weight    -2.0805     0.4723  -4.405 2.75e-05 ***

Graphically, the relationship is:

[Figure: scatterplot of wage against weight, with fitted regression lines for males (red) and females (black)]

Red is males and black is females. In the first model, females only get the coefficient -5.0967, which is the slope of the black line. The slope of the red line has an adjustment of 2.0805, making it (-5.0967 + 2.0805) = -3.0162. The 2.0805 is then the "difference in slopes," aka the interaction. If the two lines were parallel, the effect of weight on wage would be the same for both sexes.

Now, the second model uses female. The slope for males is -3.0162, which is just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up at -5.0967.

I hope this helps clarify that your question about the "effect of weight on female" is the same as the "absence of such effect of weight on male." Your proposed question sounds sensible, but in regression terms it is redundant: if males get a benefit, females necessarily suffer, in relative terms, a penalty of the same magnitude.
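On the "one equation or separate regressions" question, a quick sketch on simulated data may help. The seed, coefficients, and object names are hypothetical. It compares the sex-specific slopes implied by a single interaction model with the slopes from two separate regressions.

```r
set.seed(3)                      # arbitrary seed
n      <- 100
male   <- rbinom(n, 1, 0.5)
weight <- rnorm(n, 150, 35)
# Hypothetical data-generating process
wage   <- 25000 - 5 * weight + 2.5 * male * weight + rnorm(n, 0, 100)
d      <- data.frame(wage, male, weight)

# One regression with an interaction
pooled <- lm(wage ~ male * weight, data = d)
b      <- coef(pooled)
slope_female <- b[["weight"]]                       # slope when male = 0
slope_male   <- b[["weight"]] + b[["male:weight"]]  # slope when male = 1

# Two separate regressions, one per sex
sep_female <- coef(lm(wage ~ weight, data = d, subset = male == 0))[["weight"]]
sep_male   <- coef(lm(wage ~ weight, data = d, subset = male == 1))[["weight"]]

# The point estimates should agree up to floating-point tolerance
all.equal(slope_female, sep_female)
all.equal(slope_male, sep_male)
```

The slopes match because the interaction model lets each sex have its own intercept and its own slope. The standard errors can differ, though, since the single regression assumes one common residual variance for both sexes.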

answered 1 hour ago by Penguin_Knight, edited 46 secs ago

Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
– Mike, 36 mins ago

@Mike, see the edits in the answer.
– Penguin_Knight, 5 mins ago
