Why do we need assumptions in linear regression?
In linear regression, we make the following assumptions:

- $E(Y_i)$, at each set of predictor values $(x_{1i}, x_{2i}, \ldots)$, is a linear function of the predictors.
- The errors are independent.
- The errors are normally distributed.
- The errors have equal variance at every set of predictor values.

One of the ways we can solve linear regression is through the normal equations, which we can write as

$$\theta = (X^T X)^{-1} X^T Y$$

From a mathematical standpoint, the above equation only needs $X^T X$ to be invertible. So why do we need these assumptions? I asked a few colleagues, and they said the assumptions are there to get good results, with the normal equations being one algorithm for achieving that. But in that case, how do these assumptions help? How does upholding them help in getting a better model?
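As a concrete sketch of that purely mechanical point, the following NumPy snippet (with made-up data and deliberately skewed, non-normal errors) shows that the normal equations produce estimates whenever $X^T X$ is invertible:

```python
import numpy as np

# Made-up data; the errors are centered but exponential, i.e. deliberately non-normal.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])   # intercept column + one predictor
y = 2.0 + 3.0 * X[:, 1] + (rng.exponential(1.0, n) - 1.0)  # skewed, mean-zero noise

# Solve (X^T X) theta = X^T y directly; np.linalg.solve avoids forming the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # roughly [2, 3]: the fit goes through regardless of the error distribution
```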
Tags: regression, assumptions

asked Aug 15 at 7:05 by Clock Slave; edited Aug 15 at 12:40 by conjectures
A normal distribution is needed to calculate coefficient confidence intervals using the usual formulas. Other formulas for CI calculation (White's, I think) allow for non-normal distributions.
– keiv.fly
Aug 15 at 8:30
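A minimal sketch of that comparison, assuming the statsmodels package and made-up heteroskedastic data ("HC1" is one of the White-style robust covariance estimators):

```python
import numpy as np
import statsmodels.api as sm

# Made-up data whose error variance grows with x, violating equal variance.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)  # noise scale depends on x
X = sm.add_constant(x)

usual = sm.OLS(y, X).fit()                  # classical standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White-style heteroskedasticity-robust errors

print(usual.conf_int())   # intervals from the usual formulas
print(robust.conf_int())  # intervals that stay asymptotically valid under heteroskedasticity
```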
You do not always need those assumptions for the model to work. Neural networks have linear regressions inside, and they minimize RMSE just like the formula you provided, but most likely none of the assumptions hold: no normal distribution, no equal variance, no linear function, and even the errors can be dependent.
– keiv.fly
Aug 15 at 8:43
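A sketch of that observation, again with made-up data: minimizing the squared error by gradient descent, as a linear layer in a neural network effectively does, lands on the same solution as the normal equations, with no distributional assumptions used anywhere.

```python
import numpy as np

# Made-up data with heavy-tailed (Student-t) errors, so the normality assumption fails.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(100), rng.uniform(0, 5, 100)])
y = 1.0 + 2.0 * X[:, 1] + rng.standard_t(df=3, size=100)

theta = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = 2 * X.T @ (X @ theta - y) / len(y)  # gradient of the mean squared error
    theta -= lr * grad

print(theta)                              # gradient-descent estimate
print(np.linalg.solve(X.T @ X, X.T @ y))  # normal-equations estimate; the two agree
```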
See stats.stackexchange.com/q/16381/35989
– Tim♦
Aug 15 at 12:14
@Alexis The independent variables being i.i.d. definitely isn't an assumption (and the dependent variable being i.i.d. also isn't an assumption; if we assumed the response was i.i.d., it would be pointless to do anything beyond estimating the mean). And "no omitted variables" isn't really an additional assumption, although it's good to avoid omitting variables; the first assumption listed is really what takes care of that.
– Dason
Aug 15 at 14:57
@Dason I think my link provides a pretty strong example of "no omitted variables" being requisite for valid interpretation. I also think i.i.d. (conditional on the predictors, yes) is necessary, with random walks providing an excellent example of where non-i.i.d. estimation can fail (even resorting to estimating only the mean).
– Alexis
Aug 15 at 17:02
5 Answers
You are correct: you do not need to satisfy these assumptions to fit a least-squares line to the points. You need these assumptions to interpret the results. For example, assuming there was no relationship between an input $X_1$ and $Y$, what is the probability of getting a coefficient $\beta_1$ at least as great as the one we saw from the regression?

answered Aug 15 at 7:20 by rinspy (accepted, score 19)
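A hypothetical Monte Carlo version of that question (all numbers made up): simulate data with no relationship between $X_1$ and $Y$, and count how often the fitted slope is at least as large as some observed value.

```python
import numpy as np

rng = np.random.default_rng(2)
n, observed_slope = 50, 0.40  # made-up sample size and observed coefficient

def fitted_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

# Simulate the null: x and y are drawn independently, so the true slope is zero.
slopes = [fitted_slope(rng.normal(size=n), rng.normal(size=n)) for _ in range(10_000)]

p_value = np.mean(np.abs(slopes) >= observed_slope)
print(p_value)  # two-sided Monte Carlo p-value; the usual t-test answers this analytically
```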
Try the image of Anscombe's quartet from Wikipedia to get an idea of some of the potential issues with interpreting linear regression when some of those assumptions are clearly false: most of the basic descriptive statistics are the same in all four datasets (and the individual $x_i$ values are identical in all but the bottom-right panel).

answered Aug 15 at 11:41 by Henry (score 17)
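A small sketch of why the quartet is so striking, assuming seaborn (whose load_dataset call fetches its copy of Anscombe's data from the package's online data repository):

```python
import numpy as np
import seaborn as sns

df = sns.load_dataset("anscombe")  # columns: dataset, x, y
for name, g in df.groupby("dataset"):
    slope, intercept = np.polyfit(g["x"], g["y"], 1)
    print(name,
          round(g["x"].mean(), 2), round(g["y"].mean(), 2),  # nearly identical means
          round(g["x"].corr(g["y"]), 3),                     # nearly identical correlation
          round(slope, 3), round(intercept, 2))              # nearly identical fitted line
# The numbers barely differ across the four datasets; only plotting reveals
# the curvature, the outlier, and the leverage point.
```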
I made an illustration following Anscombe showing what violating the no-omitted-variables assumption can look like. Still working on an Anscombe-like illustration of a violation of the i.i.d. assumption.
– Alexis
Aug 15 at 14:19
You don't need those assumptions to fit a linear model. However, your parameter estimates could be biased or could fail to have the minimum variance. Violating the assumptions makes it more difficult to interpret the regression results, for example when constructing a confidence interval.

answered Aug 15 at 8:29 by SmallChess (score 3)
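A simulation sketch of the "not minimum variance" part, under an assumed setup where the error variance grows with the predictor: ordinary least squares stays unbiased, but weighted least squares with the (here known) true weights estimates the slope with less variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
w = 1.0 / x**2  # inverse-variance weights; the true error s.d. below is proportional to x

ols_slopes, wls_slopes = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(0.0, x)  # heteroskedastic errors
    ols_slopes.append(np.linalg.solve(X.T @ X, X.T @ y)[1])
    Xw = X * w[:, None]                     # rows of X scaled by the weights
    wls_slopes.append(np.linalg.solve(X.T @ Xw, Xw.T @ y)[1])

print(np.var(ols_slopes), np.var(wls_slopes))  # both centered near 2, but WLS varies less
```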
OK, the answers so far go like this: if we violate the assumptions, then bad things can happen. I believe the interesting direction is the converse: when all the assumptions we need (actually a little different from the ones above) are met, why and how can we be sure that linear regression is the best model?

I think the answer goes like this: if we make the assumptions as in the answer to this question, then we can compute the conditional density $p(y_i \mid x_i)$. From this we can compute $E[Y_i \mid X_i = x_i]$ (the factorization of the conditional expectation at $x_i$) and see that it is indeed the linear regression function. Then we use this to see that it is the best function with respect to the true risk.

answered Aug 15 at 11:41 by Fabian Werner (score 1)
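To spell out that argument under the usual Gaussian assumptions (a sketch of one standard route): if $Y_i = x_i^T \theta + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2)$ independent of $x_i$, then

$$p(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - x_i^T \theta)^2}{2\sigma^2}\right), \qquad E[Y_i \mid X_i = x_i] = x_i^T \theta.$$

For squared-error loss the conditional mean minimizes the true risk, because for any function $f$ the cross term vanishes and

$$E\big[(Y - f(X))^2\big] = E\big[(Y - E[Y \mid X])^2\big] + E\big[(E[Y \mid X] - f(X))^2\big],$$

so no predictor can beat $x \mapsto x^T \theta$.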
The two key assumptions are

- independence of observations
- the mean is not related to the variance

See the discussion in Julian Faraway's book.

If these are both true, OLS is surprisingly resistant to breaches of the other assumptions you have listed.

answered Aug 15 at 23:55 by astaines (score 0)