Different regression coefficients in R and Excel


I'm asking a separate question because, whilst this has been answered for polynomial regression, the solution doesn't work for me.



I'm performing a simple linear regression. Both R and Excel give the same p-values, $R^2$ values and F statistics, but they give different coefficients (and therefore different equations to put on the graph?)



This is the data:




suva Std dev heather sedge sphagnum
12.880545 0.061156645 0.15 0.525 0
7.098873327 0.026878039 0.2275 0 0
8.660688381 0.04037841 0.425 0.25 0
7.734546932 0.021618446 0.225 0.3875 0
16.70696048 0.103626684 0.15 0.075 0
9.763315183 0.013387158 0.25 0.075 0
12.91735434 0.008076468 0.22 0.22 0
19.94153851 0.150798057 0.0375 0.35 0.225
17.25115559 0.052229596 0.0625 0.2625 0.225
15.38596941 0.05429447 0.1125 0.45 0.025
15.53714185 0.05933884 0.1625 0.525 0.0625
14.11551229 0.064579437 0.1875 0.35 0.1375
14.88575569 0.0189853 0.3375 0.3 0
12.32229733 0.043085602 0.0875 0.1375 0
17.23861185 0.071705699 0.15 0.1375 0
11.50832463 0.1125 0.0875 0.075
14.4810484 0.078476821 0.0375 0.125 0.0625
9.110262652 0.077306938 0.145 0.35 0.0125
10.8571733 0.02681341 0.0375 0.525 0
9.589339421 0.01892435 0.2275 0 0
7.260373588 0.014538237 0.425 0.25 0
11.11099161 0.022802578 0.225 0.3875 0
10.81488848 0.047587818 0.15 0.075 0
8.224131957 0.031126904 0.25 0.075 0
8.818607863 0.002855409 0.22 0.22 0
11.53999863 0.031465613 0.0375 0.35 0.225
14.92784964 0.069998663 0.0625 0.2625 0.225
9.666480932 0.02387741 0.1125 0.45 0.025
12.51000758 0.016960259 0.1625 0.525 0.0625
13.32611463 0.033670382 0.1875 0.35 0.1375
16.76535191 0.029613698 0.3375 0.3 0
11.24615281 0.008440059 0.0875 0.1375 0
10.60564875 0.003930792 0.15 0.1375 0
11.82909125 0.036017582 0.1125 0.0875 0.075
18.2337185 0.143451512 0.0375 0.125 0.0625
10.6226222 0.020561242 0.145 0.35 0.0125




Excel gives the following output:




Regression Statistics

Multiple R           0.420431825
R Square             0.176762919
Adjusted R Square    0.152550064
Standard Error       3.054032651
Observations         36

ANOVA

             df    SS            MS            F             Significance F
Regression    1    68.0914386    68.0914386    7.300374813   0.010676279
Residual     34    317.1219248   9.327115435
Total        35    385.2133634

Coefficients

             Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
Intercept     14.69082425     0.996953237      14.7357205    2.50758E-16     12.66477151    16.71687699
Heather%     -13.78771857     5.102932589      -2.701920579  0.010676279    -24.15812531    -3.417311838




Whilst R gives the following output:




HS <- lm(heather ~ suva, data = data1)
summary(HS)

Call:
lm(formula = heather ~ suva, data = data1)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.149941 -0.063070 -0.009812  0.038370  0.225804 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.326633   0.060733   5.378 5.54e-06 ***
suva        -0.012820   0.004745  -2.702   0.0107 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09313 on 34 degrees of freedom
Multiple R-squared: 0.1768,  Adjusted R-squared: 0.1526
F-statistic: 7.3 on 1 and 34 DF,  p-value: 0.01068




Why are they different in terms of their coefficients? Which one is correct?







asked Aug 7 at 13:47 by Will, edited Aug 8 at 6:42 by Ferdi
  • Are you sure, absolutely sure, that you have not reversed one model relative to the other? – mdewey, Aug 7 at 16:00

  • Not the question, but heather, sedge and sphagnum look like fractions or proportions to me, not percentages. – Nick Cox, Aug 7 at 16:53
1 Answer
The difference between the coefficients is that the relation between $x$ and $y$ is reversed in one case: the two programs are fitting regressions in opposite directions.



Note that

  • in your R output the coefficient belongs to 'suva' (suva is the predictor),

  • whereas in your Excel output the coefficient belongs to 'heather' (heather is the predictor).

The following code shows how R can reproduce both cases:



> lm(suva ~ heather, data = as.data.frame(data))

Call:
lm(formula = suva ~ heather, data = as.data.frame(data))

Coefficients:
(Intercept)      heather
      14.65       -13.60

> lm(heather ~ suva, data = as.data.frame(data))

Call:
lm(formula = heather ~ suva, data = as.data.frame(data))

Coefficients:
(Intercept)         suva
    0.32524     -0.01276


The rest of the code, which reconstructs the data, is:



data <- c(
12.880545, 0.061156645, 0.15 , 0.525, 0,
7.098873327, 0.026878039, 0.2275, 0 ,0,
8.660688381, 0.04037841 , 0.425 , 0.25 , 0,
7.734546932, 0.021618446, 0.225 , 0.3875, 0,
16.70696048, 0.103626684, 0.15 , 0.075, 0,
9.763315183, 0.013387158, 0.25 , 0.075, 0,
12.91735434, 0.008076468, 0.22 , 0.22 , 0,
19.94153851, 0.150798057, 0.0375, 0.35 , 0.225,
17.25115559, 0.052229596, 0.0625, 0.2625, 0.225,
15.38596941, 0.05429447 , 0.1125, 0.45 , 0.025,
15.53714185, 0.05933884 , 0.1625, 0.525, 0.0625,
14.11551229, 0.064579437, 0.1875, 0.35 , 0.1375,
14.88575569, 0.0189853 , 0.3375, 0.3, 0,
12.32229733, 0.043085602, 0.0875, 0.1375, 0,
17.23861185, 0.071705699, 0.15 , 0.1375, 0,
11.50832463, 0.1125 , 0.0875, 0.075, 0,  # row 16 of the posted data has only four values; a trailing 0 is assumed here
14.4810484, 0.078476821, 0.0375, 0.125, 0.0625,
9.110262652, 0.077306938, 0.145 , 0.35 , 0.0125,
10.8571733, 0.02681341 , 0.0375, 0.525, 0,
9.589339421, 0.01892435 , 0.2275, 0 , 0,
7.260373588, 0.014538237, 0.425 , 0.25 , 0,
11.11099161, 0.022802578, 0.225 , 0.3875 , 0,
10.81488848, 0.047587818, 0.15 , 0.075 , 0,
8.224131957, 0.031126904, 0.25 , 0.075 , 0,
8.818607863, 0.002855409, 0.22 , 0.22 , 0,
11.53999863, 0.031465613, 0.0375, 0.35 , 0.225,
14.92784964, 0.069998663, 0.0625, 0.2625 , 0.225,
9.666480932, 0.02387741 , 0.1125, 0.45 , 0.025,
12.51000758, 0.016960259, 0.1625, 0.525 , 0.0625,
13.32611463, 0.033670382, 0.1875, 0.35 , 0.1375,
16.76535191, 0.029613698, 0.3375, 0.3 ,0,
11.24615281, 0.008440059, 0.0875, 0.1375, 0,
10.60564875, 0.003930792, 0.15 , 0.1375, 0,
11.82909125, 0.036017582, 0.1125, 0.0875 , 0.075,
18.2337185, 0.143451512, 0.0375, 0.125 , 0.0625,
10.6226222, 0.020561242, 0.145 , 0.35 , 0.0125
)
data <- matrix(data, nrow = 36, byrow = TRUE)
colnames(data) <- c("suva", "Std dev", "heather", "sedge", "sphagnum")
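

To reproduce Excel's full coefficient table (standard errors, t statistics, p-values and confidence intervals) rather than just the point estimates, it is enough to summarise the reversed model; a minimal sketch, assuming the data matrix built above:

> rev_fit <- lm(suva ~ heather, data = as.data.frame(data))
> summary(rev_fit)   # estimates, standard errors, t statistics, p-values
> confint(rev_fit)   # the Lower 95% / Upper 95% columns of Excel's table

(The numbers will differ slightly from the Excel output quoted in the question, presumably because row 16 of the posted data is missing a value and the reconstruction above assumes a trailing zero.)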



Why, then, is $R^2$ still the same?



There is a certain symmetry in the situation. In simple linear regression, the slope coefficient is the correlation coefficient scaled by the ratio of the standard deviations of the $y$ and $x$ data:



$$\hat\beta_{y \sim x} = r_{xy} \frac{s_y}{s_x}$$
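

An immediate consequence of this formula (not stated explicitly above, but it follows directly) is that the two slopes multiply to the squared correlation:

$$\hat\beta_{y \sim x} \, \hat\beta_{x \sim y} = r_{xy} \frac{s_y}{s_x} \cdot r_{xy} \frac{s_x}{s_y} = r_{xy}^2$$

so the two fitted lines are not reciprocals of one another unless $|r_{xy}| = 1$.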



The standard deviation of the fitted values from the model is then:



$$s_{mod} = \hat\beta_{y \sim x} \, s_x = r_{xy} s_y$$



and the ratio of the model variance to the variance of the data is:



$$R^2 = \left( \frac{s_{mod}}{s_y} \right)^2 = r_{xy}^2$$
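

As a quick numerical check (a minimal sketch, again assuming the data matrix built above), both regression directions report the same $R^2$, equal to the squared correlation:

> df <- as.data.frame(data)
> summary(lm(heather ~ suva, data = df))$r.squared   # R's direction
> summary(lm(suva ~ heather, data = df))$r.squared   # Excel's direction
> cor(df$suva, df$heather)^2                         # all three agree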






answered Aug 7 at 14:05 by Martijn Weterings, edited Aug 7 at 17:10
  • Thank you so much for this reply. It has helped massively and has made the results now the same. – Will, Aug 14 at 13:48









