Regression when both the predictor and outcome variables are proportions

I am using

$X$ The estimated pre-game win probability of a sporting team playing on its Home field (estimated according to a certain model)

to predict

$Y$ Actual proportion of points scored by the Home team in the game (i.e. number of points scored by Home team divided by all points scored in the game).

Graphed, the data look like this.

[Figure: Home team estimated probability of winning vs. Home team actual proportion of score]

Data viewable here.

I did a simple linear regression and it produced the parameters $b_0 = 0.3554$ and $b_1 = 0.2930$. Thus even at the maximum possible value of $x$ it doesn't produce a prediction that the team will score more than 100% of the points.
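As a quick arithmetic check of that claim, assuming $x$ is confined to $[0, 1]$ because it is a probability, and writing $\hat{y} = b_0 + b_1 x$ for the fitted line (the $\hat{y}$ notation is mine, not from the fit output), the extremes occur at the endpoints:

$$\hat{y}(0) = 0.3554, \qquad \hat{y}(1) = 0.3554 + 0.2930 \times 1 = 0.6484$$

So over the whole admissible range of $x$ the fitted proportion stays between roughly 36% and 65%, well inside $[0, 1]$.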

However, some reading of other questions here indicates that linear regression is generally considered inappropriate for situations in which the outcome variable is a proportion.

The question is highly similar to this one, in which the poster was seeking to predict a team's winning percentage. There it was suggested that the poster convert the proportion of wins to the number of wins. However, in my case, using the number of points scored by the team would not be the same thing.

How inappropriate is it for me to use linear regression here?

What analysis should I use, keeping in mind that, unlike in the linked question, I can't just use the raw number of points scored by the team (since I really am interested in the proportion of points they will score)? gung's answer here seems to indicate beta regression if the predictor is a continuous proportion, and logistic regression if it's a count proportion. However, I'm not sure which of those two my predictor is.

Does it make any difference that my predictor is also a proportion?

asked 4 hours ago (edited 3 hours ago) by user1205901

1) It's not great because your DV is bounded. 2) Beta regression would be the first choice, since it allows you to model your data as is (without needing to change it). 3) Yes, you could also use a logit transform $$\text{log}\Big(\cfrac{p}{1-p}\Big)$$ to transform your DV and do regular linear regression.
– user2974951, 4 hours ago
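The logit-transform route mentioned in the comment can be sketched in base R as follows. This is only an illustration on simulated data, not the asker's data set: the sample size, the slope of 0.6, and the noise level are all assumptions made up for the example, and `est_prob` / `act_prob` are stand-in names.

```r
# Simulated stand-in data; NOT the data from the question.
set.seed(1)
n <- 200
est_prob <- runif(n, 0.1, 0.9)                # hypothetical pre-game win probabilities
eta <- 0.6 * qlogis(est_prob)                 # assumed relationship on the logit scale
act_prob <- plogis(eta + rnorm(n, sd = 0.3))  # simulated proportions, strictly inside (0, 1)

# Logit-transform the outcome and fit ordinary least squares
fit <- lm(qlogis(act_prob) ~ est_prob)

# Back-transform predictions; plogis() maps any real number into (0, 1)
new_x <- data.frame(est_prob = c(0.01, 0.5, 0.99))
pred <- plogis(predict(fit, newdata = new_x))
print(pred)
```

Two caveats: the transform is undefined when an observed proportion is exactly 0 or 1, and back-transforming a mean on the logit scale does not give the mean proportion, so the back-transformed values are better read as typical rather than average proportions.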

Tags: regression


1 Answer

A GLM with a binomial distribution and a logit link should work fine. If none of the observed proportions is exactly $0$ or $1$, beta regression is another possibility. For these data, the two approaches yield almost indistinguishable results (not shown).

The relationship between the logit of the actual proportions and the logit of the estimated probabilities looks quite linear (see picture below), which leads me to use the logit of the estimated probabilities as the predictor. The R code used to generate the plots is at the bottom of this answer.
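Written out, the model this describes is roughly the following, where $\beta_0$ and $\beta_1$ are my own labels for the intercept and slope on the logit scale (they are not the $b_0$, $b_1$ of the linear fit in the question):

$$\operatorname{logit}\big(\mathbb{E}[Y \mid X]\big) = \beta_0 + \beta_1 \operatorname{logit}(X), \qquad \operatorname{logit}(p) = \log\frac{p}{1-p}$$

One way to read it: $\beta_0 = 0$ and $\beta_1 = 1$ would mean the expected proportion of points equals the estimated win probability, so departures from those values describe how the two scales differ.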

[Figure: loess_plot]

Let's visualize the model on the original scale (the points are partial residuals):

[Figure: fitted GLM on the original scale, with partial residuals]

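R code:

library(betareg)  # beta regression
library(visreg)   # model visualization

# est_prob and act_prob are assumed to be numeric vectors with values strictly
# between 0 and 1 (estimated win probability and actual proportion of points)

# Convert probabilities to log-odds
est_p_logit <- log(est_prob/(1 - est_prob))
act_p_logit <- log(act_prob/(1 - act_prob))

# Check linearity
scatter.smooth(act_p_logit~est_p_logit, las = 1)

# GLM with binomial distribution and logit-link
# (R warns about non-integer successes because act_prob is a proportion
#  without counts; the coefficient estimates are unaffected)
mod <- glm(act_prob~est_p_logit, family = binomial)

summary(mod)

# Visualize GLM model
res_glm <- visreg(mod, scale = "response", ylim = c(0, 1), partial = TRUE, rug = 2)

# Beta regression
mod_beta <- betareg(act_prob~est_p_logit)

# Compare GLM results with beta regression
# plot = FALSE keeps the GLM plot active, so lines() overlays the beta fit on it
res_beta <- visreg(mod_beta, scale = "response", plot = FALSE)
lines(res_beta$fit$visregFit~res_beta$fit$est_p_logit, type = "l", col = "red")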
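
For readers without the original data, here is a self-contained base-R sketch of the same GLM idea on simulated data. Everything about the data (sample size, slope of 0.5, noise level) is an assumption made up for illustration. It also swaps in `family = quasibinomial`, which is not what the code above uses: it gives the same coefficient estimates as `binomial`, but estimates a dispersion parameter and avoids the non-integer-successes warning for proportions without counts.

```r
# Simulated stand-in data; NOT the data from the question.
set.seed(42)
n <- 300
est_prob <- runif(n, 0.05, 0.95)                             # hypothetical win probabilities
est_p_logit <- qlogis(est_prob)                              # log-odds of the predictor
act_prob <- plogis(0.5 * est_p_logit + rnorm(n, sd = 0.25))  # simulated proportions in (0, 1)
dat <- data.frame(act_prob, est_p_logit)

# Fractional-response GLM with a logit link (quasibinomial variant)
mod_q <- glm(act_prob ~ est_p_logit, family = quasibinomial, data = dat)
print(coef(mod_q))

# Predicted proportions on the original scale for a few win probabilities
grid <- data.frame(est_p_logit = qlogis(c(0.05, 0.5, 0.95)))
fitted_prop <- predict(mod_q, newdata = grid, type = "response")
print(fitted_prop)
```

Because predictions are made with `type = "response"`, they come back on the proportion scale and cannot leave $(0, 1)$, which is the main practical advantage over the plain linear fit.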

answered 1 hour ago (edited 1 hour ago) by COOLSerdash
