Does the family of a GLM represent the distribution of the response variable or of the residuals?

I have been discussing this with several lab members, and we have gone through several sources but still don't quite have the answer:

When we say a GLM has, let's say, a Poisson family, are we talking about the distribution of the residuals or of the response variable?

Points of contention

This article states that the assumptions of the GLM are the statistical independence of observations, the correct specification of the link and variance function (which makes me think about the residuals, not the response variable), the correct scale of measurement for the response variable, and the lack of undue influence of single points.

This question has two answers with two points each: the first one talks about the residuals, and the second one about the response variable. Which is it?

In this blog post, when talking about assumptions, they state: "The distribution of the residuals can be other, eg, binomial".

At the beginning of this chapter they say that the structure of the errors has to be Poisson, but the residuals will surely have both positive and negative values, so how can they be Poisson?

This question, which is often cited to close questions such as this one as duplicates, does not have an accepted answer.

In the answers to this question, they talk about the response, not the residuals.

In this course description from the University of Pennsylvania, they talk about the response variable in the assumptions, not the residuals.

asked 3 hours ago

Derek Corcoran

1285

generalized-linear-model residuals assumptions

2 Answers

The family argument for glm models determines the distribution family for the conditional distribution of the response, not of the residuals (except for the quasi-models).

Look at it this way: for the usual linear regression, we can write the model as
$$
Y_i \sim \text{Normal}(\beta_0 + x_i^T\beta, \sigma^2).
$$
This means that the response $Y_i$ has a normal distribution (with constant variance), but the expectation is different for each $i$. Therefore the conditional distribution of the response is a normal distribution (but a different one for each $i$). Another way of writing this model is
$$
Y_i = \beta_0 + x_i^T\beta + \epsilon_i
$$
where each $\epsilon_i$ is distributed $\text{Normal}(0, \sigma^2)$.
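
A small simulation can make the equivalence concrete. This is a sketch using only Python's standard library; the values `BETA0`, `BETA1`, `SIGMA` and the point `x` are hypothetical, chosen purely for illustration. It draws $Y$ once directly from the conditional normal distribution and once as systematic part plus additive error, and then checks that the additive errors $Y - (\beta_0 + \beta_1 x)$ behave like draws from $\text{Normal}(0, \sigma^2)$.

```python
import random
import statistics

# Hypothetical parameter values, for illustration only.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
N = 20_000

def conditional_draws(x, rng, n=N):
    """Description 1: Y ~ Normal(beta0 + beta1 * x, sigma^2)."""
    return [rng.gauss(BETA0 + BETA1 * x, SIGMA) for _ in range(n)]

def additive_draws(x, rng, n=N):
    """Description 2: Y = beta0 + beta1 * x + eps, with eps ~ Normal(0, sigma^2)."""
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for _ in range(n)]

x = 3.0
mu_x = BETA0 + BETA1 * x                # conditional mean of Y at this x
ys_1 = conditional_draws(x, random.Random(1))
ys_2 = additive_draws(x, random.Random(2))
eps = [y - mu_x for y in ys_2]          # the additive disturbances

# Both descriptions give (approximately) the same mean and spread for Y,
# and the disturbances are centred at 0 with standard deviation sigma.
print(statistics.mean(ys_1), statistics.mean(ys_2))
print(statistics.mean(eps), statistics.stdev(eps))
```

In the normal case, subtracting the conditional mean leaves a quantity whose distribution does not depend on $x$; that is what makes the "distribution of the errors" description available here.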

So for the normal distribution family both descriptions are correct (when interpreted correctly). This is because the normal linear model cleanly separates the systematic part (the $\beta_0 + x_i^T\beta$) from the disturbance part (the $\epsilon_i$), and the two are simply added. But for other families this separation is not possible! There is not even a clean definition of what a residual is (and for that reason there are many different definitions of "residual").

So for all those other families, we use a definition in the style of the first displayed equation above; that is, we specify the conditional distribution of the response. So no, the residuals (however defined) in Poisson regression do not have a Poisson distribution.
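
To see why, here is a minimal simulation sketch in standard-library Python. The coefficients `B0`, `B1`, the ages, and the helper `poisson_draw` are all hypothetical choices for illustration. It draws counts from a Poisson distribution with mean $\mu_i = \exp(\beta_0 + \beta_1 x_i)$ and then looks at the raw response residuals $y_i - \mu_i$: the responses are non-negative integers, as a Poisson variable must be, but the residuals are negative for some observations and are never integers, so they cannot be Poisson-distributed.

```python
import math
import random

# Hypothetical coefficients and covariate values, for illustration only.
B0, B1 = 0.5, 0.3
AGES = list(range(1, 11))

def poisson_draw(lam, rng):
    """Knuth's multiplication method; fine for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(2024)
data = []                                   # (y, mu) pairs
for _ in range(100):
    for age in AGES:
        mu = math.exp(B0 + B1 * age)        # conditional mean via the log link
        data.append((poisson_draw(mu, rng), mu))

responses = [y for y, _ in data]
response_resid = [y - mu for y, mu in data]

print("min response:", min(responses))          # never below 0
print("min residual:", min(response_resid))     # negative values occur
print("first residuals:", [round(r, 3) for r in response_resid[:5]])
```

Other residual definitions (Pearson, deviance) rescale these quantities in different ways, but they remain signed, non-integer values, so none of them is Poisson-distributed either.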

answered 3 hours ago

kjetil b halvorsen

26.9k978195

Further to Kjetil's excellent answer, I wanted to add some specific examples to help clarify the meaning of a conditional distribution, which can be a bit of an elusive concept.

Let's say you took a random sample of 100 fish from a lake and you are interested in seeing how the age of the fish affects a number of outcome variables:

Fish weight (Weight);

Whether or not the fish are longer than 30cm;

Number of fish scales.

The first outcome variable is continuous, the second is binary (0 = fish is NOT longer than 30 cm; 1 = fish IS longer than 30 cm) and the third is a count variable.

Simple Linear Regression

How does Age affect Weight? You are going to formulate a simple linear regression model of the form:

$\text{Weight} = \beta_0 + \beta_1 \cdot \text{Age} + \epsilon$

where the $\epsilon$'s are independent and identically distributed, following a Normal distribution with mean 0 and standard deviation $\sigma$. In this model, the mean weight of all fish in the lake sharing the same age is assumed to vary linearly with age. This conditional mean is represented by $\beta_0 + \beta_1 \cdot \text{Age}$. It is called conditional because it is the mean weight for fish of the same Age. (The unconditional mean weight would be the mean weight of all fish in the lake, regardless of their age.)
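
As a sketch of what "conditional mean" means here, the following standard-library Python code simulates a hypothetical sample of 100 fish (the true values `B0`, `B1`, `SIGMA`, the age range and the units are invented for illustration, not taken from any data set), fits the line by ordinary least squares, and contrasts the estimated conditional mean weight at one age with the unconditional mean weight.

```python
import random
import statistics

# Hypothetical "true" values for the fish example, not estimated from real data:
# weight in grams, age in years.
B0, B1, SIGMA = 100.0, 50.0, 20.0

rng = random.Random(42)
ages = [rng.randint(1, 8) for _ in range(100)]                  # 100 sampled fish
weights = [B0 + B1 * a + rng.gauss(0.0, SIGMA) for a in ages]   # Weight = b0 + b1*Age + eps

def ols(x, y):
    """Closed-form least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

b0_hat, b1_hat = ols(ages, weights)
cond_mean_age4 = b0_hat + b1_hat * 4     # estimated mean weight of 4-year-old fish
uncond_mean = statistics.mean(weights)   # mean weight of all sampled fish, ignoring age

print(f"fitted line: {b0_hat:.1f} + {b1_hat:.1f} * Age")
print(f"conditional mean at Age = 4: {cond_mean_age4:.1f} g")
print(f"unconditional mean: {uncond_mean:.1f} g")
```

The fitted conditional mean changes with Age, while the unconditional mean is a single number for the whole sample.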

Simple Binary Logistic Regression

How does Age affect whether or not the fish are longer than 30cm? You are going to formulate a simple binary logistic regression model of the form:

$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{Age}$

where $p$ denotes the conditional probability that a fish of a given age is longer than 30 cm. In this model, the conditional mean of the variable "whether or not the fish are longer than 30 cm" for all fish in the lake sharing the same age is assumed to vary linearly with age after applying the logit transformation. The logit-transformed conditional mean is represented by $\beta_0 + \beta_1 \cdot \text{Age}$. This model works because we assume that the distribution of values of the variable "whether or not the fish are longer than 30 cm" for a given age is a Bernoulli distribution. Recall that for this distribution the variance is a function of the mean, so if we can estimate the mean, we can also estimate the variance. (The mean of a Bernoulli variable is $p$ and the variance is $p(1-p)$.)
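
The relationship between the logit scale and the Bernoulli mean and variance can be checked numerically. In this standard-library Python sketch the coefficients `B0`, `B1` and the chosen age are hypothetical; `inv_logit` converts the linear predictor back to the probability $p$, and simulated Bernoulli outcomes at that age have a sample mean close to $p$ and a sample variance close to $p(1-p)$.

```python
import math
import random
import statistics

# Hypothetical coefficients for the "longer than 30 cm" example.
B0, B1 = -4.0, 1.0

def inv_logit(eta):
    """Inverse of log(p / (1 - p)): maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_variance(p):
    """For a Bernoulli variable the variance is determined by the mean: p * (1 - p)."""
    return p * (1.0 - p)

age = 5
p = inv_logit(B0 + B1 * age)        # conditional P(longer than 30 cm | Age = 5)
rng = random.Random(7)
draws = [1 if rng.random() < p else 0 for _ in range(20_000)]

print("p =", p, "variance =", bernoulli_variance(p))
print("sample mean =", statistics.mean(draws),
      "sample variance =", statistics.pvariance(draws))
```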

Simple Poisson Regression

How does Age affect the number of fish scales? You are going to formulate a simple Poisson regression model of the form:

$\log(\mu) = \beta_0 + \beta_1 \cdot \text{Age}$

where $\mu$ denotes the conditional mean of the outcome variable "number of fish scales" for fish of a given age (that is, the expected number of scales for a fish of that age). In this model, the conditional mean of the outcome variable is assumed to vary linearly with age after applying the log transformation. The log-transformed conditional mean is represented by $\beta_0 + \beta_1 \cdot \text{Age}$. This model works because we assume that the distribution of values of the variable "number of fish scales" for a given age is a Poisson distribution. Recall that for this distribution the mean and variance are equal, so it is sufficient to model the mean.
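
A quick numerical check of the Poisson assumption at one age, as a standard-library Python sketch. The coefficients `B0`, `B1` are hypothetical and deliberately give a small mean (real scale counts would be far larger) so that the simple sampler `poisson_draw`, an illustrative helper, stays fast. The simulated counts at a fixed age have sample mean and sample variance both close to $\mu = \exp(\beta_0 + \beta_1 \cdot \text{Age})$.

```python
import math
import random
import statistics

# Hypothetical coefficients, kept small so the simple sampler below stays fast.
B0, B1 = 1.0, 0.2

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small means like this one."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

age = 5
mu = math.exp(B0 + B1 * age)        # conditional mean count at Age = 5
rng = random.Random(11)
counts = [poisson_draw(mu, rng) for _ in range(20_000)]

# For a Poisson distribution the variance equals the mean, so the single
# parameter mu describes both.
print("mu =", mu)
print("sample mean =", statistics.mean(counts),
      "sample variance =", statistics.pvariance(counts))
```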

To sum up, a conditional distribution is the distribution of the outcome values for specific values of the predictor variable(s) included in the model. Each type of regression model illustrated above imposes certain distributional assumptions on the conditional distribution of the outcome variable given Age. Based on these assumptions, the model specifies how (1) the mean of the conditional distribution varies as a function of age (simple linear regression), (2) the logit-transformed mean of the conditional distribution varies as a function of age (simple binary logistic regression), or (3) the log-transformed mean of the conditional distribution varies as a function of age (simple Poisson regression).

For each type of model, one can define corresponding residuals. In particular, Pearson and deviance residuals could be defined for the logistic and Poisson regression models.
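
For concreteness, here are the standard Pearson and deviance residual formulas for the Poisson and Bernoulli cases, written as a small standard-library Python sketch (the function names are hypothetical, not taken from any particular package). Both kinds of residual are signed, real-valued quantities, which is another way to see that they are not draws from the response's own family.

```python
import math

def poisson_pearson(y, mu):
    """(y - mu) / sqrt(V(mu)) with Poisson variance function V(mu) = mu."""
    return (y - mu) / math.sqrt(mu)

def poisson_deviance_resid(y, mu):
    """sign(y - mu) * sqrt(2 * (y * log(y / mu) - (y - mu))), with y * log(y / mu) = 0 when y == 0."""
    term = (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
    # max() guards against a tiny negative value caused by floating-point rounding.
    return math.copysign(math.sqrt(max(0.0, 2.0 * term)), y - mu)

def bernoulli_pearson(y, p):
    """(y - p) / sqrt(V(p)) with Bernoulli variance function V(p) = p * (1 - p)."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def bernoulli_deviance_resid(y, p):
    """sign(y - p) * sqrt(-2 * log-likelihood of a single 0/1 observation y)."""
    loglik = math.log(p) if y == 1 else math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * loglik), y - p)

# An observed count of 0 when the fitted mean is 2, and a fish longer than
# 30 cm when the fitted probability of that was 0.5.
print(poisson_pearson(0, 2.0), poisson_deviance_resid(0, 2.0))
print(bernoulli_pearson(1, 0.5), bernoulli_deviance_resid(1, 0.5))
```

Summing the squared deviance residuals over all observations gives the model deviance, which is one reason that definition is widely used.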

answered 1 min ago

Isabella Ghement

4,857316
