Does the family of a GLM represent the distribution of the response variable or of the residuals?
I have been discussing this with several lab members, and we have consulted several sources, but we still don't quite have the answer:
When we say a GLM has a family of, say, Poisson, are we talking about the distribution of the residuals or of the response variable?
Points of contention
- This article states that the assumptions of the GLM are: the statistical independence of observations; the correct specification of the link and variance function (which makes me think of the residuals, not the response variable); the correct scale of measurement for the response variable; and the lack of undue influence of single points.
- This question has two answers with two points each. The one that appears first talks about the residuals, and the second one about the response variable. Which is it?
- In this blog post, when talking about assumptions, they state "The distribution of the residuals can be other, eg, binomial".
- At the beginning of this chapter they say that the structure of the errors has to be Poisson. But the residuals will surely have both positive and negative values, so how can they be Poisson?
- This question, which is often cited to close questions such as this one as duplicates, does not have an accepted answer.
- In this question, the answers talk about the response and not the residuals.
- In this course description from the University of Pennsylvania, they talk about the response variable in the assumptions, not the residuals.
generalized-linear-model residuals assumptions
asked 3 hours ago
Derek Corcoran
2 Answers
The family argument for glm models determines the distribution family for the conditional distribution of the response, not of the residuals (except for the quasi-models).
Look at it this way: For the usual linear regression, we can write the model as
$$Y_i \sim \text{Normal}(\beta_0+x_i^T\beta, \sigma^2).
$$
This means that the response $Y_i$ has a normal distribution (with constant variance), but the expectation is different for each $i$. Therefore the conditional distribution of the response is a normal distribution (but a different one for each $i$). Another way of writing this model is
$$
Y_i = \beta_0+x_i^T\beta + \epsilon_i
$$ where each $\epsilon_i$ is distributed $\text{Normal}(0, \sigma^2)$.
So for the normal distribution family both descriptions are correct (when interpreted correctly). This is because for the normal linear model we have a clean separation in the model of the systematic part (the $\beta_0+x_i^T\beta$) and the disturbance part (the $\epsilon_i$), which are simply added. But for other family functions, this separation is not possible! There is not even a clean definition of what residual means (and for that reason, there are many different definitions of "residual").
So for all those other families, we use a definition in the style of the first displayed equation above, that is, the conditional distribution of the response. So, no, the residuals (however defined) in Poisson regression do not have a Poisson distribution.
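The last point can be checked with a small simulation. The following is only a minimal sketch, not part of the original answer: the coefficients `b0` and `b1`, the helper `rpois`, and the single predictor value are made up for illustration, and the true conditional mean is used in place of a fitted one. It draws Poisson responses at one value of x and looks at the raw residuals y - mu.

```python
import math
import random

def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)
b0, b1 = 0.5, 0.3            # hypothetical coefficients, chosen only for illustration
x = 2.0                      # one fixed predictor value
mu = math.exp(b0 + b1 * x)   # conditional mean under the log link

y = [rpois(mu, rng) for _ in range(5000)]   # responses: Poisson given x
resid = [yi - mu for yi in y]               # raw residuals around the true mean

print("smallest response:", min(y))                    # counts are never negative
print("smallest residual:", min(resid))                # residuals can be negative
print("mean of residuals:", sum(resid) / len(resid))   # centred near 0, not near mu
```

The responses are non-negative integers, as a Poisson variable must be, while the residuals are centred near zero and take negative, non-integer values, so they cannot follow a Poisson distribution.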
answered 3 hours ago
kjetil b halvorsen
Further to Kjetil's excellent answer, I wanted to add some specific examples to help clarify the meaning of a conditional distribution, which can be a bit of an elusive concept.
Let's say you took a random sample of 100 fish from a lake and you are interested in seeing how the age of the fish affects a number of outcome variables:
- Fish weight (Weight);
- Whether or not the fish are longer than 30cm;
- Number of fish scales.
The first outcome variable is continuous, the second is binary (0 = fish is NOT longer than 30 cm; 1 = fish IS longer than 30 cm) and the third is a count variable.
Simple Linear Regression
How does Age affect Weight? You are going to formulate a simple linear regression model of the form:
$Weight = \beta_0 + \beta_1*Age + \epsilon$
where the $\epsilon$'s are independent, identically distributed, following a Normal distribution with mean 0 and standard deviation $\sigma$. In this model, the mean of the Weight variable for all fish in the lake sharing the same age is assumed to vary linearly with age. The conditional mean is represented by $\beta_0$ + $\beta_1$Age. It is called conditional because it is the mean weight for fish with the same Age. (The unconditional mean weight would be the mean weight of all fish in the lake, regardless of their age.)
Simple Binary Logistic Regression
How does Age affect whether or not the fish are longer than 30cm? You are going to formulate a simple binary logistic regression model of the form:
log(p/(1-p)) = $\beta_0$ + $\beta_1$Age
where p denotes the conditional probability that a fish of a given age is longer than 30cm. In this model, the conditional mean of the variable "whether or not the fish are longer than 30cm" corresponding to all fish in the lake sharing the same age is assumed to vary linearly with age after being fed to the logit transformation. The logit-transformed conditional mean is represented by $\beta_0$ + $\beta_1$Age. This model works because we assume that the distribution of values of the variable "whether or not the fish are longer than 30cm" for a given age is a Bernoulli distribution. Recall that for this distribution, the variance is a function of the mean value, so if we can estimate its mean value, we can also estimate its variance. (The mean of a Bernoulli variable is p and the variance is p*(1-p).)
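As a rough illustration of how the logit link ties the linear predictor to the Bernoulli mean and variance, here is a minimal sketch. The coefficients `b0` and `b1` and the function name `inv_logit` are hypothetical choices, not estimates from any fish data.

```python
import math

def inv_logit(eta):
    """Map a linear predictor to a probability (inverse of the logit link)."""
    return 1.0 / (1.0 + math.exp(-eta))

b0, b1 = -4.0, 1.0   # hypothetical coefficients, not estimated from any data

for age in (2, 4, 6):
    p = inv_logit(b0 + b1 * age)   # conditional mean of the 0/1 outcome at this age
    var = p * (1 - p)              # Bernoulli variance implied by that mean
    print(f"Age={age}: p={p:.3f}, variance={var:.3f}")
```

Each age gets its own Bernoulli distribution: once p is known for that age, the variance follows from it and needs no separate parameter.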
Simple Poisson Regression
How does Age affect the number of fish scales? You are going to formulate a simple Poisson regression model of the form:
log(mu) = $\beta_0$ + $\beta_1$Age
where mu denotes the conditional mean value of the outcome variable "number of fish scales" for fish of a given age (that is, the expected number of fish scales for fish of a given age). In this model, the conditional mean of the outcome variable is assumed to vary linearly with age after being fed to the log transformation. The log-transformed conditional mean is represented by $\beta_0$ + $\beta_1$Age. This model works because we assume that the distribution of values of the variable "number of fish scales" for a given age is a Poisson distribution. Recall that for this distribution, the mean and variance are equal so it is sufficient to model its mean value.
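To make the "mean equals variance" property concrete, the sketch below computes the mean and variance of the conditional Poisson distribution at a few ages directly from its probability mass function. The coefficients and the function names `poisson_pmf` and `conditional_moments` are assumptions made up for this illustration, and the sum over counts is truncated at `kmax`, which is ample for the small means used here.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def conditional_moments(age, b0, b1, kmax=200):
    """Return (mu, mean, variance) of the count at a given age under a log link."""
    mu = math.exp(b0 + b1 * age)   # log(mu) = b0 + b1 * Age
    probs = [poisson_pmf(k, mu) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mu, mean, var

b0, b1 = 1.0, 0.2   # hypothetical coefficients, chosen only for illustration
for age in (1, 5, 10):
    mu, mean, var = conditional_moments(age, b0, b1)
    print(f"Age={age}: mu={mu:.3f}, mean={mean:.3f}, variance={var:.3f}")
```

Every age has a different Poisson distribution, but within each one the mean and the variance both equal mu, which is why modelling the mean is enough.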
To sum up, a conditional distribution represents the distribution of the outcome values for specific values of the predictor variable(s) included in the model. Each type of regression model illustrated above imposes certain distributional assumptions on the conditional distribution of the outcome variable given Age. Based on these distributional assumptions, the model proceeds to formulate how (1) the mean of the conditional distribution varies as a function of age (simple linear regression), (2) the logit-transformed mean of the conditional distribution varies as a function of age (simple binary logistic regression) or (3) the log-transformed mean of the conditional distribution varies as a function of age (simple Poisson regression).
For each type of model, one can define corresponding residuals. In particular, Pearson and deviance residuals could be defined for the logistic and Poisson regression models.
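For the Poisson case, the two residual types can be written out with their standard textbook formulas. The sketch below is illustrative only: the function names and the value of `mu` are hypothetical, and `mu` stands in for a fitted mean that would normally come from the model.

```python
import math

def pearson_residual(y, mu):
    """Raw residual scaled by the Poisson standard deviation sqrt(mu)."""
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    """Signed square root of one observation's contribution to the Poisson deviance."""
    term = y * math.log(y / mu) if y > 0 else 0.0   # y*log(y/mu) is taken as 0 when y == 0
    d = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)

mu = 3.0   # hypothetical fitted mean for one observation
for y in (0, 3, 7):
    print(y, round(pearson_residual(y, mu), 3), round(deviance_residual(y, mu), 3))
```

Both residuals are zero when the observed count equals the fitted mean and take the sign of y - mu otherwise, so neither can be Poisson distributed.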
answered 1 min ago
Isabella Ghement