Does the family of a GLM represent the distribution of the response variable or of the residuals?

Several lab members and I have been discussing this, and we have consulted several sources, but we still don't quite have the answer:

When we say a GLM has a family of, say, Poisson, are we talking about the distribution of the residuals or of the response variable?



Points of contention



  1. This article states that the assumptions of the GLM are: the statistical independence of observations; the correct specification of the link and variance functions (which makes me think about the residuals, not the response variable); the correct scale of measurement for the response variable; and the lack of undue influence of single points.

  2. This question has two answers with two points each. The one that appears first talks about the residuals, and the second one about the response variable. Which is it?

  3. This blog post, when talking about assumptions, states: "The distribution of the residuals can be other, eg, binomial".

  4. The beginning of this chapter says that the structure of the errors has to be Poisson, but the residuals will surely take both positive and negative values. How can that be Poisson?

  5. This question, which is often cited to close questions such as this one as duplicates, does not have an accepted answer.

  6. In this question, the answers talk about the response and not the residuals.

  7. This course description from the University of Pennsylvania talks about the response variable in the assumptions, not the residuals.










      generalized-linear-model residuals assumptions
















asked 3 hours ago by Derek Corcoran




















          2 Answers













The family argument for glm models determines the distribution family for the conditional distribution of the response, not of the residuals (except for the quasi-models).

Look at it this way: for the usual linear regression, we can write the model as
$$Y_i \sim \text{Normal}(\beta_0 + x_i^T\beta, \sigma^2).$$

This means that the response $Y_i$ has a normal distribution (with constant variance), but the expectation is different for each $i$. Therefore the conditional distribution of the response is a normal distribution (but a different one for each $i$). Another way of writing this model is
$$Y_i = \beta_0 + x_i^T\beta + \epsilon_i$$
where each $\epsilon_i$ is distributed $\text{Normal}(0, \sigma^2)$.

So for the normal distribution family both descriptions are correct (when interpreted correctly). This is because the normal linear model has a clean separation between the systematic part (the $\beta_0 + x_i^T\beta$) and the disturbance part (the $\epsilon_i$), which are simply added. But for other families, this separation is not possible! There is not even a clean definition of what "residual" means (and for that reason, there are many different definitions of "residual").
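To make the equivalence of the two normal-model descriptions concrete, here is a minimal sketch using only the Python standard library. The parameter values and predictor values are hypothetical, chosen purely for illustration. It relies on the fact that CPython's `random.gauss(mu, sigma)` computes `mu + z * sigma` from a standard-normal draw `z`, so two generators with the same seed yield the same responses under either formulation.

```python
import math
import random

beta0, beta1, sigma = 2.0, 0.5, 1.5    # hypothetical parameter values
x_values = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical predictor values

# Two generators with the same seed produce the same underlying normal draws.
rng_conditional = random.Random(123)
rng_additive = random.Random(123)

# Formulation 1: Y_i ~ Normal(beta0 + beta1 * x_i, sigma^2)
y_conditional = [rng_conditional.gauss(beta0 + beta1 * x, sigma) for x in x_values]

# Formulation 2: Y_i = beta0 + beta1 * x_i + epsilon_i, epsilon_i ~ Normal(0, sigma^2)
y_additive = [beta0 + beta1 * x + rng_additive.gauss(0.0, sigma) for x in x_values]

for x, y1, y2 in zip(x_values, y_conditional, y_additive):
    print(f"x = {x:.1f}   conditional form: {y1:8.4f}   additive form: {y2:8.4f}")

# The two formulations agree draw by draw (up to floating-point rounding).
same = all(math.isclose(y1, y2, abs_tol=1e-12)
           for y1, y2 in zip(y_conditional, y_additive))
print("formulations agree:", same)
```

The agreement is specific to the normal family, where the systematic part and the disturbance are simply added.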



So for all those other families, we use a definition in the style of the first displayed equation above, that is, in terms of the conditional distribution of the response. So, no, the residuals (however defined) in Poisson regression do not have a Poisson distribution.
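The point can be checked with a small simulation. The following is a sketch under stated assumptions, not a definitive implementation: the coefficients, the predictor value and the helper `sample_poisson` (Knuth's multiplication method, adequate for small means) are all introduced here for illustration. Conditional on a fixed predictor value, the simulated response has mean and variance close to each other, as a Poisson variable should, while the raw residuals (observed minus conditional mean) are negative for some observations and are not integers, so they cannot be Poisson distributed.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (small mu only)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(42)
beta0, beta1 = 0.5, 0.3                   # hypothetical "true" coefficients
x_value = 2.0                             # condition on a single predictor value
mu = math.exp(beta0 + beta1 * x_value)    # conditional mean under a log link

responses = [sample_poisson(mu, rng) for _ in range(20000)]
residuals = [y - mu for y in responses]   # raw residuals: observed minus conditional mean

mean_y = sum(responses) / len(responses)
var_y = sum((y - mean_y) ** 2 for y in responses) / (len(responses) - 1)

print(f"conditional mean (theory) : {mu:.3f}")
print(f"sample mean of y | x      : {mean_y:.3f}")
print(f"sample variance of y | x  : {var_y:.3f}")
print(f"smallest raw residual     : {min(residuals):.3f}")
print(f"any negative residual?    : {any(r < 0 for r in residuals)}")
print(f"any non-integer residual? : {any(r != int(r) for r in residuals)}")
```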






Further to Kjetil's excellent answer, I wanted to add some specific examples to help clarify the meaning of a conditional distribution, which can be a bit of an elusive concept.

Let's say you took a random sample of 100 fish from a lake and you are interested in seeing how the age of the fish affects a number of outcome variables:

1. Fish weight (Weight);
2. Whether or not the fish is longer than 30 cm;
3. Number of fish scales.

The first outcome variable is continuous, the second is binary (0 = fish is NOT longer than 30 cm; 1 = fish IS longer than 30 cm) and the third is a count variable.



Simple Linear Regression

How does Age affect Weight? You are going to formulate a simple linear regression model of the form:

$\text{Weight} = \beta_0 + \beta_1 \cdot \text{Age} + \epsilon$

where the $\epsilon$'s are independent and identically distributed, following a Normal distribution with mean 0 and standard deviation $\sigma$. In this model, the mean of the Weight variable for all fish in the lake sharing the same age is assumed to vary linearly with age. The conditional mean is represented by $\beta_0 + \beta_1 \cdot \text{Age}$. It is called conditional because it is the mean weight for fish with the same Age. (The unconditional mean weight would be the mean weight of all fish in the lake, regardless of their age.)



Simple Binary Logistic Regression

How does Age affect whether or not the fish is longer than 30 cm? You are going to formulate a simple binary logistic regression model of the form:

$\log(p/(1-p)) = \beta_0 + \beta_1 \cdot \text{Age}$

where $p$ denotes the conditional probability that a fish of a given age is longer than 30 cm. In this model, the conditional mean of the variable "whether or not the fish is longer than 30 cm" for all fish in the lake sharing the same age is assumed to vary linearly with age after being passed through the logit transformation. The logit-transformed conditional mean is represented by $\beta_0 + \beta_1 \cdot \text{Age}$. This model works because we assume that the distribution of values of the variable "whether or not the fish is longer than 30 cm" for a given age is a Bernoulli distribution. Recall that for this distribution, the variance is a function of the mean value, so if we can estimate its mean value, we can also estimate its variance. (The mean of a Bernoulli variable is $p$ and the variance is $p(1-p)$.)
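As a rough illustration of the Bernoulli mean-variance relationship, the sketch below simulates the binary outcome at three ages and compares the sample mean and variance with $p$ and $p(1-p)$. The coefficients and ages are hypothetical values picked for the example, and `inverse_logit` is a helper introduced here.

```python
import math
import random

def inverse_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

rng = random.Random(0)
beta0, beta1 = -4.0, 1.0    # hypothetical coefficients on the logit scale
n_draws = 50000

results = []                # (age, p, sample mean, sample variance)
for age in (2, 4, 6):
    p = inverse_logit(beta0 + beta1 * age)            # conditional probability given age
    draws = [1 if rng.random() < p else 0 for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    results.append((age, p, mean, var))
    print(f"age {age}: p = {p:.4f}, p(1-p) = {p * (1 - p):.4f}, "
          f"sample mean = {mean:.4f}, sample variance = {var:.4f}")
```

Each age has its own Bernoulli distribution, which is what "conditional distribution of the response" means here.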



Simple Poisson Regression

How does Age affect the number of fish scales? You are going to formulate a simple Poisson regression model of the form:

$\log(\mu) = \beta_0 + \beta_1 \cdot \text{Age}$

where $\mu$ denotes the conditional mean value of the outcome variable "number of fish scales" for fish of a given age (that is, the expected number of fish scales for fish of a given age). In this model, the conditional mean of the outcome variable is assumed to vary linearly with age after being passed through the log transformation. The log-transformed conditional mean is represented by $\beta_0 + \beta_1 \cdot \text{Age}$. This model works because we assume that the distribution of values of the variable "number of fish scales" for a given age is a Poisson distribution. Recall that for this distribution, the mean and variance are equal, so it is sufficient to model its mean value.
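The following sketch shows one way such a model could be fitted, using iteratively reweighted least squares written out by hand for a single predictor. It is an illustration under assumptions, not the procedure any particular package uses verbatim: the data are simulated, the "true" coefficients are hypothetical (and give unrealistically small scale counts, to keep the simple sampler fast), and `sample_poisson` and `fit_poisson_log_link` are helpers defined here.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (moderate mu)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def fit_poisson_log_link(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0     # start from an intercept-only fit
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                  # linear predictor
            mu = math.exp(eta)                  # conditional mean
            z = eta + (yi - mu) / mu            # working response
            w = mu                              # working weight (Poisson, log link)
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1

rng = random.Random(7)
true_b0, true_b1 = 1.0, 0.2                     # hypothetical "true" coefficients
ages = [rng.uniform(1, 10) for _ in range(2000)]
scales = [sample_poisson(math.exp(true_b0 + true_b1 * a), rng) for a in ages]

b0_hat, b1_hat = fit_poisson_log_link(ages, scales)
print(f"estimated intercept: {b0_hat:.3f} (simulated with {true_b0})")
print(f"estimated slope    : {b1_hat:.3f} (simulated with {true_b1})")
```

Only the conditional mean is modelled; the Poisson assumption supplies the variance, which enters the fit through the working weights.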



To sum up, a conditional distribution represents the distribution of the outcome values for specific values of the predictor variable(s) included in the model. Each type of regression model illustrated above imposes certain distributional assumptions on the conditional distribution of the outcome variable given Age. Based on these distributional assumptions, the model proceeds to formulate how (1) the mean of the conditional distribution varies as a function of age (simple linear regression), (2) the logit-transformed mean of the conditional distribution varies as a function of age (simple binary logistic regression) or (3) the log-transformed mean of the conditional distribution varies as a function of age (simple Poisson regression).

For each type of model, one can define corresponding residuals. In particular, Pearson and deviance residuals can be defined for the logistic and Poisson regression models.
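For concreteness, here is a short sketch of the standard Pearson and deviance residual formulas for a Poisson model, plus the Pearson residual for a Bernoulli model. The function names and the example values are introduced here for illustration only.

```python
import math

def poisson_pearson_residual(y, mu):
    """(observed - fitted mean) / sqrt(variance); for Poisson the variance equals mu."""
    return (y - mu) / math.sqrt(mu)

def poisson_deviance_residual(y, mu):
    """Signed square root of this observation's contribution to the Poisson deviance."""
    term = y * math.log(y / mu) if y > 0 else 0.0   # y * log(y / mu) is taken as 0 when y == 0
    unit_deviance = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(unit_deviance, 0.0)), y - mu)

def bernoulli_pearson_residual(y, p):
    """(observed - fitted probability) / sqrt(p * (1 - p)) for a 0/1 outcome."""
    return (y - p) / math.sqrt(p * (1.0 - p))

fitted_mean = 4.0   # hypothetical fitted conditional mean
for observed in (0, 2, 4, 9):
    print(f"y = {observed}: Pearson = {poisson_pearson_residual(observed, fitted_mean):7.3f}, "
          f"deviance = {poisson_deviance_residual(observed, fitted_mean):7.3f}")
```

Both kinds of residual can be negative and are generally not integers, which is consistent with the residuals not following the family distribution.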














































First answer: answered 3 hours ago by kjetil b halvorsen






















                      up vote
                      1
                      down vote













                      Further to Kjetil's excellent answer, I wanted to add some specific examples to help clarify the meaning of a conditional distribution, which can be a bit of an elusive concept.



                      Let's say you took a random sample of 100 fish from a lake and you are interested in seeing how the age of the fish affects a number of outcome variables:



                      1. Fish weight (Weight);

                      2. Whether or not the fish are longer than 30cm;

                      3. Number of fish scales.

                      The first outcome variable is continuous, the second is binary (0 = fish is NOT longer than 30 cm; 1 = fish IS longer than 30 cm) and the third is a count variable.



                      Simple Linear Regression



                      How does Age affect Weight? You are going to formulate a simple linear regression model of the form:



                      $Weight = beta_0 + beta_1*Age + epsilon$


                      where the $epsilon$'s are independent, identically distributed, following a Normal distribution with mean 0 and standard deviation $sigma$. In this model, the mean of the Weight variable for all fishes in the lake sharing the same age is assumed to vary linearly with age. The conditional mean is represented by $beta_0$ + $beta_1$Age. It is called conditional because it is the mean weight for fishes with the same Age. (The unconditional mean weight would be the mean weight of all fishes in the lake, regardless of their weight.)



                      Simple Binary Logistic Regression



                      How does Age affect whether or not the fish are longer than 30cm? You are going to formulate a simple binary logistic regression model of the form:



                      log(p/(1-p)) = $beta_0$ + $beta_1$Age 


                      where p denotes the conditional probability that a fish of a given age is longer than 30cm. In this model, the conditional mean of the variable "whether or not the fish are longer than 30cm" corresponding to all fishes in the lake sharing the same age is assumed to vary linearly with age after being fed to the logit transformation. The logit-transformed conditional mean is represented by $beta_0$ + $beta_1$Age. This model works because we assume that the distribution of values of the variable "whether or not the fish are longer than 30cm" for a given age is a Bernoulli distribution. Recall that for this distribution, the variance is a function of the mean value, so if we can estimate its mean value, we can also estimate its variance. (The mean of a Bernoulli variable is p and the variance is p*(1-p).)



                      Simple Poisson Regression



                      How does Age affect the number of fish scales? You are going to formulate a simple Poisson regression model of the form:



                      log(mu) = $beta_0$ + $beta_1$Age 


                      where mu denotes the conditional mean value of the outcome variable "number of fish scales" for fish of a given age (that is, the expected number of fish scales for fish of a given age). In this model, the conditional mean of the outcome variable is assumed to vary linearly with age after being fed to the log transformation. The log-transformed conditional mean is represented by $beta_0$ + $beta_1$Age. This model works because we assume that the distribution of values of the variable "number of fish scales" for a given age is a Poisson distribution. Recall that for this distribution, the mean and variance are equal so it is sufficient to model its mean value.



                      To sum up, a conditional distribution represents the distribution of the outcome values for specific values of the predictor variable(s) included in the model. Each type of regression model illustrated above imposes certain distributional assumptions on the conditional distribution of the outcome variable given Age. Based on these distributional assumptions, the model proceeds to formulate how (1) the mean of the conditional distribution varies as a function of age (simple linear regression), (2) the logit-transformed mean of the conditional distribution varies as a function of age (simple binary logistic regression) or (3) the log-transformed mean of the conditional distribution varies as a function of age.



                      For each type of model, one can define corresponding residuals. In particular, Pearson and deviance residuals could be defined for the logistic and Poisson regression models.






                      share|cite
























                        up vote
                        1
                        down vote













                        Further to Kjetil's excellent answer, I wanted to add some specific examples to help clarify the meaning of a conditional distribution, which can be a bit of an elusive concept.



                        Let's say you took a random sample of 100 fish from a lake and you are interested in seeing how the age of the fish affects a number of outcome variables:



                        1. Fish weight (Weight);

                        2. Whether or not the fish are longer than 30cm;

                        3. Number of fish scales.

                        The first outcome variable is continuous, the second is binary (0 = fish is NOT longer than 30 cm; 1 = fish IS longer than 30 cm) and the third is a count variable.



                        Simple Linear Regression



                        How does Age affect Weight? You are going to formulate a simple linear regression model of the form:



                        $Weight = beta_0 + beta_1*Age + epsilon$


                        where the $epsilon$'s are independent, identically distributed, following a Normal distribution with mean 0 and standard deviation $sigma$. In this model, the mean of the Weight variable for all fishes in the lake sharing the same age is assumed to vary linearly with age. The conditional mean is represented by $beta_0$ + $beta_1$Age. It is called conditional because it is the mean weight for fishes with the same Age. (The unconditional mean weight would be the mean weight of all fishes in the lake, regardless of their weight.)



                        Simple Binary Logistic Regression



                        How does Age affect whether or not the fish are longer than 30cm? You are going to formulate a simple binary logistic regression model of the form:



                        log(p/(1-p)) = $beta_0$ + $beta_1$Age 


                        where p denotes the conditional probability that a fish of a given age is longer than 30cm. In this model, the conditional mean of the variable "whether or not the fish are longer than 30cm" corresponding to all fishes in the lake sharing the same age is assumed to vary linearly with age after being fed to the logit transformation. The logit-transformed conditional mean is represented by $beta_0$ + $beta_1$Age. This model works because we assume that the distribution of values of the variable "whether or not the fish are longer than 30cm" for a given age is a Bernoulli distribution. Recall that for this distribution, the variance is a function of the mean value, so if we can estimate its mean value, we can also estimate its variance. (The mean of a Bernoulli variable is p and the variance is p*(1-p).)



                        Simple Poisson Regression



                        How does Age affect the number of fish scales? You are going to formulate a simple Poisson regression model of the form:



                        log(mu) = $beta_0$ + $beta_1$Age 


                        where mu denotes the conditional mean value of the outcome variable "number of fish scales" for fish of a given age (that is, the expected number of fish scales for fish of a given age). In this model, the conditional mean of the outcome variable is assumed to vary linearly with age after being fed to the log transformation. The log-transformed conditional mean is represented by $beta_0$ + $beta_1$Age. This model works because we assume that the distribution of values of the variable "number of fish scales" for a given age is a Poisson distribution. Recall that for this distribution, the mean and variance are equal so it is sufficient to model its mean value.



                        To sum up, a conditional distribution represents the distribution of the outcome values for specific values of the predictor variable(s) included in the model. Each type of regression model illustrated above imposes certain distributional assumptions on the conditional distribution of the outcome variable given Age. Based on these distributional assumptions, the model proceeds to formulate how (1) the mean of the conditional distribution varies as a function of age (simple linear regression), (2) the logit-transformed mean of the conditional distribution varies as a function of age (simple binary logistic regression) or (3) the log-transformed mean of the conditional distribution varies as a function of age.



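                        For each type of model, one can define corresponding residuals. In particular, Pearson and deviance residuals can be defined for the logistic and Poisson regression models. These residuals are diagnostic quantities computed from the fitted model; the family describes the conditional distribution of the outcome, not the distribution of the residuals.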
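As a rough illustration of why residuals cannot themselves be Poisson, the Python sketch below computes raw and Pearson residuals for simulated Poisson counts. It is a simplification: it uses a known, hypothetical conditional mean `mu` in place of a fitted value, and `poisson_draw` is again a hand-written sampler rather than a library function.

```python
import math
import random

def poisson_draw(mu, rng):
    """Draw one Poisson(mu) value with Knuth's method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(2)
mu = 3.0   # hypothetical conditional mean for fish of one age
counts = [poisson_draw(mu, rng) for _ in range(20000)]

# Raw residual: observed minus expected. Pearson residual: raw residual
# divided by the standard deviation implied by the family (sqrt(mu) here).
raw = [y - mu for y in counts]
pearson = [(y - mu) / math.sqrt(mu) for y in counts]

has_negative = any(r < 0 for r in raw)
mean_pearson = sum(pearson) / len(pearson)
var_pearson = sum((r - mean_pearson) ** 2 for r in pearson) / len(pearson)

print("some raw residuals are negative:", has_negative)
print(f"Pearson residuals: mean = {mean_pearson:.3f}, variance = {var_pearson:.3f}")
```

The counts are non-negative integers, but the raw residuals take negative and non-integer values, so they cannot follow a Poisson distribution. The Pearson residuals are rescaled to have mean near 0 and variance near 1, which is what makes them useful for checking the variance assumption.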





































                          answered 1 min ago









                          Isabella Ghement




























                               
