What Base Should Be Used For Negative Log Likelihood?

When calculating the negative log likelihood loss, what base of log are we supposed to use?

Tags: machine-learning, loss-function

edited 11 mins ago by duckmayr
asked 8 hours ago by Brandon Lavigne

3 Answers

Score: 4 (accepted)

Typically the negative log likelihood is implemented with the natural logarithm (base e). Other bases can be used to the same effect, though.

answered 5 hours ago by JahKnows

(+1) for "Typically it is implemented as the natural logarithm, base e. Other bases can be used". However, "for the same effect" may be slightly misleading: there's a reason the natural logarithm is usually used. For many distributions, it makes the math convenient. Another base may be convenient in some cases, but not as often as the natural logarithm. – duckmayr, 2 hours ago
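
One way to see the convenience this comment refers to is sketched below for a Gaussian density. The choice of distribution and the symbols $\mu$, $\sigma^2$, and $b$ are illustrative assumptions, not something stated in the thread. The natural log cancels the exponential exactly, while any other base $b$ leaves a factor of $1/\ln b$ in the loss and in its derivatives:

$$
-\ln \mathcal{N}(x \mid \mu, \sigma^2) = \frac{(x-\mu)^2}{2\sigma^2} + \frac{1}{2}\ln(2\pi\sigma^2)
$$

$$
-\log_b \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\ln b}\left[\frac{(x-\mu)^2}{2\sigma^2} + \frac{1}{2}\ln(2\pi\sigma^2)\right]
$$

$$
\frac{d}{dx}\ln x = \frac{1}{x}, \qquad \frac{d}{dx}\log_b x = \frac{1}{x \ln b}
$$

The extra factor is harmless for optimization, but it has to be carried through every derivation, which is likely why base $e$ is the usual default.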




Score: 3

Changing the base is equivalent to multiplying the function by a constant. This rescales the loss value but does not change which parameters optimize it.

$$
\log_b(x) = \dfrac{1}{\log_e(b)} \cdot \log_e(x)
$$

answered 5 hours ago by Anshul G.
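
As an illustration of the change-of-base relation in the answer above, here is a minimal Python sketch. The function name `nll` and the probabilities are hypothetical, chosen only for the example. It computes the same negative log likelihood in three bases and compares the ratios against the constant $1/\log_e(b)$:

```python
import math

def nll(probs, base=math.e):
    """Negative log likelihood of independent observations, in the given log base."""
    return -sum(math.log(p) / math.log(base) for p in probs)

# Hypothetical probabilities that a model assigned to the observed outcomes.
probs = [0.9, 0.6, 0.8, 0.3]

nll_e = nll(probs)            # natural log
nll_2 = nll(probs, base=2)    # base 2
nll_10 = nll(probs, base=10)  # base 10

# Each loss should equal the natural-log loss times the constant 1 / ln(b).
print(nll_2 / nll_e, 1 / math.log(2))
print(nll_10 / nll_e, 1 / math.log(10))
```

Each printed pair should agree up to floating-point error, which is the numerical counterpart of the formula.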

Score: 0

Generally, when the log likelihood is calculated, it is used as a loss function, that is, a quantity being optimized. Changing the base multiplies the log by a constant. As long as the bases are either both greater than one or both less than one, this constant is positive (note that "negative log likelihood" can be interpreted as taking the log with a base less than one), and multiplying a function by a positive constant doesn't change which inputs optimize it. In other words, it doesn't matter. Changing the base is basically a change of units: log base $2$ gives bits, log base $256$ gives bytes, and log base $e$ gives nats (sometimes called nits). So it's like asking "Okay, we're trying to minimize the amount of wire that we're using ... but are we minimizing the amount of wire in feet, or the amount of wire in meters?"

The natural base $e$ is often used because it makes some of the math easier, but base $2$ is also used in some contexts because it allows reporting the log in units of bits. In cases where the absolute, rather than relative, value of the log likelihood is important, the base should be indicated, either by explicitly naming the base or by giving the units (e.g. bits, nats).

answered 13 mins ago by Acccumulation
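
The units argument in the answer above can be checked with a small sketch. The data (7 successes in 10 Bernoulli trials), the grid of candidate parameters, and the function name `bernoulli_nll` are all assumptions made for illustration. The minimizer should come out the same whether the loss is measured with base $e$, base $2$, or base $256$, and $-\ln(x)$ should match the log of $x$ taken with base $1/e$:

```python
import math

# Hypothetical data: 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10

def bernoulli_nll(theta, base=math.e):
    """Negative log likelihood of the data under success probability theta."""
    def log(x):
        return math.log(x) / math.log(base)
    return -(successes * log(theta) + (trials - successes) * log(1 - theta))

# Candidate parameters on a grid, excluding 0 and 1 where the log is undefined.
thetas = [i / 100 for i in range(1, 100)]

best_nats = min(thetas, key=bernoulli_nll)
best_bits = min(thetas, key=lambda t: bernoulli_nll(t, base=2))
best_bytes = min(thetas, key=lambda t: bernoulli_nll(t, base=256))

# The minimizer is the same in every unit; only the reported loss value differs.
print(best_nats, best_bits, best_bytes)
print(bernoulli_nll(best_nats), "nats")
print(bernoulli_nll(best_bits, base=2), "bits")

# "Negative log" read as a log with base below one: -ln(x) vs. log base 1/e of x.
x = 0.25
print(-math.log(x), math.log(x) / math.log(1 / math.e))
```

A grid search is used here only to keep the example dependency-free; a gradient-based optimizer would reach the same parameter, with the base acting as a rescaling of the step size.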
