Shannon entropy of a fair die

The formula for Shannon entropy is as follows:

$$\text{Entropy}(S) = - \sum_i p_i \log_2 p_i$$



Thus, a fair six-sided die should have the entropy

$$- \sum_{i=1}^{6} \dfrac{1}{6} \log_2 \dfrac{1}{6} = \log_2 (6) = 2.5849...$$
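As a quick sanity check (not part of the original question), this value can be computed directly from the definition:

```python
from math import log2

# Shannon entropy of a fair six-sided die, straight from the definition
p = [1 / 6] * 6
entropy = -sum(pi * log2(pi) for pi in p)
print(entropy)  # ≈ 2.585, i.e. log2(6)
```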



However, the entropy should also correspond to the average number of questions you have to ask in order to know the outcome (as illustrated in this guide under the heading Information Theory).



Now, trying to construct a decision tree that describes the average number of questions we have to ask in order to know the outcome of a die roll, this seems to be the optimal one:



(Image: decision tree for the die roll)



Looking at the average number of questions in the image, there are 3 questions in 4/6 cases and 2 questions in 2/6 cases. Thus the entropy should be:



$$\dfrac{4}{6} \times 3 + \dfrac{2}{6} \times 2 = 2.6666...$$
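For reference (a quick check, not in the original post), the tree's expected question count works out as:

```python
# expected questions of the tree: 3 questions in 4 of 6 cases, 2 in the other 2
avg_questions = 4 / 6 * 3 + 2 / 6 * 2
print(avg_questions)  # ≈ 2.667, strictly above log2(6) ≈ 2.585
```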



So, obviously, the result for the entropy isn't the same in the two calculations. How come?










  • Take base-6 entropy: $H_6(X)=1$, which is the same as the max depth of your 6-ary decision tree that decides the outcome. The discrepancy comes from the base of the log: you are trying to encode 6 different outcomes in base 2, so you are going to have to use bit-strings of length at least 3. There is clearly some waste here, since there are $2^3 = 8$ unique "encodings" on 3 bits, but you only need to encode 6 elements. In general, the entropy bound gets tighter as the number of elements to encode grows, as the first answer already mentioned.
    – mm8511
    48 mins ago














up vote
2
down vote

favorite












The formula for Shannon entropy is as follows,



$$textEntropy(S) = - sum_i p_i log_2 p_i
$$



Thus, a fair six sided dice should have the Entropy,



$$- sum_i=1^6 dfrac16 log_2 dfrac16 = log_2 (6) = 2.5849...$$



However, the Entropy should also correspond to the average number of questions you have to ask in order to know the outcome (as exampled in this guide under the headline Information Theory).



Now, trying to construct a decision tree to describe the average number of questions we have to ask in order to know the outcome of a dice, this seems to be the optimal one:



Decision tree for dice



Looking at the average number of questions in the image, there are 3 questions in 4/6 cases in 2 questions in 2/6 cases. Thus the Entropy should be:



$$dfrac46 times 3 + dfrac26 times 2 = 2.6666...$$



So, obvisouly the result for the Entropy isn't the same in the two calculations. How come?










share|cite|improve this question























  • Take base 6 entropy, and $H_6(X)=1$, which is the same as max depth of your 6-ary decision tree which decides your outcome. The discrepancy comes form the base of the log. you are trying to encode 6 different outcomes in base 2, so you are going to have to use bit-strings of length at least 3. There is clearly some waste here, since there are $2^3 =8$ unique "encodings" on 3 bits, but you only need to encode 6 elements. In general, the entropy bound gets tighter as the number of elements to encode grows, as the first answer already mentioned.
    – mm8511
    48 mins ago












up vote
2
down vote

favorite









up vote
2
down vote

favorite











The formula for Shannon entropy is as follows,



$$textEntropy(S) = - sum_i p_i log_2 p_i
$$



Thus, a fair six sided dice should have the Entropy,



$$- sum_i=1^6 dfrac16 log_2 dfrac16 = log_2 (6) = 2.5849...$$



However, the Entropy should also correspond to the average number of questions you have to ask in order to know the outcome (as exampled in this guide under the headline Information Theory).



Now, trying to construct a decision tree to describe the average number of questions we have to ask in order to know the outcome of a dice, this seems to be the optimal one:



Decision tree for dice



Looking at the average number of questions in the image, there are 3 questions in 4/6 cases in 2 questions in 2/6 cases. Thus the Entropy should be:



$$dfrac46 times 3 + dfrac26 times 2 = 2.6666...$$



So, obvisouly the result for the Entropy isn't the same in the two calculations. How come?










share|cite|improve this question















The formula for Shannon entropy is as follows,



$$textEntropy(S) = - sum_i p_i log_2 p_i
$$



Thus, a fair six sided dice should have the Entropy,



$$- sum_i=1^6 dfrac16 log_2 dfrac16 = log_2 (6) = 2.5849...$$



However, the Entropy should also correspond to the average number of questions you have to ask in order to know the outcome (as exampled in this guide under the headline Information Theory).



Now, trying to construct a decision tree to describe the average number of questions we have to ask in order to know the outcome of a dice, this seems to be the optimal one:



Decision tree for dice



Looking at the average number of questions in the image, there are 3 questions in 4/6 cases in 2 questions in 2/6 cases. Thus the Entropy should be:



$$dfrac46 times 3 + dfrac26 times 2 = 2.6666...$$



So, obvisouly the result for the Entropy isn't the same in the two calculations. How come?







probability-theory information-theory entropy decision-trees






asked 2 hours ago by Mountain_sheep











2 Answers
In your setting, the Shannon entropy is "just" a lower bound on the average number of questions of any decision tree (including optimal ones); the two don't have to coincide. To get closer to what the Shannon entropy is, imagine an optimal decision tree identifying the outcomes of throwing a die $N$ times for some large $N$ (assuming independence). The larger $N$ is, the smaller (yet nonnegative) the difference between the "averaged" (i.e. divided by $N$) number of questions of this "compound" decision tree and the Shannon entropy of the die. (This resembles the idea behind arithmetic coding.)

– metamorphy, answered 1 hour ago (edited 1 min ago)
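This limit can be illustrated numerically (my own sketch; the answer doesn't prescribe a construction): take the optimal yes/no questioning strategy for a block of $N$ rolls — equivalently a Huffman code over the $6^N$ equiprobable outcomes — and watch the questions-per-roll fall toward $\log_2 6 \approx 2.585$:

```python
import heapq
from math import log2

def huffman_avg_length(probs):
    """Expected codeword length (= expected yes/no questions) of a Huffman code.

    The sum of all internal-node weights equals the expected depth of a leaf.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b        # every symbol under this merge gains one bit
        heapq.heappush(heap, a + b)
    return total

for n in (1, 2, 3):          # blocks of n die rolls
    outcomes = 6 ** n
    per_roll = huffman_avg_length([1.0 / outcomes] * outcomes) / n
    print(n, per_roll)       # ≈ 2.667, then ≈ 2.611, then ≈ 2.605
```

Huffman coding is optimal among prefix-free question strategies, so these per-roll averages are the best achievable for each block size; they decrease toward the entropy but never reach it.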






To recover the entropy, you have to consider a sequence of die throws, and ask how many questions per roll you need in an optimal strategy, in the limit as the number of rolls goes to infinity. Note that each question can cover all the rolls; for example, for two rolls, you could ask at some point: "Are the results in $\{16,21,22,23\}$?" (where the first digit denotes the first throw, and the second digit denotes the second throw).

I'm too lazy to do it for 36 possibilities, so here is a simpler example: assume each roll gives one of only three results, with equal probability. Then the entropy is about $1.58496$.

For one toss, the optimal strategy is simply to ask "was it $1$?" followed by "was it $2$?", which on average takes $5/3 \approx 1.67$ questions.

For two tosses, an optimal strategy would be to first ask "was it one of $\{11,12,13,21\}$?" (where the first digit gives the result of the first toss, and the second digit the result of the second toss). If the answer is "yes", then use two questions to single out one of the four results. Otherwise, ask "was the first toss a $2$?"; if yes, then it was one of $22$ or $23$, and one question is sufficient to determine which. In the remaining case you know the first toss was $3$ and know nothing about the second, so you employ the one-toss strategy to determine the second toss.

This strategy needs on average $29/9 \approx 3.2222$ questions, or about $1.6111$ questions per toss, which is already much better, and indeed only $1.65\%$ worse than the value given by the entropy.

Note that the average number of questions of the single-toss optimal strategy can differ dramatically from the entropy. To see this, consider the toss of a biased coin: its entropy can be made arbitrarily low by making the coin sufficiently biased, but obviously there's no way to get the result of a coin toss with less than one question.

– celtschk, answered 55 mins ago (edited 46 mins ago)
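The arithmetic in this strategy can be reproduced in a few lines (an illustrative sketch, not part of the original answer):

```python
from math import log2

entropy = log2(3)                 # ≈ 1.58496 bits per three-outcome roll

# One roll: "was it 1?" then "was it 2?" -> 1 question w.p. 1/3, 2 w.p. 2/3
one_toss = 1 / 3 * 1 + 2 / 3 * 2  # = 5/3 ≈ 1.67

# Two rolls, strategy from the answer:
#   Q1 "in {11,12,13,21}?"  yes (4/9): 2 more questions        -> 3 total
#   Q2 "first roll a 2?"    yes (2/9): 1 more question         -> 3 total
#                           no  (3/9): first roll was 3, run
#                                      the one-roll strategy   -> 2 + 5/3 total
two_toss = 4 / 9 * 3 + 2 / 9 * 3 + 3 / 9 * (2 + one_toss)     # = 29/9

print(one_toss, two_toss / 2)     # per-roll cost drops from ≈1.67 to ≈1.61
```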





