Is there more than one “median” formula?

























In my work, when people refer to the "mean" value of a data set, they typically mean the arithmetic mean (i.e. "average", or "expected value"). If I provided the geometric mean instead, people would likely think I was being snide or unhelpful, since the intended definition of "mean" is understood in advance.



I'm trying to determine if there are multiple definitions of the "median" of a data set. For example, one of the definitions provided by a colleague for finding the median of a data set with an even number of elements would be:



Algorithm 'A'



  • Divide the number of elements by two, round down.

  • That value is the (1-based) index of the median in the sorted data.

  • e.g. For the following set, the median would be 5.

  • [4, 5, 6, 7]

This seems to make sense, though the rounding-down aspect seems a bit arbitrary.
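A minimal Python sketch of Algorithm 'A', assuming the data are sorted first and that the index is 1-based. The function name `median_algorithm_a` is hypothetical, and the handling of odd-length data (taking the middle element) is an assumption, since the description above only covers an even number of elements.

```python
def median_algorithm_a(values):
    """Return the lower of the two middle elements (Algorithm 'A')."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    # 1-based position: equals n // 2 for even n (as in the question).
    # For odd n it is the middle element, which is an assumption here.
    position = (n + 1) // 2
    return ordered[position - 1]  # convert to Python's 0-based indexing


print(median_algorithm_a([4, 5, 6, 7]))  # 5
```

The result is always an element of the original data, which is the property the test code apparently relied on.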



Algorithm 'B'



In any case, another colleague has proposed a separate algorithm, which was in a stats textbook of his (need to get the name and author):



  • Compute (n + 1)/2, where n is the number of elements, and keep a copy of the rounded-down and rounded-up integers. Name them n_lo and n_hi.

  • Take the arithmetic mean of the elements at (1-based) positions n_lo and n_hi.

  • e.g. For the following set, the median would be (5+6)/2 = 5.5.

  • [4, 5, 6, 7]

This seems wrong though, as the median value, 5.5 in this case, isn't actually in the original data set. When we swapped out algorithm 'A' for 'B' in some test code, it broke horribly (as we expected).
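A minimal Python sketch of Algorithm 'B'. It takes n_lo and n_hi as the floor and ceiling of (n + 1)/2 with 1-based positions, which is an assumption chosen so that the four-element example yields 5.5. The function name `median_algorithm_b` is hypothetical.

```python
def median_algorithm_b(values):
    """Return the mean of the two middle elements (Algorithm 'B')."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    n_lo = (n + 1) // 2   # floor of (n + 1) / 2, 1-based
    n_hi = n // 2 + 1     # ceiling of (n + 1) / 2, 1-based
    return (ordered[n_lo - 1] + ordered[n_hi - 1]) / 2


data = [4, 5, 6, 7]
result = median_algorithm_b(data)
print(result)          # 5.5
print(result in data)  # False
```

For odd n the two positions coincide, so the result is the middle element (returned as a float). For even n the result need not be a member of the data, and it can be a float even when every input is an integer; either difference could plausibly break code written against Algorithm 'A'.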



Question



Is there a formal "name" for these two approaches to calculating the median of a data set? For example, "lesser-of-the-two median" versus "average-the-middle-elements-and-make-new-data median"?



































      median definition















      asked 43 mins ago









      DevNull





















          2 Answers



          accepted










          TL;DR - I'm not aware of specific names being given to different estimators of sample medians. Order statistics are, themselves, rather fussy and different resources give different definitions.




          Medians of random variables are not necessarily unique. Consider this definition from Hogg, McKean and Craig's Mathematical Statistics (which discusses the median of a distribution, not a sample):




          A median of a distribution of one random variable $X$ of the discrete or continuous type is a value of $x$ such that $P(X < x) \le \frac{1}{2}$ and $P(X \le x) \ge \frac{1}{2}$. If there is only one such $x$, it is called the median of the distribution.
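To see why a median of a distribution need not be unique, here is a small Python sketch that checks the two conditions $P(X < x) \le \frac{1}{2}$ and $P(X \le x) \ge \frac{1}{2}$ for a discrete distribution. The helper `is_median` and the fair-die example are illustrative assumptions, not taken from the textbook.

```python
from fractions import Fraction


def is_median(pmf, x):
    """Check P(X < x) <= 1/2 and P(X <= x) >= 1/2 for a discrete pmf.

    pmf maps each possible value to its probability.
    """
    half = Fraction(1, 2)
    p_less = sum(p for v, p in pmf.items() if v < x)
    p_less_equal = sum(p for v, p in pmf.items() if v <= x)
    return p_less <= half and p_less_equal >= half


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

candidates = [2, 3, Fraction(7, 2), 4, 5]
print([x for x in candidates if is_median(die, x)])
# [3, Fraction(7, 2), 4]
```

Every value between 3 and 4 inclusive satisfies both conditions, so this distribution has a whole interval of medians rather than a single one.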




          In the same reference, the authors define the median of a random sample, but only for the case of an odd number of observations! Quoting Hogg, McKean and Craig again:




          Certain functions of the order statistics are important statistics themselves... if $n$ is odd, $Y_{(n+1)/2}$ ... is called the median of the random sample.




          The authors provide no guidance on what to do if you have an even number of samples. (Note that $Y_i$ is the $i$th smallest datum.)



          But this seems unnecessarily restrictive; I would prefer to be able to define a median of a random sample for even or odd $n$. Moreover, I would like the median to be unique. Given these two requirements, I have to make some decisions about how to best find a unique sample median. Both Algorithm A and Algorithm B satisfy these requirements. Imposing additional requirements could eliminate either or both from consideration.



          Algorithm B has the property that half the data fall above the value, and half the data fall below the value. In light of the definition of the median of a random variable, this seems nice.
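A quick way to check this property on the example from the question is to count the data strictly below and strictly above each candidate value. The helper name `count_split` is hypothetical, and the sketch assumes distinct data values (ties can make the two counts unequal).

```python
def count_split(values, m):
    """Return (number of values below m, number of values above m)."""
    below = sum(1 for v in values if v < m)
    above = sum(1 for v in values if v > m)
    return below, above


data = [4, 5, 6, 7]
print(count_split(data, 5.5))  # Algorithm B's value: (2, 2)
print(count_split(data, 5))    # Algorithm A's value: (1, 2)
```

With Algorithm B's value the split is even, while Algorithm A's value leaves one datum below and two above.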




          Whether or not a particular estimator breaks unit tests is a property of the unit tests -- unit tests written against a specific estimator won't necessarily hold when you substitute another estimator. In the ideal case, the unit tests were chosen because they reflect the critical needs of your organization, not because of a doctrinaire argument over definitions.



















            What @Sycorax says.



            As a matter of fact, there are surprisingly many definitions of general quantiles, and so in particular also of medians. Hyndman & Fan (1996, The American Statistician) give an overview that is, AFAIK, still comprehensive. The different types do not have formal names. You may simply need to be clear on which type you are using. (It often does not make a big difference with data sets of realistic sizes.)
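As an illustration of how two quantile definitions can disagree on the same data, Python's standard `statistics.quantiles` offers an `"exclusive"` and an `"inclusive"` method; as far as I understand, these correspond to types 6 and 7 of Hyndman & Fan, but treat that mapping as an assumption and check the library documentation.

```python
import statistics

data = [4, 5, 6, 7]

# Quartiles (n=4 cut points) under two different interpolation rules.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [4.25, 5.5, 6.75]
print(inclusive)  # [4.75, 5.5, 6.25]
```

Both methods agree on the median (the middle cut point, 5.5) but differ on the first and third quartiles, which is why stating the definition in use matters.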



            Note that it is commonly accepted to have a value that is not present in the data set as the median, e.g., 5.5 as a median for (4, 5, 6, 7). This is the default behavior for R:



            > median(4:7)
            [1] 5.5


            R's quantile() uses type 7 of Hyndman & Fan's classification by default, and median() agrees with it at the 0.5 quantile.
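For comparison outside R, Python's standard `statistics` module exposes both behaviours from the question as separate functions. These are library function names rather than formal statistical terms, so this is only a sketch of one library's convention.

```python
import statistics

data = [4, 5, 6, 7]

print(statistics.median_low(data))   # 5   (lower middle element, always in the data)
print(statistics.median(data))       # 5.5 (mean of the two middle elements)
print(statistics.median_high(data))  # 6   (upper middle element, always in the data)
```

For data with an odd number of elements all three functions return the same middle element.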














                  edited 15 mins ago

























                  answered 23 mins ago









                  Sycorax









                          edited 19 mins ago

























                          answered 32 mins ago









                          Stephan Kolassa
