Is there more than one "median" formula?
In my work, when people refer to the "mean" value of a data set, they typically mean the arithmetic mean (i.e. "average", or "expected value"). If I provided the geometric mean, people would likely think I was being snide or unhelpful, as the definition of "mean" is understood in advance.
I'm trying to determine if there are multiple definitions of the "median" of a data set. For example, one of the definitions provided by a colleague for finding the median of a data set with an even number of elements would be:
Algorithm 'A'
- Divide the number of elements by two, round down.
- That value is the (1-based) index of the median.
- e.g. for the following set, the median would be 5: [4, 5, 6, 7]
This seems to make sense, though the rounding-down aspect seems a bit arbitrary.
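A minimal Python sketch of Algorithm 'A' as described above, for illustration only. The function name `median_a` is hypothetical, the data is sorted first, and the index is treated as 1-based because that is what makes the example return 5. Since the description only covers an even number of elements, the sketch rejects anything else.

```python
def median_a(data):
    """Algorithm 'A': element at 1-based position n // 2 of the sorted data.

    Only defined here for a non-empty data set with an even number of
    elements, matching the description in the question.
    """
    if len(data) == 0 or len(data) % 2 != 0:
        raise ValueError("this sketch only handles a non-empty, even-length data set")
    ordered = sorted(data)
    position = len(ordered) // 2   # 1-based position, i.e. n / 2 rounded down
    return ordered[position - 1]   # convert to Python's 0-based indexing


print(median_a([4, 5, 6, 7]))  # 5
```

The result is always one of the original elements, which is the property the test code mentioned below appears to rely on.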
Algorithm 'B'
In any case, another colleague has proposed a separate algorithm, which was in a stats textbook of his (need to get the name and author):
- Divide the number of elements by 2, and keep a copy of the rounded-up and rounded-down integers. Name them n_lo and n_hi.
- Take the arithmetic mean of the elements at n_lo and n_hi.
- e.g. for the following set, the median would be (5+6)/2 = 5.5: [4, 5, 6, 7]
This seems wrong though, as the median value, 5.5 in this case, isn't actually in the original data set. When we swapped out algorithm 'A' for 'B' in some test code, it broke horribly (as we expected).
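A minimal Python sketch of Algorithm 'B', again only as an illustration. Read literally, dividing an even count by 2 gives the same integer whether rounded up or down, so this sketch assumes the intended 1-based positions are n/2 and n/2 + 1, which is what the worked example (5 and 6) uses. The name `median_b` is hypothetical.

```python
def median_b(data):
    """Algorithm 'B': mean of the two middle elements of the sorted data.

    Assumes the 1-based positions n/2 and n/2 + 1 for even n; that is an
    interpretation chosen to match the worked example, not a quoted formula.
    """
    if len(data) == 0 or len(data) % 2 != 0:
        raise ValueError("this sketch only handles a non-empty, even-length data set")
    ordered = sorted(data)
    n_lo = len(ordered) // 2   # 1-based position of the lower middle element
    n_hi = n_lo + 1            # 1-based position of the upper middle element
    return (ordered[n_lo - 1] + ordered[n_hi - 1]) / 2


data = [4, 5, 6, 7]
result = median_b(data)
print(result)          # 5.5
print(result in data)  # False
```

Any code that treats the median as an element of the data (for example, looking it up by value) would fail on a result like this, which may be why swapping the two algorithms broke the tests.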
Question
Is there a formal "name" for these two approaches to calculating the median of a data set? i.e. "lesser-of-the-two median" versus "average-the-middle-elements-and-make-new-data median"?
Tags: median, definition
asked 43 mins ago
DevNull
2 Answers
accepted
TL;DR - I'm not aware of specific names being given to different estimators of sample medians. Order statistics are, themselves, rather fussy and different resources give different definitions.
Medians of random variables are not necessarily unique. Consider this definition from Hogg, McKean and Craig's Mathematical Statistics (which discusses the median of a distribution, not a sample):
A median of a distribution of one random variable $X$ of the discrete or continuous type is a value of $x$ such that $P(X < x) \le \frac{1}{2}$ and $P(X \le x) \ge \frac{1}{2}$. If there is only one such $x$, it is called the median of the distribution.
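To see how this definition allows more than one median, here is a small Python check. It assumes the usual reading of the two conditions, $P(X < x) \le \frac{1}{2}$ and $P(X \le x) \ge \frac{1}{2}$, and uses a hypothetical discrete distribution that puts probability 1/4 on each of 4, 5, 6 and 7. The helper `is_median` is made up for this illustration.

```python
from fractions import Fraction


def is_median(x, pmf):
    """Check P(X < x) <= 1/2 and P(X <= x) >= 1/2 for a discrete pmf.

    pmf maps each support point to its probability.
    """
    below = sum(p for v, p in pmf.items() if v < x)         # P(X < x)
    at_or_below = sum(p for v, p in pmf.items() if v <= x)  # P(X <= x)
    half = Fraction(1, 2)
    return below <= half and at_or_below >= half


# Hypothetical distribution: uniform on {4, 5, 6, 7}.
pmf = {v: Fraction(1, 4) for v in (4, 5, 6, 7)}

candidates = [4, 5, Fraction(11, 2), 6, 7]
print([x for x in candidates if is_median(x, pmf)])
# [5, Fraction(11, 2), 6]
```

For this distribution 5, 5.5 and 6 all satisfy both conditions (as does every value between 5 and 6), so the definition alone does not single out one median.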
In the same reference, the authors provide a definition of medians of random samples, but only in the case that there are an odd number of samples! In Hogg, McKean and Craig again:
Certain functions of the order statistics are important statistics themselves... if $n$ is odd, $Y_{(n+1)/2}$ ... is called the median of the random sample.
The authors provide no guidance on what to do if you have an even number of samples. (Note that $Y_i$ is the $i$th smallest datum.)
But this seems unnecessarily restrictive; I would prefer to be able to define a median of a random sample for even or odd $n$. Moreover, I would like the median to be unique. Given these two requirements, I have to make some decisions about how to best find a unique sample median. Both Algorithm A and Algorithm B satisfy these requirements. Imposing additional requirements could eliminate either or both from consideration.
Algorithm B has the property that half the data fall above the value, and half the data fall below the value. In light of the definition of the median of a random variable, this seems nice.
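As a quick illustration of that property, the following Python sketch counts how many data points fall strictly below and strictly above each candidate value for the question's example [4, 5, 6, 7]. The helper name `split_counts` is hypothetical.

```python
def split_counts(data, value):
    """Return (number of points strictly below value, number strictly above)."""
    below = sum(1 for x in data if x < value)
    above = sum(1 for x in data if x > value)
    return below, above


data = [4, 5, 6, 7]
print(split_counts(data, 5))    # Algorithm A's result: (1, 2)
print(split_counts(data, 5.5))  # Algorithm B's result: (2, 2)
```

With Algorithm A's value the split is uneven (one point below, two above), while Algorithm B's value leaves two points on each side.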
Whether or not a particular estimator breaks unit tests is a property of the unit tests -- unit tests written against a specific estimator won't necessarily hold when you substitute another estimator. In the ideal case, the unit tests were chosen because they reflect the critical needs of your organization, not because of a doctrinaire argument over definitions.
edited 15 mins ago
answered 23 mins ago
Sycorax
What @Sycorax says.
As a matter of fact, there are surprisingly many definitions of general quantiles, so in particular also of medians. Hyndman & Fan (1996, The American Statistician) give an overview that is, AFAIK, still comprehensive. The different types do not have formal names. You may simply need to be clear on which type you are using. (It often does not make a big difference with data sets of realistic sizes.)
Note that it is commonly accepted to have a value that is not present in the data set as the median, e.g., 5.5 as a median for (4, 5, 6, 7). This is the default behavior for R:
> median(4:7)
[1] 5.5
R's median() by default uses type 7 of Hyndman & Fan's classification.
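For comparison with the R output above, Python's standard `statistics` module happens to expose both behaviours from the question as separate functions. These are library function names, not necessarily formal statistical terms, and the snippet is only meant to show that the choice is often left to the caller.

```python
import statistics

data = [4, 5, 6, 7]

print(statistics.median(data))       # 5.5, mean of the two middle values
print(statistics.median_low(data))   # 5, lower of the two middle values
print(statistics.median_high(data))  # 6, higher of the two middle values
```

For an odd number of elements all three functions return the same middle value, so the distinction only matters for even-sized data.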
edited 19 mins ago
answered 32 mins ago
Stephan Kolassa