In my work, when people refer to the "mean" of a data set, they typically mean the arithmetic mean (i.e. the "average", or "expected value"). If I provided the geometric mean instead, people would likely think I was being snide or unhelpful, since the intended definition of "mean" is understood in advance.

I'm trying to determine if there are multiple definitions of the "median" of a data set. For example, one of the definitions provided by a colleague for finding the median of a data set with an even number of elements would be:

Algorithm 'A'

Divide the number of elements by two, round down.

That value is the (1-based) index of the median in the sorted data.

For example, for the following set, the index is 4/2 = 2, so the median would be 5.

[4, 5, 6, 7]

This seems to make sense, though the rounding-down aspect seems a bit arbitrary.
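As a minimal sketch of Algorithm 'A', assuming Python (the question does not say what language the test code is in), sorted input, and a 1-based index so that the example above yields 5. The name `median_a` is hypothetical, and the sketch only covers the even-sized case the colleague described:

```python
def median_a(values):
    """Algorithm 'A' for an even number of elements: the lower of the two middle values."""
    data = sorted(values)          # assumption: the rule is applied to sorted data
    n = len(data)
    if n == 0 or n % 2 != 0:
        raise ValueError("this sketch only covers a non-empty, even-sized data set")
    index = n // 2                 # divide by two, round down (1-based position)
    return data[index - 1]         # convert to Python's 0-based indexing


print(median_a([4, 5, 6, 7]))      # 5
```

The result is always an element of the original data, which is presumably what the existing test code relies on.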

Algorithm 'B'

In any case, another colleague has proposed a separate algorithm, which was in a stats textbook of his (need to get the name and author):

Divide the number of elements plus one by 2, and keep a copy of the rounded-down and rounded-up integers. Name them n_lo and n_hi.

Take the arithmetic mean of the elements at (1-based) positions n_lo and n_hi in the sorted data.

For example, for the following set, n_lo = 2 and n_hi = 3, so the median would be (5+6)/2 = 5.5.

[4, 5, 6, 7]

This seems wrong though, as the median value, 5.5 in this case, isn't actually in the original data set. When we swapped out algorithm 'A' for 'B' in some test code, it broke horribly (as we expected).
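A minimal sketch of Algorithm 'B' in Python follows. It rests on an assumption: for the example to pick out 5 and 6, the two positions have to come from rounding (n + 1)/2 down and up, counted from 1 in the sorted data. The name `median_b` is hypothetical; `n_lo` and `n_hi` follow the description above.

```python
import math


def median_b(values):
    """Algorithm 'B': average the element(s) at the middle position(s)."""
    data = sorted(values)          # assumption: the rule is applied to sorted data
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = (n + 1) / 2           # 1-based middle position; fractional when n is even
    n_lo = math.floor(middle)
    n_hi = math.ceil(middle)
    # For odd n, n_lo == n_hi and the mean is just the middle element.
    return (data[n_lo - 1] + data[n_hi - 1]) / 2


print(median_b([4, 5, 6, 7]))      # 5.5
```

Note that the return value is a float and need not be an element of the input, so any code that looks the median up in the data, or compares it for equality with an element, can fail after the swap.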

Question

Is there a formal "name" for these two approaches to calculating the median of a data set? i.e. "lesser-of-the-two median" versus "average-the-middle-elements-and-make-new-data median"?

asked 43 mins ago

DevNull

Tags: median, definition


2 Answers

TL;DR - I'm not aware of specific names being given to different estimators of sample medians. Order statistics are, themselves, rather fussy and different resources give different definitions.

Medians of random variables are not necessarily unique. Consider this definition from Hogg, McKean and Craig's Mathematical Statistics (which discusses the median of a distribution, not a sample):

A median of a distribution of one random variable $X$ of the discrete or continuous type is a value of $x$ such that $P(X < x) \le \frac12$ and $P(X \le x) \ge \frac12$. If there is only one such $x$, it is called the median of the distribution.
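To see the non-uniqueness concretely, the definition can be checked against the empirical distribution of the question's data, in which each of the $n$ values has probability $1/n$. This is only a sketch: `is_median` is a hypothetical helper, and it assumes the two conditions are read as $P(X < x) \le \frac12$ and $P(X \le x) \ge \frac12$.

```python
def is_median(x, data):
    """True if x satisfies P(X < x) <= 1/2 and P(X <= x) >= 1/2
    for the empirical distribution that puts equal mass on each datum."""
    n = len(data)
    below = sum(1 for v in data if v < x)          # n * P(X < x)
    at_or_below = sum(1 for v in data if v <= x)   # n * P(X <= x)
    # Compare counts against n/2 using integers to avoid floating-point issues.
    return 2 * below <= n and 2 * at_or_below >= n


data = [4, 5, 6, 7]
for x in (4, 5, 5.5, 6, 7):
    print(x, is_median(x, data))
# Of these candidates, only 5, 5.5 and 6 satisfy both conditions.
```

Under this reading, every value in the interval from 5 to 6 is a median of the empirical distribution, so the outputs of both Algorithm A (5) and Algorithm B (5.5) qualify.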

In the same reference, the authors provide a definition of medians of random samples, but only in the case that there are an odd number of samples! In Hogg, McKean and Craig again,

Certain functions of the order statistics are important statistics themselves... if $n$ is odd, $Y_{(n+1)/2}$ ... is called the median of the random sample.

The authors provide no guidance on what to do if you have an even number of samples. (Note that $Y_i$ is the $i$th smallest datum.)

But this seems unnecessarily restrictive; I would prefer to be able to define a median of a random sample for even or odd $n$. Moreover, I would like the median to be unique. Given these two requirements, I have to make some decisions about how to best find a unique sample median. Both Algorithm A and Algorithm B satisfy these requirements. Imposing additional requirements could eliminate either or both from consideration.

Algorithm B has the property that half the data fall above the value, and half the data fall below the value. In light of the definition of the median of a random variable, this seems nice.
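A quick way to check this property on the question's data is to count how many values fall strictly below and strictly above each candidate. The helper name `split_counts` is hypothetical, and the two candidate values are simply the results quoted in the question.

```python
def split_counts(value, data):
    """Return (number of data strictly below value, number strictly above value)."""
    below = sum(1 for v in data if v < value)
    above = sum(1 for v in data if v > value)
    return below, above


data = [4, 5, 6, 7]
print(split_counts(5, data))     # (1, 2): Algorithm A's result splits the data unevenly
print(split_counts(5.5, data))   # (2, 2): Algorithm B's result splits the data evenly
```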

Whether or not a particular estimator breaks unit tests is a property of the unit tests -- unit tests written against a specific estimator won't necessarily hold when you substitute another estimator. In the ideal case, the unit tests were chosen because they reflect the critical needs of your organization, not because of a doctrinaire argument over definitions.

edited 15 mins ago

answered 23 mins ago

Sycorax


What @Sycorax says.

As a matter of fact, there are surprisingly many definitions of general quantiles, so in particular also of medians. Hyndman & Fan (1996, The American Statistician) give an overview that is, AFAIK, still comprehensive. The different types do not have formal names. You may simply need to be clear on which type you are using. (It often does not make a big difference with data sets of realistic sizes.)

Note that it is commonly accepted to have a value that is not present in the data set as the median, e.g., 5.5 as a median for (4, 5, 6, 7). This is the default behavior for R:

> median(4:7)
[1] 5.5

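R's median() averages the two middle values when the sample size is even, which agrees with type 7 of Hyndman & Fan's classification, the default type for R's quantile().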
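For comparison, Python's standard `statistics` module exposes the variants discussed here as separate functions, using the informal labels "low median" and "high median". These are library function names rather than formal statistical terminology, so this is only an illustration of how one library distinguishes them:

```python
import statistics

data = [4, 5, 6, 7]

print(statistics.median_low(data))    # 5   (the lower middle value, as in Algorithm A)
print(statistics.median_high(data))   # 6   (the upper middle value)
print(statistics.median(data))        # 5.5 (mean of the two middle values, as in Algorithm B)
```

`median_low` and `median_high` always return an element of the data, whereas `median` may not when the number of elements is even.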

edited 19 mins ago

answered 32 mins ago

Stephan Kolassa



Is there more than one “median” formula?
