How to get median based on probability distribution?

From a calculation, I have a discrete distribution given as pairs $(i, P_i)$, and I want the median of this distribution. The naive way is to build a list of cumulative sums:

list = Table[Sum[p[[j]], {j, i}], {i, n}]

then find the first element greater than or equal to $0.5$:

pos = Position[list, _?(# >= 0.5 &)][[1, 1]]

but this is pretty slow, especially with a very large amount of data, because Sum is recomputed from scratch for every entry. I made several attempts to speed it up, such as

list = Table[0, {i, n}];
For[i = 1, i <= n, i++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]

or, even smarter, using Accumulate:

Accumulate[p]

and then doing the same Position operation. This made everything much faster, and I'm pretty happy with it. I'm wondering whether a similar function is already built into Mathematica, so that we don't have to implement this manually. However, after looking up Median, I found nothing related to this. Do you have any ideas?
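
For reference, the Accumulate-based lookup described above can be wrapped in a small function. This is only a sketch: the name `medianIndex` is hypothetical (not a built-in), it assumes `p` is a list of non-negative probabilities that sum to 1, and it assumes a version that provides FirstPosition (10.0 or later).

```mathematica
(* Hypothetical helper, not a built-in: index of the first cumulative
   probability that reaches 1/2. Assumes p is non-negative and sums to 1. *)
medianIndex[p_List] :=
  First[FirstPosition[Accumulate[p], _?(# >= 1/2 &), {Missing["NotFound"]}, {1}]]

(* Exact example: the running sums are 1/10, 3/10, 3/5, 1,
   so the first one that reaches 1/2 is at index 3 *)
medianIndex[{1/10, 1/5, 3/10, 2/5}]
```

Accumulate builds all running sums in a single pass, so the work grows linearly with n, whereas the Table/Sum version re-adds the prefix for every entry.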

edited 1 hour ago by J. M. is computer-less♦

asked 1 hour ago by RoderickLee

Have you seen EmpiricalDistribution?
– J. M. is computer-less♦, 1 hour ago

@J.M.iscomputer-less thank you!
– RoderickLee, 40 mins ago


Tags: list-manipulation, probability-or-statistics


2 Answers
Accepted answer (3 votes)

To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution.

First, let's build an example table of {value, probability} pairs like the one in your question:

list = Transpose[{Range[10], #/Total[#] &[RandomReal[{0, 1}, 10]]}]

Then we turn this into an EmpiricalDistribution:

dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], {x, 0, 11}, Filling -> Axis, Exclusions -> None]

[Figure: CDF plot]

Here the probabilities are passed as weights for the individual values, which gives the right distribution. If you still have the original data sample from before binning, that is even better as input: EmpiricalDistribution will then do the binning for you.

Now we can easily get the median by calling Median on our distribution:

Median[dist]
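
As a sanity check, here is a small sketch comparing Median on an EmpiricalDistribution with the cumulative-sum lookup from the question. The names `vals`, `probs`, `d`, `viaMedian` and `viaAccumulate` and the exact weights are made up for illustration; the weights are chosen so that no cumulative sum lands exactly on 1/2.

```mathematica
(* Illustrative data: four values with exact probabilities summing to 1 *)
vals = {10, 20, 30, 40};
probs = {1/10, 1/5, 3/10, 2/5};

(* Median via the weighted empirical distribution *)
d = EmpiricalDistribution[probs -> vals];
viaMedian = Median[d];

(* Median via the running sums: skip entries while the cumulative
   probability is still below 1/2, then take the next value *)
viaAccumulate = vals[[LengthWhile[Accumulate[probs], # < 1/2 &] + 1]];

(* The running sums are 1/10, 3/10, 3/5, 1, so both entries should be 30 *)
{viaMedian, viaAccumulate}
```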

answered 46 mins ago by Thies Heidecke

Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[{0, 1}, 10], Total] as well.
– J. M. is computer-less♦, 44 mins ago

@J.M. Ah, cool idea! I had only thought about #/Norm[#,1]&, but hadn't thought of using Normalize that way!
– Thies Heidecke, 33 mins ago


Answer (1 vote)

You can also use WeightedData as follows:

SeedRandom[1]
list = Transpose[{Range[10], #/Total[#] &[RandomReal[{0, 1}, 10]]}];
wd = WeightedData @@ Transpose[list];

You can use Median or Quantile with wd:

Median[wd]

Quantile[wd, 1/2]

Alternatively,

Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]
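
The same route gives other quantiles as well. A small sketch with exact weights, so the result can be checked by hand; the names `vals`, `weights`, `wdExact`, `ed` and `quartiles` and the data are made up for illustration:

```mathematica
(* Illustrative data: values with exact weights that sum to 1 *)
vals = {10, 20, 30, 40};
weights = {1/10, 1/5, 3/10, 2/5};
wdExact = WeightedData[vals, weights];
ed = EmpiricalDistribution[wdExact];

(* Quartiles: for each q, the smallest value whose cumulative weight reaches q.
   The cumulative weights are 1/10, 3/10, 3/5, 1, so this gives {20, 30, 40} *)
quartiles = InverseCDF[ed, #] & /@ {1/4, 1/2, 3/4}
```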

answered 39 mins ago by kglr
