How to get median based on probability distribution?
Clash Royale CLAN TAG#URR8PPP
up vote
1
down vote
favorite
From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list
list = Table[Sum[p[[j]], j, i], i, n]
then find the first element greater than $0$:
pos = Position[list, _?(# >= 0.5 &)][[1, 1]]
but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as
list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]
or even smarter by using Accumulate
Accumulate[p]
and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?
list-manipulation probability-or-statistics
add a comment |Â
up vote
1
down vote
favorite
From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list
list = Table[Sum[p[[j]], j, i], i, n]
then find the first element greater than $0$:
pos = Position[list, _?(# >= 0.5 &)][[1, 1]]
but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as
list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]
or even smarter by using Accumulate
Accumulate[p]
and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?
list-manipulation probability-or-statistics
1
Have you seenEmpiricalDistribution
?
â J. M. is computer-lessâ¦
1 hour ago
@J.M.iscomputer-less thank you!
â RoderickLee
40 mins ago
add a comment |Â
up vote
1
down vote
favorite
up vote
1
down vote
favorite
From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list
list = Table[Sum[p[[j]], j, i], i, n]
then find the first element greater than $0$:
pos = Position[list, _?(# >= 0.5 &)][[1, 1]]
but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as
list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]
or even smarter by using Accumulate
Accumulate[p]
and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?
list-manipulation probability-or-statistics
From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list
list = Table[Sum[p[[j]], j, i], i, n]
then find the first element greater than $0$:
pos = Position[list, _?(# >= 0.5 &)][[1, 1]]
but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as
list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]
or even smarter by using Accumulate
Accumulate[p]
and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?
list-manipulation probability-or-statistics
list-manipulation probability-or-statistics
edited 1 hour ago
J. M. is computer-lessâ¦
94.6k10294454
94.6k10294454
asked 1 hour ago
RoderickLee
1636
1636
1
Have you seenEmpiricalDistribution
?
â J. M. is computer-lessâ¦
1 hour ago
@J.M.iscomputer-less thank you!
â RoderickLee
40 mins ago
add a comment |Â
1
Have you seenEmpiricalDistribution
?
â J. M. is computer-lessâ¦
1 hour ago
@J.M.iscomputer-less thank you!
â RoderickLee
40 mins ago
1
1
Have you seen
EmpiricalDistribution
?â J. M. is computer-lessâ¦
1 hour ago
Have you seen
EmpiricalDistribution
?â J. M. is computer-lessâ¦
1 hour ago
@J.M.iscomputer-less thank you!
â RoderickLee
40 mins ago
@J.M.iscomputer-less thank you!
â RoderickLee
40 mins ago
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
3
down vote
accepted
To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution
.
First let's get an example table of pairs of value,probability
like you showed in your question
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]
1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
0.156779, 10, 0.0210756
Then we make this into a EmpiricalDistribution
:
dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]
Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution
will do the binning for you.
Now we can easily get the median by calling Median
on our distribution:
Median[dist]
6
1
Thanks for following through. ;) One thing: you could have usedNormalize[RandomReal[0, 1, 10], Total]
as well.
â J. M. is computer-lessâ¦
44 mins ago
@J.M. Ah, cool idea! Just thought about#/Norm[#,1]&
, but haven't thought about using Normalize in that way!
â Thies Heidecke
33 mins ago
add a comment |Â
up vote
1
down vote
You can also use WeightedData
as follows:
SeedRandom[1]
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
wd = WeightedData @@ Transpose[list];
You can use Median
or Quantile
with wd
Median[wd]
5
Quantile[wd, 1/2]
5
Alternatively,
Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]
5
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
accepted
To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution
.
First let's get an example table of pairs of value,probability
like you showed in your question
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]
1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
0.156779, 10, 0.0210756
Then we make this into a EmpiricalDistribution
:
dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]
Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution
will do the binning for you.
Now we can easily get the median by calling Median
on our distribution:
Median[dist]
6
1
Thanks for following through. ;) One thing: you could have usedNormalize[RandomReal[0, 1, 10], Total]
as well.
â J. M. is computer-lessâ¦
44 mins ago
@J.M. Ah, cool idea! Just thought about#/Norm[#,1]&
, but haven't thought about using Normalize in that way!
â Thies Heidecke
33 mins ago
add a comment |Â
up vote
3
down vote
accepted
To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution
.
First let's get an example table of pairs of value,probability
like you showed in your question
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]
1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
0.156779, 10, 0.0210756
Then we make this into a EmpiricalDistribution
:
dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]
Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution
will do the binning for you.
Now we can easily get the median by calling Median
on our distribution:
Median[dist]
6
1
Thanks for following through. ;) One thing: you could have usedNormalize[RandomReal[0, 1, 10], Total]
as well.
â J. M. is computer-lessâ¦
44 mins ago
@J.M. Ah, cool idea! Just thought about#/Norm[#,1]&
, but haven't thought about using Normalize in that way!
â Thies Heidecke
33 mins ago
add a comment |Â
up vote
3
down vote
accepted
up vote
3
down vote
accepted
To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution
.
First let's get an example table of pairs of value,probability
like you showed in your question
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]
1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
0.156779, 10, 0.0210756
Then we make this into a EmpiricalDistribution
:
dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]
Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution
will do the binning for you.
Now we can easily get the median by calling Median
on our distribution:
Median[dist]
6
To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution
.
First let's get an example table of pairs of value,probability
like you showed in your question
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]
1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
0.156779, 10, 0.0210756
Then we make this into a EmpiricalDistribution
:
dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]
Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution
will do the binning for you.
Now we can easily get the median by calling Median
on our distribution:
Median[dist]
6
answered 46 mins ago
Thies Heidecke
6,3212438
6,3212438
1
Thanks for following through. ;) One thing: you could have usedNormalize[RandomReal[0, 1, 10], Total]
as well.
â J. M. is computer-lessâ¦
44 mins ago
@J.M. Ah, cool idea! Just thought about#/Norm[#,1]&
, but haven't thought about using Normalize in that way!
â Thies Heidecke
33 mins ago
add a comment |Â
1
Thanks for following through. ;) One thing: you could have usedNormalize[RandomReal[0, 1, 10], Total]
as well.
â J. M. is computer-lessâ¦
44 mins ago
@J.M. Ah, cool idea! Just thought about#/Norm[#,1]&
, but haven't thought about using Normalize in that way!
â Thies Heidecke
33 mins ago
1
1
Thanks for following through. ;) One thing: you could have used
Normalize[RandomReal[0, 1, 10], Total]
as well.â J. M. is computer-lessâ¦
44 mins ago
Thanks for following through. ;) One thing: you could have used
Normalize[RandomReal[0, 1, 10], Total]
as well.â J. M. is computer-lessâ¦
44 mins ago
@J.M. Ah, cool idea! Just thought about
#/Norm[#,1]&
, but haven't thought about using Normalize in that way!â Thies Heidecke
33 mins ago
@J.M. Ah, cool idea! Just thought about
#/Norm[#,1]&
, but haven't thought about using Normalize in that way!â Thies Heidecke
33 mins ago
add a comment |Â
up vote
1
down vote
You can also use WeightedData
as follows:
SeedRandom[1]
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
wd = WeightedData @@ Transpose[list];
You can use Median
or Quantile
with wd
Median[wd]
5
Quantile[wd, 1/2]
5
Alternatively,
Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]
5
add a comment |Â
up vote
1
down vote
You can also use WeightedData
as follows:
SeedRandom[1]
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
wd = WeightedData @@ Transpose[list];
You can use Median
or Quantile
with wd
Median[wd]
5
Quantile[wd, 1/2]
5
Alternatively,
Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]
5
add a comment |Â
up vote
1
down vote
up vote
1
down vote
You can also use WeightedData
as follows:
SeedRandom[1]
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
wd = WeightedData @@ Transpose[list];
You can use Median
or Quantile
with wd
Median[wd]
5
Quantile[wd, 1/2]
5
Alternatively,
Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]
5
You can also use WeightedData
as follows:
SeedRandom[1]
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
wd = WeightedData @@ Transpose[list];
You can use Median
or Quantile
with wd
Median[wd]
5
Quantile[wd, 1/2]
5
Alternatively,
Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]
5
answered 39 mins ago
kglr
165k8188388
165k8188388
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f183930%2fhow-to-get-median-based-on-probability-distribution%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
Have you seen
EmpiricalDistribution
?â J. M. is computer-lessâ¦
1 hour ago
@J.M.iscomputer-less thank you!
â RoderickLee
40 mins ago