How to get median based on probability distribution?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite












From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list



list = Table[Sum[p[[j]], j, i], i, n]


then find the first element greater than $0$:



pos = Position[list, _?(# >= 0.5 &)][[1, 1]]


but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as



list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]


or even smarter by using Accumulate



Accumulate[p]


and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?










share|improve this question



















  • 1




    Have you seen EmpiricalDistribution?
    – J. M. is computer-less♦
    1 hour ago










  • @J.M.iscomputer-less thank you!
    – RoderickLee
    40 mins ago














up vote
1
down vote

favorite












From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list



list = Table[Sum[p[[j]], j, i], i, n]


then find the first element greater than $0$:



pos = Position[list, _?(# >= 0.5 &)][[1, 1]]


but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as



list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]


or even smarter by using Accumulate



Accumulate[p]


and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?










share|improve this question



















  • 1




    Have you seen EmpiricalDistribution?
    – J. M. is computer-less♦
    1 hour ago










  • @J.M.iscomputer-less thank you!
    – RoderickLee
    40 mins ago












up vote
1
down vote

favorite









up vote
1
down vote

favorite











From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list



list = Table[Sum[p[[j]], j, i], i, n]


then find the first element greater than $0$:



pos = Position[list, _?(# >= 0.5 &)][[1, 1]]


but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as



list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]


or even smarter by using Accumulate



Accumulate[p]


and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?










share|improve this question















From some calculation, I have a distribution of discrete data $(i,P_i)$ and then want to get the median based on this distribution. Naive way is to create a list



list = Table[Sum[p[[j]], j, i], i, n]


then find the first element greater than $0$:



pos = Position[list, _?(# >= 0.5 &)][[1, 1]]


but this way is pretty slow, especially when I have a very large amount of data. The reason is I have to do Sum everytime. I did several attempts to speed up, such as



list = Table[0, i, n]
For[i = 1, i <= n, i ++, list[[i]] = If[i == 1, p[[i]], list[[i - 1]] + p[[i]]]]


or even smarter by using Accumulate



Accumulate[p]


and then do same Position operation. This made everything much faster and I'm pretty happy with it. I'm wondering whether some similar function is already in Mathematica, so we don't have to manually implement this. However after lookup the Median, I have no results relate to this. Do you guys have any idea?







list-manipulation probability-or-statistics






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 1 hour ago









J. M. is computer-less♦

94.6k10294454




94.6k10294454










asked 1 hour ago









RoderickLee

1636




1636







  • 1




    Have you seen EmpiricalDistribution?
    – J. M. is computer-less♦
    1 hour ago










  • @J.M.iscomputer-less thank you!
    – RoderickLee
    40 mins ago












  • 1




    Have you seen EmpiricalDistribution?
    – J. M. is computer-less♦
    1 hour ago










  • @J.M.iscomputer-less thank you!
    – RoderickLee
    40 mins ago







1




1




Have you seen EmpiricalDistribution?
– J. M. is computer-less♦
1 hour ago




Have you seen EmpiricalDistribution?
– J. M. is computer-less♦
1 hour ago












@J.M.iscomputer-less thank you!
– RoderickLee
40 mins ago




@J.M.iscomputer-less thank you!
– RoderickLee
40 mins ago










2 Answers
2






active

oldest

votes

















up vote
3
down vote



accepted










To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution.



First let's get an example table of pairs of value,probability like you showed in your question



list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]



1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
0.156779, 10, 0.0210756




Then we make this into a EmpiricalDistribution:



dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]


CDF plot



Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution will do the binning for you.



Now we can easily get the median by calling Median on our distribution:



Median[dist]



6







share|improve this answer
















  • 1




    Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
    – J. M. is computer-less♦
    44 mins ago











  • @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
    – Thies Heidecke
    33 mins ago

















up vote
1
down vote













You can also use WeightedData as follows:



SeedRandom[1]
list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
wd = WeightedData @@ Transpose[list];


You can use Median or Quantile with wd



Median[wd]



5




Quantile[wd, 1/2]



5




Alternatively,



Median[EmpiricalDistribution[wd]]
Quantile[EmpiricalDistribution[wd], 1/2]
InverseCDF[EmpiricalDistribution[wd], 1/2]



5







share|improve this answer




















    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "387"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f183930%2fhow-to-get-median-based-on-probability-distribution%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    3
    down vote



    accepted










    To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution.



    First let's get an example table of pairs of value,probability like you showed in your question



    list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]



    1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
    0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
    0.156779, 10, 0.0210756




    Then we make this into a EmpiricalDistribution:



    dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
    Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]


    CDF plot



    Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution will do the binning for you.



    Now we can easily get the median by calling Median on our distribution:



    Median[dist]



    6







    share|improve this answer
















    • 1




      Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
      – J. M. is computer-less♦
      44 mins ago











    • @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
      – Thies Heidecke
      33 mins ago














    up vote
    3
    down vote



    accepted










    To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution.



    First let's get an example table of pairs of value,probability like you showed in your question



    list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]



    1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
    0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
    0.156779, 10, 0.0210756




    Then we make this into a EmpiricalDistribution:



    dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
    Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]


    CDF plot



    Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution will do the binning for you.



    Now we can easily get the median by calling Median on our distribution:



    Median[dist]



    6







    share|improve this answer
















    • 1




      Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
      – J. M. is computer-less♦
      44 mins ago











    • @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
      – Thies Heidecke
      33 mins ago












    up vote
    3
    down vote



    accepted







    up vote
    3
    down vote



    accepted






    To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution.



    First let's get an example table of pairs of value,probability like you showed in your question



    list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]



    1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
    0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
    0.156779, 10, 0.0210756




    Then we make this into a EmpiricalDistribution:



    dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
    Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]


    CDF plot



    Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution will do the binning for you.



    Now we can easily get the median by calling Median on our distribution:



    Median[dist]



    6







    share|improve this answer












    To elaborate a bit on what J.M. hinted at, this is one way of achieving what you want with EmpiricalDistribution.



    First let's get an example table of pairs of value,probability like you showed in your question



    list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]]



    1, 0.0538923, 2, 0.00538521, 3, 0.158895, 4, 0.10697, 5,
    0.0799713, 6, 0.17624, 7, 0.112601, 8, 0.128191, 9,
    0.156779, 10, 0.0210756




    Then we make this into a EmpiricalDistribution:



    dist = EmpiricalDistribution[list[[All, 2]] -> list[[All, 1]]]
    Plot[CDF[dist, x], x, 0, 11, Filling -> Axis, Exclusions -> None]


    CDF plot



    Here we use the syntax where we have used the probabilities as weights to single sample examples to get the right distribution. If you have your original data sample before binning that's even better as an input and EmpiricalDistribution will do the binning for you.



    Now we can easily get the median by calling Median on our distribution:



    Median[dist]



    6








    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered 46 mins ago









    Thies Heidecke

    6,3212438




    6,3212438







    • 1




      Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
      – J. M. is computer-less♦
      44 mins ago











    • @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
      – Thies Heidecke
      33 mins ago












    • 1




      Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
      – J. M. is computer-less♦
      44 mins ago











    • @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
      – Thies Heidecke
      33 mins ago







    1




    1




    Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
    – J. M. is computer-less♦
    44 mins ago





    Thanks for following through. ;) One thing: you could have used Normalize[RandomReal[0, 1, 10], Total] as well.
    – J. M. is computer-less♦
    44 mins ago













    @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
    – Thies Heidecke
    33 mins ago




    @J.M. Ah, cool idea! Just thought about #/Norm[#,1]&, but haven't thought about using Normalize in that way!
    – Thies Heidecke
    33 mins ago










    up vote
    1
    down vote













    You can also use WeightedData as follows:



    SeedRandom[1]
    list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
    wd = WeightedData @@ Transpose[list];


    You can use Median or Quantile with wd



    Median[wd]



    5




    Quantile[wd, 1/2]



    5




    Alternatively,



    Median[EmpiricalDistribution[wd]]
    Quantile[EmpiricalDistribution[wd], 1/2]
    InverseCDF[EmpiricalDistribution[wd], 1/2]



    5







    share|improve this answer
























      up vote
      1
      down vote













      You can also use WeightedData as follows:



      SeedRandom[1]
      list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
      wd = WeightedData @@ Transpose[list];


      You can use Median or Quantile with wd



      Median[wd]



      5




      Quantile[wd, 1/2]



      5




      Alternatively,



      Median[EmpiricalDistribution[wd]]
      Quantile[EmpiricalDistribution[wd], 1/2]
      InverseCDF[EmpiricalDistribution[wd], 1/2]



      5







      share|improve this answer






















        up vote
        1
        down vote










        up vote
        1
        down vote









        You can also use WeightedData as follows:



        SeedRandom[1]
        list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
        wd = WeightedData @@ Transpose[list];


        You can use Median or Quantile with wd



        Median[wd]



        5




        Quantile[wd, 1/2]



        5




        Alternatively,



        Median[EmpiricalDistribution[wd]]
        Quantile[EmpiricalDistribution[wd], 1/2]
        InverseCDF[EmpiricalDistribution[wd], 1/2]



        5







        share|improve this answer












        You can also use WeightedData as follows:



        SeedRandom[1]
        list = Transpose[Range[10], #/Total[#] &[RandomReal[0, 1, 10]]];
        wd = WeightedData @@ Transpose[list];


        You can use Median or Quantile with wd



        Median[wd]



        5




        Quantile[wd, 1/2]



        5




        Alternatively,



        Median[EmpiricalDistribution[wd]]
        Quantile[EmpiricalDistribution[wd], 1/2]
        InverseCDF[EmpiricalDistribution[wd], 1/2]



        5








        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered 39 mins ago









        kglr

        165k8188388




        165k8188388



























             

            draft saved


            draft discarded















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f183930%2fhow-to-get-median-based-on-probability-distribution%23new-answer', 'question_page');

            );

            Post as a guest













































































            Comments

            Popular posts from this blog

            What does second last employer means? [closed]

            Installing NextGIS Connect into QGIS 3?

            One-line joke