Why is a simulation of a probability experiment off by a factor of 10?












up vote
9
down vote

favorite
5












From a university homework assignment:



There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell (i.e. each cell has at least $1$ ball)?



The answer is $\large\frac{\binom{11}{7}}{\binom{19}{7}}$, which is about $0.0065$. I reached this result independently, and it was confirmed by the official homework solution of the university.



A friend of mine and I independently wrote Python simulations that run the experiment many times (tested up to $1,000,000$ runs). We used both Python's random generator and several randomly generated lists from www.random.org. The results were similar, consistently hovering around $0.09$, which is off from the expected theoretical result by a factor of $10$ or even a bit more.



Have we made some wrong assumptions?
Any ideas for this discrepancy?



P.S.: Here is the Python code that I wrote, and maybe there is some faulty logic there.





    import random


    def run_test():
        global count, N

        def run_experiment(n_balls, n_cells, offset):
            cells = [0] * n_cells
            # toss balls randomly to cells:
            for j in range(n_balls):
                cells[random.randrange(0, n_cells)] += 1
                # cells[int(lines[offset + j])] += 1
            cells = sorted(cells)
            # print(cells)

            # check if there is an empty cell. if so return 0, otherwise 1:
            if cells[0] == 0:
                return 0
            return 1

        count = 0
        N = 1000000
        offset = 0
        N_CELLS = 8
        N_BALLS = 12
        # iterate experiment
        for i in range(N):
            result = run_experiment(N_BALLS, N_CELLS, offset=offset)
            count += result
            offset += N_CELLS

        print("probability:", count, "/", N, "(~", count / N, ")")


    run_test()




























  • 1




    I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
    – Alexey
    12 hours ago










  • If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
    – Alexey
    12 hours ago







  • 1




    Besides the distinguishability of the balls, there is the question of how they are distributed between the cells. You can either assign a cell to each ball, uniformly at random and independently of all the other balls, or you can look at all the different distinguishable ball distributions and pick one of those uniformly at random. That's the difference between your two approaches.
    – Arthur
    11 hours ago







  • 4




    "Have we made some wrong assumptions?" Hard to tell, since you have shown neither the code that you have written nor specified the assumptions that you have made. Without mind-reading, this question is impossible to answer. If your question involves your code, you should show the relevant code in the question itself.
    – John Coleman
    11 hours ago







  • 4




    It seems the book solution assumes that all the ways to distribute the balls into the cells are equally likely; for example, a distribution of (12,0,0,0,0,0,0,0) is just as likely as (2,2,2,2,2,2,0,0). This is not consistent with a model that says each ball is dropped into a cell chosen independently and randomly, with each cell equally likely to be chosen, which I think is more realistic.
    – awkward
    8 hours ago




















probability simulation python














edited 13 mins ago
Chris Culter

asked 13 hours ago
Shmuel Levinson

















5 Answers

















up vote
14
down vote













In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability that all the balls went into the first cell: with distinguishable balls this probability is $\frac{1}{8^{12}}$ and is easily simulated, though a rare occurrence; with indistinguishable balls it is $\frac{1}{19 \choose 7}$, over a million times more likely but difficult to simulate.



If the balls are distinguishable, the probability all eight boxes are full is $$\frac{8! \, S_2(12,8)}{8^{12}}$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
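As a sanity check, both models can be evaluated exactly in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the count of favourable outcomes in the distinguishable case is obtained by inclusion-exclusion rather than by looking up the Stirling number.

```python
from fractions import Fraction
from math import comb


def p_all_occupied_distinguishable(n_balls: int, n_cells: int) -> Fraction:
    """Each ball independently picks a cell uniformly at random."""
    # Inclusion-exclusion count of surjections from balls onto cells,
    # which equals n_cells! * S_2(n_balls, n_cells).
    surjections = sum(
        (-1) ** k * comb(n_cells, k) * (n_cells - k) ** n_balls
        for k in range(n_cells + 1)
    )
    return Fraction(surjections, n_cells ** n_balls)


def p_all_occupied_indistinguishable(n_balls: int, n_cells: int) -> Fraction:
    """Every vector of cell counts is equally likely (stars and bars)."""
    favourable = comb(n_balls - 1, n_cells - 1)
    total = comb(n_balls + n_cells - 1, n_cells - 1)
    return Fraction(favourable, total)


print(float(p_all_occupied_distinguishable(12, 8)))    # roughly 0.0933
print(float(p_all_occupied_indistinguishable(12, 8)))  # roughly 0.0065
```

The two numbers differ by a factor of about 14, which is the gap reported in the question.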



If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside a Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cell walls from the possible positions $0,1,2,3,\ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive.
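The stars-and-bars recipe can be sketched as follows. This is a minimal illustration, not the only way to do it; `sample_counts` and `estimate` are hypothetical helper names, and the seed is fixed only to make runs repeatable.

```python
import random


def sample_counts(n_balls: int, n_cells: int, rng: random.Random) -> list[int]:
    """Draw cell counts uniformly from all stars-and-bars arrangements."""
    n_positions = n_balls + n_cells - 1  # positions 0..18 for 12 balls, 8 cells
    walls = sorted(rng.sample(range(n_positions), n_cells - 1))
    # Sentinels at -1 and n_positions turn the gaps between walls into counts.
    edges = [-1] + walls + [n_positions]
    return [right - left - 1 for left, right in zip(edges, edges[1:])]


def estimate(n_trials: int, n_balls: int = 12, n_cells: int = 8, seed: int = 0) -> float:
    """Fraction of sampled arrangements in which no cell is empty."""
    rng = random.Random(seed)
    hits = sum(
        min(sample_counts(n_balls, n_cells, rng)) > 0 for _ in range(n_trials)
    )
    return hits / n_trials


print(estimate(200_000))  # should land near 330/50388, i.e. about 0.0065
```

A cell is empty exactly when a wall sits at position $0$ or $18$ or two walls are adjacent, so checking that the smallest count is positive is the same success condition as the one described above.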
























  • 1




    This is indeed similar to my simulation. Thanks for the detailed answer!
    – Shmuel Levinson
    12 hours ago






  • 3




    Nitpicking from a physicist: any collection of bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with, say, photons is more difficult.
    – Carl Witthoft
    6 hours ago






  • 1




    I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
    – JimmyJames
    1 hour ago






  • 1




    @JimmyJames: that is why it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
    – Henry
    1 hour ago







  • 1




    It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that being able to tell one ball from another changes the probability of landing in the first cell, which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice: 1/2 one head and one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of heads and tails.
    – JimmyJames
    56 mins ago

















up vote
9
down vote













Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^{12}\approx7\times10^{10}$ elements.



Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has ${19\choose7}\approx 5\times10^4$ elements.



The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.



Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:



    import math

    # weight for one simulated outcome, given its per-cell counts in `cells`
    weight = 1
    for cell_population in cells:
        weight *= math.factorial(cell_population)


At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
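For concreteness, here is one way the weighted average could be wired into a full simulation. Treat it as a sketch under my own assumptions: `weighted_estimate` is a hypothetical name, and the estimate is the ratio of weighted successes to total weight, which is one standard way to reweight samples.

```python
import math
import random


def weighted_estimate(n_trials: int, n_balls: int = 12, n_cells: int = 8, seed: int = 0) -> float:
    """Toss labelled balls, then reweight towards the uniform law on count vectors."""
    rng = random.Random(seed)
    weighted_hits = 0
    total_weight = 0
    for _ in range(n_trials):
        cells = [0] * n_cells
        for _ in range(n_balls):
            cells[rng.randrange(n_cells)] += 1
        # A count vector has n_balls! / prod(c_i!) preimages among the labelled
        # tosses, so weighting by prod(c_i!) cancels that multiplicity.
        weight = math.prod(math.factorial(c) for c in cells)
        total_weight += weight
        if min(cells) > 0:
            weighted_hits += weight
    return weighted_hits / total_weight


print(weighted_estimate(200_000))  # expected in the neighbourhood of 0.0065, with visible noise
```

The estimate converges slowly, because very lopsided outcomes are rare under $D$ but carry enormous weights (up to $12!$), so individual runs can be off by several percent even with hundreds of thousands of trials.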

































    up vote
    2
    down vote













    The original problem is posed, so far as I can tell, to show the difference between combinations and permutations. In nature, there is no such thing as indistinguishable balls. Semi-infinite tests (e.g. Las Vegas) have shown this to be true.



    Now, if the problem really wants you to use "indistinguishable" balls for the purposes of solving the problem, then yes, you need to use combinations and not permutations when calculating all the ways the indistinguishable balls are placed into the containers.
    And of course, you need to use permutations for the numbered balls, as they are distinguishable from each other and from the collection of indistinguishable balls.



    Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.






















    • 1




      pastebin.com/5mKBxfbM
      – Shmuel Levinson
      3 hours ago

















    up vote
    0
    down vote













    Thanks to the answers here I figured out my mistake in the simulation. It simply didn't create a uniform distribution. Essentially, in my simulation the balls were numbered.


























    • I'm just curious here. Can you explain why that mattered? The task is simply to count the number of balls randomly assigned to a cell. Where did the number on the ball come in?
      – JimmyJames
      42 mins ago










    • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review
      – Scientifica
      30 mins ago










    • Please don't add "thank you" as an answer. Instead, accept the answer that you found most helpful. - From Review
      – max_zorn
      17 mins ago

















    up vote
    -1
    down vote













    My answer is intended to add to the existing ones, which are already good.




    A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to 1,000,000) ... [and they are] ... off from the expected theoretical result. ... Have we made some wrong assumptions? Any ideas for this discrepancy?




    Remember that the theoretical result and the experimental result are generally not going to be the same. The law of large numbers basically says that, as you run more and more experiments, the experimental results will tend towards the theoretical results.



    From your description, it sounds like, in all cases, you only ran "up to" (i.e. a maximum of) 1M simulations. Did you try larger numbers, such as 2M or even up to 10M or 25M?



    If we can assume for a moment that there're no issues with the simulation code, then increasing the number of experiments should get you closer to your expected theoretical results.




    We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around 0.09 which is a factor of 10




    The results may be similar because, from your description, you're only running up to 1M experiments, but don't appear to have tried going beyond that limit.



    My response also assumes that Python's PRNGs have been used according to the documentation, as not doing so can also have an impact on the validity of your simulation.



    That said, note that to verify/review the validity of your code, SO or CodeReview might be a more appropriate site than this one.






















    • 4




      If the true probability is 0.0065, then over a million trials the expected number of successes is 6500. If you have a Poisson process with $\lambda = 6500$, then the standard deviation is the square root of that, or slightly more than 80. The z-score for 90000 is over 1000. The idea that more trials will make this effect disappear is absurd.
      – Acccumulation
      3 hours ago











    • Indeed the difference between 100, 1,000 or 1,000,000 iterations was non-existent
      – Shmuel Levinson
      2 hours ago











    • Fair enough. I'll remove it when I get a few more minutes then.
      – code_dredd
      41 mins ago








































    5 Answers
    5






    active

    oldest

    votes








    5 Answers
    5






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    14
    down vote













    In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate



    If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?



    If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $0,1,2,3,ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive






    share|cite|improve this answer


















    • 1




      This is indeed similar to my simulation. Thanks for the detailed answer!
      – Shmuel Levinson
      12 hours ago






    • 3




      Nit picking from a physicist: any collection of Bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with , say, photons, is more difficult
      – Carl Witthoft
      6 hours ago






    • 1




      I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
      – JimmyJames
      1 hour ago






    • 1




      @JimmyJames: that is way it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
      – Henry
      1 hour ago







    • 1




      It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that the being able to tell one ball from another changes the probability of landing in the first cell which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice. 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of head and tails.
      – JimmyJames
      56 mins ago














    up vote
    14
    down vote













    In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate



    If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?



    If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside a Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cell walls from the possible positions $0,1,2,3,\ldots,18$ for the balls and cell walls; a success is when none of the cell walls is at position $0$ or $18$ and no two of them are consecutive.
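The stars and bars procedure described above can be sketched as follows. This is an illustrative implementation under the stated setup (19 positions, 7 walls); the function names, the seed, and the trial count are assumptions made for the example.

```python
import random

def no_empty_cell(walls, n_positions):
    """Return True if the wall positions leave at least one ball in every cell."""
    walls = sorted(walls)
    # A wall at either end means the first or last cell is empty.
    if walls[0] == 0 or walls[-1] == n_positions - 1:
        return False
    # Two adjacent walls enclose an empty cell.
    return all(right - left > 1 for left, right in zip(walls, walls[1:]))

def simulate(n_balls=12, n_cells=8, trials=200_000, seed=0):
    """Estimate P(no empty cell) with every occupancy pattern equally likely."""
    rng = random.Random(seed)
    n_positions = n_balls + n_cells - 1      # 19 slots shared by balls and walls
    n_walls = n_cells - 1                    # 7 walls separate 8 cells
    hits = 0
    for _ in range(trials):
        walls = rng.sample(range(n_positions), n_walls)
        hits += no_empty_cell(walls, n_positions)
    return hits / trials

if __name__ == "__main__":
    print("estimate:", simulate())
```

Because `random.sample` draws each 7-element subset of positions with equal probability, each occupancy pattern is equally likely, and the estimate should settle near $\binom{11}{7}/\binom{19}{7}$.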
























    • 1




      This is indeed similar to my simulation. Thanks for the detailed answer!
      – Shmuel Levinson
      12 hours ago






    • 3




      Nitpicking from a physicist: any collection of bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with, say, photons is more difficult.
      – Carl Witthoft
      6 hours ago






    • 1




      I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
      – JimmyJames
      1 hour ago






    • 1




      @JimmyJames: that is why it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
      – Henry
      1 hour ago







    • 1




      It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that being able to tell one ball from another changes the probability of landing in the first cell, which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice: 1/2 one head and one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of heads and tails.
      – JimmyJames
      56 mins ago




















    edited 12 hours ago

























    answered 12 hours ago









    Henry


















    up vote
    9
    down vote













    Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^{12}\approx7\times10^{10}$ elements.



    Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $\binom{19}{7}\approx 5\times10^4$ elements.



    The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
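A small case makes the mismatch concrete. The sketch below (function name and the 2-ball, 2-cell example are illustrative assumptions) enumerates every element of $D$ and counts how many of them map to each element of $I$.

```python
from collections import Counter
from itertools import product

def occupancy_counts(n_balls, n_cells):
    """Map each labelled assignment in D to its occupancy pattern in I and count them."""
    counts = Counter()
    for assignment in product(range(n_cells), repeat=n_balls):
        # assignment[b] is the cell chosen by ball b
        occupancy = tuple(assignment.count(cell) for cell in range(n_cells))
        counts[occupancy] += 1
    return counts

if __name__ == "__main__":
    for pattern, multiplicity in sorted(occupancy_counts(2, 2).items()):
        print(pattern, multiplicity)
```

With two balls and two cells, the pattern with one ball in each cell is hit by two labelled assignments while each of the other two patterns is hit by one, so sampling uniformly from $D$ gives that pattern probability $1/2$ rather than the $1/3$ it has under the uniform distribution on $I$.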



    Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:



    import math

    # cells[i] is the number of balls that landed in cell i
    weight = 1
    for cell_population in cells:
        weight *= math.factorial(cell_population)


    At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
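One way to use that weight is a self-normalised weighted average over the same labelled-ball simulation. The following is a sketch under that assumption, not the answerer's own code; the function name, seed, and trial count are illustrative. Each pattern in $I$ is reached by $12!/\prod_i c_i!$ elements of $D$, so weighting by $\prod_i c_i!$ cancels that multiplicity up to a constant factor.

```python
import math
import random

def weighted_estimate(n_balls=12, n_cells=8, trials=200_000, seed=0):
    """Estimate P(no empty cell) under the uniform distribution on occupancy patterns,
    while sampling labelled assignments uniformly and reweighting each sample."""
    rng = random.Random(seed)
    weighted_hits = 0
    total_weight = 0
    for _ in range(trials):
        cells = [0] * n_cells
        for _ in range(n_balls):
            cells[rng.randrange(n_cells)] += 1
        # Proportional to 1 / (number of labelled assignments giving this pattern).
        weight = math.prod(math.factorial(c) for c in cells)
        total_weight += weight
        if min(cells) > 0:
            weighted_hits += weight
    return weighted_hits / total_weight

if __name__ == "__main__":
    print("weighted estimate:", weighted_estimate())
```

The weights vary over many orders of magnitude, so this estimator is noticeably noisier than sampling the patterns directly; it is meant to illustrate the correction rather than to be an efficient method.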
















        answered 12 hours ago









        Chris Culter





















            up vote
            2
            down vote













            The original problem is posed, so far as I can tell, to show the difference between combinations and permutations. In nature, there is no such thing as indistinguishable balls. Semi-infinite tests (e.g. Las Vegas) have shown this to be true.



            Now, if the problem really wants you to use "indistinguishable" balls for the purposes of solving the problem, then yes, you need to use combinations and not permutations when calculating all the ways the indistinguishable balls are placed into the containers.
            And of course, you need to use permutations for the numbered balls, as they are distinguishable from each other and from the collection of indistinguishable balls.



            Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.






















            • 1




              pastebin.com/5mKBxfbM
              – Shmuel Levinson
              3 hours ago
























            answered 6 hours ago









            Carl Witthoft


















            up vote
            0
            down vote













            Thanks to the answers here, I figured out my mistake in the simulation. It simply didn't sample the arrangements of indistinct balls uniformly. Essentially, in my simulation the balls were numbered.


























            • I'm just curious here. Can you explain why that mattered? The task is simply to count the number of balls randomly assigned to a cell. Where did the number on the ball come in?
              – JimmyJames
              42 mins ago










            • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review
              – Scientifica
              30 mins ago










            • Please don't add "thank you" as an answer. Instead, accept the answer that you found most helpful. - From Review
              – max_zorn
              17 mins ago
























            answered 2 hours ago









            Shmuel Levinson






















            up vote
            -1
            down vote













            My answer is intended to add to the existing ones, which are already good.




            A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to 1,000,000) ... [and they are] ... off from the expected theoretical result. ... Have we made some wrong assumptions? Any ideas for this discrepancy?




            Remember that the theoretical result and the experimental result are generally not going to be the same. The law of large numbers basically says that, as you run more and more experiments, the experimental results will tend towards the theoretical results.



            From your description, it sounds like, in all cases, you only ran "up to" (i.e. a maximum of) 1M simulations. Did you try larger numbers, such as 2M or even up to 10M or 25M?



            If we can assume for a moment that there are no issues with the simulation code, then increasing the number of experiments should get you closer to your expected theoretical results.
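Whether more trials help can be checked directly. The sketch below re-implements the experiment as described in the question (each ball dropped independently into a uniformly chosen cell) and prints the estimate for a few trial counts; the function name, seed, and trial counts are illustrative choices, not taken from the original code. The law of large numbers makes the estimates settle near the probability of whatever model the code samples from, so a gap that persists as the trial count grows points at the model rather than the sample size.

```python
import random

def estimate_no_empty_cell(trials, n_balls=12, n_cells=8, seed=0):
    """Fraction of trials in which every cell receives at least one ball."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Set of cells that received at least one of the independently thrown balls.
        occupied = {rng.randrange(n_cells) for _ in range(n_balls)}
        hits += len(occupied) == n_cells
    return hits / trials

if __name__ == "__main__":
    for trials in (1_000, 10_000, 100_000):
        print(trials, estimate_no_empty_cell(trials))
```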




            We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around 0.09 which is a factor of 10




            The results may be similar because, from your description, you're only running up to 1M experiments, but don't appear to have tried going beyond that limit.



            My response also assumes that Python's PRNGs have been used according to the documentation, as not doing so can also affect the validity of your simulation.



            That said, note that to verify/review the validity of your code, SO or CodeReview might be a more appropriate site than this one.






















            • 4




              If the true probability is 0.0065, then over a million trials the expected number of successes is 6500. If you have a Poisson process with $\lambda = 6500$, then the standard deviation is the square root of that, or slightly more than 80. The z-score for 90000 is over 1000. The idea that more trials will make this effect disappear is absurd.
              – Acccumulation
              3 hours ago











            • Indeed the difference between 100, 1,000 or 1,000,000 iterations was non-existent
              – Shmuel Levinson
              2 hours ago











            • Fair enough. I'll remove it when I get a few more minutes then.
              – code_dredd
              41 mins ago
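The back-of-the-envelope figures in the first comment can be reproduced in a few lines. This sketch uses the binomial standard deviation $\sqrt{Np(1-p)}$ rather than the Poisson approximation $\sqrt{Np}$ (the two nearly coincide for small $p$); the function name `z_score` is an illustrative choice.

```python
import math

def z_score(observed, trials, p):
    """Standardised distance between an observed success count and its binomial mean."""
    expected = trials * p
    std_dev = math.sqrt(trials * p * (1 - p))
    return (observed - expected) / std_dev

if __name__ == "__main__":
    # About 90,000 successes observed in 1,000,000 trials against a claimed p of 0.0065.
    print(z_score(90_000, 1_000_000, 0.0065))
```

A deviation of that size cannot be attributed to sampling noise, which is consistent with the observation in the comments that the estimate did not move as the number of iterations grew.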














            up vote
            -1
            down vote













            My answer is intended to add to the existing ones, which are already good.




            A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to 1,000,000) ... [and they are] ... off from the expected theoretical result. ... Have we made some wrong assumptions? Any ideas for this discrepancy?




            Remember that the theoretical result and the experimental result are generally not going to be the same. The law of large numbers basically says that, as you run more and more experiments, the experimental results will tend towards the theoretical results.



            From your description, it sounds like, in all cases, you only ran "up to" (i.e. a maximum of) 1M simulations. Did you try larger numbers, such as 2M or even up to 10M or 25M?



            If we can assume for a moment that there're no issues with the simulation code, then increasing the number of experiments should get you closer to your expected theoretical results.




            We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around 0.09 which is a factor of 10




            The results may be similar because, from your description, you're only running up to 1M experiments, but don't appear to have tried going beyond that limit.



            My response is also assuming that the use of Python's PRNGs has been done according to the documentation, as not doing so can also have an impact in the validity of your simulation.



            That said, note that to verify/review the validity of your code, SO or CodeReview might be a more appropriate site than this one.






            share|cite|improve this answer
















            • 4




              If the true probability is 0.0065, then over a million trials the expected number of successes is 6500. If you have a Poisson process with $\lambda = 6500$, then the standard deviation is the square root of that, or slightly more than 80. The z-score for 90000 is over 1000. The idea that more trials will make this effect disappear is absurd.
              – Acccumulation
              3 hours ago











            • Indeed the difference between 100, 1,000 or 1,000,000 iterations was non-existent
              – Shmuel Levinson
              2 hours ago











            • Fair enough. I'll remove it when I get a few more minutes then.
              – code_dredd
              41 mins ago
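The arithmetic in the comment about the Poisson approximation can be reproduced in a few lines. This is a sketch that uses only the figures quoted in the thread (a probability of 0.0065, one million trials, an observed rate of about 0.09); the variable names are illustrative.

```python
import math

p_theory = 0.0065      # theoretical probability quoted in the question
n_trials = 1_000_000   # largest run reported in the question
observed = 90_000      # about 0.09 * n_trials, the reported simulation result

expected = p_theory * n_trials                                  # expected successes
sd_poisson = math.sqrt(expected)                                # Poisson approximation
sd_binomial = math.sqrt(n_trials * p_theory * (1 - p_theory))   # exact binomial SD

# How many standard deviations the observed count lies above the expected count.
z = (observed - expected) / sd_poisson

print("expected successes:", expected)
print("sd (Poisson):", sd_poisson, " sd (binomial):", sd_binomial)
print("z-score:", z)
```

The Poisson and binomial standard deviations nearly coincide here because the probability is small, so either one supports the same conclusion about the size of the gap.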






















            answered 4 hours ago









            code_dredd

            1013




























             
