Why a simulation of a probability experiment is off by a factor of 10?
Clash Royale CLAN TAG#URR8PPP
up vote
4
down vote
favorite
From a university homework assignment:
There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell ($i.e.$ each cell has at least $1$ ball)?
The answer is $largefracbinom117binom197$ which is about $0.0065$. I reached this result independently and it was confirmed by the official homework solution of the university.
A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to $1,000,000$). We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around $0.09$ which is a factor of $10$ or even a bit more off from the expected theoretical result.
Have we made some wrong assumptions?
Any ideas for this discrepancy?
Thanks.
P.S
I will paste the python code that I wrote, maybe there is some faulty logic there.
probability simulation python
add a comment |Â
up vote
4
down vote
favorite
From a university homework assignment:
There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell ($i.e.$ each cell has at least $1$ ball)?
The answer is $largefracbinom117binom197$ which is about $0.0065$. I reached this result independently and it was confirmed by the official homework solution of the university.
A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to $1,000,000$). We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around $0.09$ which is a factor of $10$ or even a bit more off from the expected theoretical result.
Have we made some wrong assumptions?
Any ideas for this discrepancy?
Thanks.
P.S
I will paste the python code that I wrote, maybe there is some faulty logic there.
probability simulation python
I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
14 mins ago
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 mins ago
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
From a university homework assignment:
There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell ($i.e.$ each cell has at least $1$ ball)?
The answer is $largefracbinom117binom197$ which is about $0.0065$. I reached this result independently and it was confirmed by the official homework solution of the university.
A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to $1,000,000$). We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around $0.09$ which is a factor of $10$ or even a bit more off from the expected theoretical result.
Have we made some wrong assumptions?
Any ideas for this discrepancy?
Thanks.
P.S
I will paste the python code that I wrote, maybe there is some faulty logic there.
probability simulation python
From a university homework assignment:
There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell ($i.e.$ each cell has at least $1$ ball)?
The answer is $largefracbinom117binom197$ which is about $0.0065$. I reached this result independently and it was confirmed by the official homework solution of the university.
A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to $1,000,000$). We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around $0.09$ which is a factor of $10$ or even a bit more off from the expected theoretical result.
Have we made some wrong assumptions?
Any ideas for this discrepancy?
Thanks.
P.S
I will paste the python code that I wrote, maybe there is some faulty logic there.
probability simulation python
probability simulation python
edited 23 mins ago
asked 1 hour ago


Shmuel Levinson
213
213
I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
14 mins ago
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 mins ago
add a comment |Â
I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
14 mins ago
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 mins ago
I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
14 mins ago
I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
14 mins ago
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 mins ago
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 mins ago
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
4
down vote
In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate
If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $0,1,2,3,ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
add a comment |Â
up vote
1
down vote
Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^12approx7times10^10$ elements.
Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $19choose7approx 5times10^4$ elements.
The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:
weight = 1
for cell_population in cells:
weight *= math.factorial(cell_population)
At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate
If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $0,1,2,3,ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
add a comment |Â
up vote
4
down vote
In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate
If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $0,1,2,3,ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
add a comment |Â
up vote
4
down vote
up vote
4
down vote
In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate
If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $0,1,2,3,ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive
In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $frac18^12$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $frac119 choose 7$ over a million times more likely but difficult to simulate
If the balls are distinguishable, the probability all eight boxes are full is $$frac8! , S_2(12,8)8^12$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $0,1,2,3,ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive
edited 35 mins ago
answered 49 mins ago
Henry
95.6k473152
95.6k473152
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
add a comment |Â
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
18 mins ago
add a comment |Â
up vote
1
down vote
Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^12approx7times10^10$ elements.
Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $19choose7approx 5times10^4$ elements.
The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:
weight = 1
for cell_population in cells:
weight *= math.factorial(cell_population)
At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
add a comment |Â
up vote
1
down vote
Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^12approx7times10^10$ elements.
Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $19choose7approx 5times10^4$ elements.
The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:
weight = 1
for cell_population in cells:
weight *= math.factorial(cell_population)
At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^12approx7times10^10$ elements.
Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $19choose7approx 5times10^4$ elements.
The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:
weight = 1
for cell_population in cells:
weight *= math.factorial(cell_population)
At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^12approx7times10^10$ elements.
Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $19choose7approx 5times10^4$ elements.
The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:
weight = 1
for cell_population in cells:
weight *= math.factorial(cell_population)
At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
answered 31 mins ago
Chris Culter
19.3k43280
19.3k43280
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2975845%2fwhy-a-simulation-of-a-probability-experiment-is-off-by-a-factor-of-10%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
14 mins ago
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 mins ago