Why is a simulation of a probability experiment off by a factor of 10?

From a university homework assignment:

There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell (i.e., each cell has at least $1$ ball)?

The answer is $\large\frac{\binom{11}{7}}{\binom{19}{7}}$ which is about $0.0065$. I reached this result independently, and it was confirmed by the official homework solution of the university.

A friend of mine and I independently wrote Python simulations that run the experiment many times (tested up to $1,000,000$). We used both Python's random number generator and several randomly generated lists from www.random.org. The results were similar, consistently hovering around $0.09$, which is off from the expected theoretical result by a factor of $10$ or even a bit more.

Have we made some wrong assumptions?
Any ideas for this discrepancy?

P.S.: Here is the Python code that I wrote, and maybe there is some faulty logic there.

import random


def run_test():
    global count, N

    def run_experiment(n_balls, n_cells, offset):
        cells = [0] * n_cells
        # toss balls randomly to cells:
        for j in range(n_balls):
            cells[random.randrange(0, n_cells)] += 1
            # cells[int(lines[offset + j])] += 1
        cells = sorted(cells)
        # print(cells)

        # check if there is an empty cell. if so return 0, otherwise 1:
        if cells[0] == 0:
            return 0
        return 1

    count = 0
    N = 1000000
    offset = 0
    N_CELLS = 8
    N_BALLS = 12
    # iterate experiment
    for i in range(N):
        result = run_experiment(N_BALLS, N_CELLS, offset=offset)
        count += result
        offset += N_CELLS

    print("probability:", count, "/", N, "(~", count / N, ")")

edited 13 mins ago by Chris Culter
asked 13 hours ago by Shmuel Levinson

I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula?
– Alexey
12 hours ago

If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment...
– Alexey
12 hours ago

Besides the distinguishability of the balls, there is the question of how they are distributed between the cells. You can either assign a cell to each ball, uniformly at random, independently of all the other balls, or you can look at all the different distinguishable ball distributions and pick one of those uniformly at random. That's the difference between your two approaches.
– Arthur
11 hours ago

"Have we made some wrong assumptions?" Hard to tell, since you have shown neither the code that you have written nor specified the assumptions that you have made. Without mind-reading, this question is impossible to answer. If your question involves your code, you should show the relevant code in the question itself.
– John Coleman
11 hours ago

It seems the book solution assumes that all the ways to distribute the balls into the cells are equally likely; for example, a distribution of (12,0,0,0,0,0,0,0) is just as likely as (2,2,2,2,2,2,0,0). This is not consistent with a model that says each ball is dropped into a cell chosen independently and randomly, with each cell equally likely to be chosen, which I think is more realistic.
– awkward
8 hours ago

probability simulation python

5 Answers

In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability that all the balls went into the first cell: with distinguishable balls this probability is $\frac{1}{8^{12}}$ and is easily simulated, though a rare occurrence; with indistinguishable balls it is $\frac{1}{19 \choose 7}$, over a million times more likely, but difficult to simulate.

If the balls are distinguishable, the probability that all eight cells are occupied is $$\frac{8! \, S_2(12,8)}{8^{12}}$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
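
As a quick check of the two figures, here is a small sketch that computes both probabilities exactly. It counts the "no empty cell" assignments of labelled balls by inclusion-exclusion rather than looking up the Stirling number; the function names are made up for this illustration and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, factorial


def surjections(n_balls, n_cells):
    """Count assignments of labelled balls to cells that leave no cell empty.

    Uses inclusion-exclusion over the set of cells forced to be empty;
    the result equals n_cells! * S_2(n_balls, n_cells).
    """
    return sum(
        (-1) ** k * comb(n_cells, k) * (n_cells - k) ** n_balls
        for k in range(n_cells + 1)
    )


def p_distinguishable(n_balls, n_cells):
    """Each labelled ball picks a cell independently and uniformly."""
    return Fraction(surjections(n_balls, n_cells), n_cells ** n_balls)


def p_indistinguishable(n_balls, n_cells):
    """Every occupancy pattern (stars and bars arrangement) is equally likely."""
    return Fraction(
        comb(n_balls - 1, n_cells - 1),
        comb(n_balls + n_cells - 1, n_cells - 1),
    )
```

Calling `float(p_distinguishable(12, 8))` and `float(p_indistinguishable(12, 8))` gives roughly $0.0933$ and $0.0065$, so the two models really do differ by more than a factor of $10$.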

If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside a Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cell walls from the possible positions $0,1,2,3,\ldots,18$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive.
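
The stars and bars idea could be sketched along these lines. This is only an illustration under the assumptions stated above ($12$ balls, $7$ walls, positions $0$ to $18$); the helper names are invented here. Converting the wall positions to cell populations and checking that the smallest one is positive is equivalent to the "no wall at $0$ or $18$, no two walls consecutive" condition.

```python
import random
from itertools import combinations

N_BALLS = 12
N_CELLS = 8
N_POSITIONS = N_BALLS + N_CELLS - 1  # 19 slots, numbered 0..18
N_WALLS = N_CELLS - 1                # 7 cell walls


def walls_to_counts(walls):
    """Convert ascending wall positions into the number of balls per cell."""
    counts = []
    previous = -1
    for wall in walls:
        counts.append(wall - previous - 1)  # balls between two walls
        previous = wall
    counts.append(N_POSITIONS - previous - 1)  # balls after the last wall
    return counts


def sample_counts(rng):
    """Draw one occupancy pattern uniformly from all stars and bars layouts."""
    walls = sorted(rng.sample(range(N_POSITIONS), N_WALLS))
    return walls_to_counts(walls)


def estimate(n_trials, rng):
    """Monte Carlo estimate of the probability that no cell is empty."""
    hits = sum(1 for _ in range(n_trials) if min(sample_counts(rng)) > 0)
    return hits / n_trials


def exact():
    """Enumerate every wall placement; return (successes, total layouts)."""
    placements = list(combinations(range(N_POSITIONS), N_WALLS))
    good = sum(1 for walls in placements if min(walls_to_counts(walls)) > 0)
    return good, len(placements)
```

For example, `estimate(1_000_000, random.Random())` should hover around $0.0065$, and `exact()` counts the successes among all $\binom{19}{7}$ layouts without any randomness.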

edited 12 hours ago
answered 12 hours ago by Henry

This is indeed similar to my simulation. Thanks for the detailed answer!
– Shmuel Levinson
12 hours ago

Nitpicking from a physicist: any collection of bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with, say, photons is more difficult.
– Carl Witthoft
6 hours ago

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
– JimmyJames
1 hour ago

@JimmyJames: that is why it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
– Henry
1 hour ago

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that being able to tell one ball from another changes the probability of landing in the first cell, which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice: 1/2 one head and one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of heads and tails.
– JimmyJames
56 mins ago
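
The two-ball thought experiment from the comments above is small enough to enumerate. The sketch below is only an illustration (the names `independent_model` and `uniform_model` are invented here): it lists the occupancy patterns of two cells under the "each ball picks a cell independently" model, and under the model the homework solution appears to use, where every pattern is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_BALLS = 2
N_CELLS = 2

# Model A: each (labelled) ball independently picks a cell, so all
# N_CELLS ** N_BALLS assignments are equally likely.
assignments = list(product(range(N_CELLS), repeat=N_BALLS))
pattern_counts = Counter(
    tuple(assignment.count(cell) for cell in range(N_CELLS))
    for assignment in assignments
)
independent_model = {
    pattern: Fraction(count, len(assignments))
    for pattern, count in pattern_counts.items()
}

# Model B: every occupancy pattern is equally likely, regardless of how
# many labelled assignments would produce it.
uniform_model = {
    pattern: Fraction(1, len(pattern_counts)) for pattern in pattern_counts
}

for pattern in sorted(pattern_counts):
    print(pattern, independent_model[pattern], uniform_model[pattern])
```

Under model A the pattern (1, 1) has probability 1/2, as in the coin-flip analogy; under model B each of the three patterns has probability 1/3. The two disagree, which is the same kind of gap as between the simulation and the homework answer.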

Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^{12}\approx7\times10^{10}$ elements.

Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has ${19\choose7}\approx 5\times10^4$ elements.

The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.

Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:

import math

weight = 1
for cell_population in cells:
    weight *= math.factorial(cell_population)

At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
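
One way the weighted average could be wired into a simulation is sketched below. This is an assumption about how the hint is meant to be used, not code from the question: each trial tosses labelled balls as before, and the result is the weighted fraction of successes, i.e. the sum of the weights of successful trials divided by the sum of all weights. The function names are hypothetical.

```python
import math
import random


def weight(cells):
    """Product of the factorials of the cell populations."""
    w = 1
    for cell_population in cells:
        w *= math.factorial(cell_population)
    return w


def weighted_estimate(n_balls, n_cells, n_trials, rng):
    """Weighted fraction of trials in which no cell is empty."""
    total_weight = 0
    success_weight = 0
    for _ in range(n_trials):
        # Sample from D: every labelled ball picks a cell uniformly.
        cells = [0] * n_cells
        for _ in range(n_balls):
            cells[rng.randrange(n_cells)] += 1
        # Reweight so that each occupancy pattern in I counts equally.
        w = weight(cells)
        total_weight += w
        if min(cells) > 0:
            success_weight += w
    return success_weight / total_weight
```

For example, `weighted_estimate(12, 8, 1_000_000, random.Random())` should land near $0.0065$. The weights vary enormously (from $1$ up to $12!$), so the estimate is likely to be noisier than an unweighted simulation with the same number of trials.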

answered 12 hours ago by Chris Culter

The original problem is posed, so far as I can tell, to show the difference between combinations and permutations. In nature, there is no such thing as indistinguishable balls. Semi-infinite tests (e.g. Las Vegas) have shown this to be true.

Now, if the problem really wants you to use "indistinguishable" balls for the purposes of solving the problem, then yes, you need to use combinations and not permutations when calculating all the ways the indistinguishable balls are placed into the containers.
And of course, you need to use permutations for the numbered balls, as they are distinguishable from each other and from the collection of indistinguishable balls.

Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.

answered 6 hours ago by Carl Witthoft

pastebin.com/5mKBxfbM
– Shmuel Levinson
3 hours ago

Thanks to the answers here I figured out my mistake in the simulation. It simply didn't create a uniform distribution. Essentially, in my simulation the balls were numbered.

answered 2 hours ago by Shmuel Levinson

I'm just curious here. Can you explain why that mattered? The task is simply to count the number of balls randomly assigned to a cell. Where did the number on the ball come in?
– JimmyJames
42 mins ago

This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review
– Scientifica
30 mins ago

Please don't add "thank you" as an answer. Instead, accept the answer that you found most helpful. - From Review
– max_zorn
17 mins ago

My answer is intended to add to the existing ones, which are already good.

A friend of mine and I independently wrote Python simulations that run the experiment many times (tested up to 1,000,000) ... [and they are] ... off from the expected theoretical result. ... Have we made some wrong assumptions? Any ideas for this discrepancy?

Remember that the theoretical result and the experimental result are generally not going to be the same. The law of large numbers basically says that, as you run more and more experiments, the experimental results will tend towards the theoretical results.

From your description, it sounds like, in all cases, you only ran "up to" (i.e. a maximum of) 1M simulations. Did you try larger numbers, such as 2M or even up to 10M or 25M?

If we can assume for a moment that there are no issues with the simulation code, then increasing the number of experiments should get you closer to your expected theoretical results.

We used both Python's random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around 0.09 which is a factor of 10

The results may be similar because, from your description, you're only running up to 1M experiments, but don't appear to have tried going beyond that limit.

My response is also assuming that the use of Python's PRNGs has been done according to the documentation, as not doing so can also have an impact in the validity of your simulation.

That said, note that to verify/review the validity of your code, SO or CodeReview might be a more appropriate site than this one.

answered 4 hours ago by code_dredd

If the true probability is 0.0065, then over a million trials the expected number of successes is 6500. If you have a Poisson process with $\lambda = 6500$, then the standard deviation is the square root of that, or slightly more than 80. The z-score for 90000 is over 1000. The idea that more trials will make this effect disappear is absurd.
– Acccumulation
3 hours ago

Indeed the difference between 100, 1,000 or 1,000,000 iterations was non-existent
– Shmuel Levinson
2 hours ago

Fair enough. I'll remove it when I get a few more minutes then.
– code_dredd
41 mins ago
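
The arithmetic behind the z-score comment above can be reproduced in a few lines. This is just a sketch of that back-of-the-envelope check; the variable names are made up, and `observed` is the rough success count implied by a simulated frequency of about $0.09$ over a million trials.

```python
import math

n_trials = 1_000_000
p_theory = 0.0065   # probability from the homework solution
observed = 90_000   # roughly 0.09 * n_trials, as seen in the simulation

expected = n_trials * p_theory

# Poisson approximation used in the comment: variance equals the mean.
sd_poisson = math.sqrt(expected)

# Binomial standard deviation, for comparison; it is nearly the same here.
sd_binomial = math.sqrt(n_trials * p_theory * (1 - p_theory))

z_score = (observed - expected) / sd_poisson

print(expected, sd_poisson, sd_binomial, z_score)
```

A gap of more than a thousand standard deviations cannot be sampling noise, which is why running more trials does not bring the simulation closer to $0.0065$.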

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "69"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2975845%2fwhy-is-a-simulation-of-a-probability-experiment-off-by-a-factor-of-10%23new-answer', 'question_page');

);

Post as a guest

Name

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

up vote
14
down vote

edited 12 hours ago

answered 12 hours ago

Henry

95.7k473153

1

This is indeed similar to my simulation. Thanks for the detailed answer!
â€“Â Shmuel Levinson
12 hours ago

3

Nit picking from a physicist: any collection of Bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with , say, photons, is more difficult
â€“Â Carl Witthoft
6 hours ago

1

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
â€“Â JimmyJames
1 hour ago

1

@JimmyJames: that is way it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
â€“Â Henry
1 hour ago

1

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that the being able to tell one ball from another changes the probability of landing in the first cell which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice. 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of head and tails.
â€“Â JimmyJames
56 mins ago

Â |Â
show 2 more comments

up vote
14
down vote

edited 12 hours ago

answered 12 hours ago

Henry

95.7k473153

1

This is indeed similar to my simulation. Thanks for the detailed answer!
â€“Â Shmuel Levinson
12 hours ago

3

Nit picking from a physicist: any collection of Bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with , say, photons, is more difficult
â€“Â Carl Witthoft
6 hours ago

1

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
â€“Â JimmyJames
1 hour ago

1

@JimmyJames: that is way it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
â€“Â Henry
1 hour ago

1

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that the being able to tell one ball from another changes the probability of landing in the first cell which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice. 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of head and tails.
â€“Â JimmyJames
56 mins ago

Â |Â
show 2 more comments

up vote
14
down vote

edited 12 hours ago

answered 12 hours ago

Henry

95.7k473153

edited 12 hours ago

answered 12 hours ago

Henry

95.7k473153

edited 12 hours ago

answered 12 hours ago

Henry

95.7k473153

answered 12 hours ago

Henry

95.7k473153

answered 12 hours ago

Henry

95.7k473153

1

This is indeed similar to my simulation. Thanks for the detailed answer!
â€“Â Shmuel Levinson
12 hours ago

3

Nit picking from a physicist: any collection of Bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with , say, photons, is more difficult
â€“Â Carl Witthoft
6 hours ago

1

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
â€“Â JimmyJames
1 hour ago

1

@JimmyJames: that is way it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
â€“Â Henry
1 hour ago

1

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that the being able to tell one ball from another changes the probability of landing in the first cell which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice. 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of head and tails.
â€“Â JimmyJames
56 mins ago

Â |Â
show 2 more comments

1

This is indeed similar to my simulation. Thanks for the detailed answer!
â€“Â Shmuel Levinson
12 hours ago

3

Nit picking from a physicist: any collection of Bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with , say, photons, is more difficult
â€“Â Carl Witthoft
6 hours ago

1

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
â€“Â JimmyJames
1 hour ago

1

@JimmyJames: that is way it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
â€“Â Henry
1 hour ago

1

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that the being able to tell one ball from another changes the probability of landing in the first cell which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice. 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of head and tails.
â€“Â JimmyJames
56 mins ago

This is indeed similar to my simulation. Thanks for the detailed answer!
â€“Â Shmuel Levinson
12 hours ago

Nit picking from a physicist: any collection of Bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with , say, photons, is more difficult
â€“Â Carl Witthoft
6 hours ago

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell?
â€“Â JimmyJames
1 hour ago

@JimmyJames: that is way it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities?
â€“Â Henry
1 hour ago

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that the being able to tell one ball from another changes the probability of landing in the first cell which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice. 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of head and tails.
â€“Â JimmyJames
56 mins ago

Â |Â
show 2 more comments

up vote
9
down vote

Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^12approx7times10^10$ elements.

Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has $19choose7approx 5times10^4$ elements.

weight = 1
for cell_population in cells:
 weight *= math.factorial(cell_population)

At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.

answered 12 hours ago

Chris Culter

19.4k43280


up vote
2
down vote

Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.

answered 6 hours ago

Carl Witthoft

30218

1

pastebin.com/5mKBxfbM
– Shmuel Levinson
3 hours ago


up vote
0
down vote

Thanks to the answers here I figured out my mistake in the simulation. It simply didn't create a uniform distribution. Essentially in my simulation the balls were numbered
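One way to make a simulation uniform over occupancy patterns is the stars-and-bars picture: lay out $12 + 8 - 1 = 19$ slots and choose $7$ of them to be bars. The sketch below assumes that "randomly divided" means every pattern of cell counts is equally likely. The helper names are hypothetical, and this is not necessarily the fix used above. For comparison it also computes the exact probability for independent tosses by inclusion-exclusion.

```python
import math
import random


def sample_uniform_occupancy(n_balls, n_cells, rng):
    """Draw cell counts uniformly from all stars-and-bars arrangements."""
    n_slots = n_balls + n_cells - 1
    # Choose which slots hold the n_cells - 1 bars; the rest hold balls.
    bars = sorted(rng.sample(range(n_slots), n_cells - 1))
    edges = [-1] + bars + [n_slots]
    # The gap between consecutive edges is the number of balls in that cell.
    return [edges[i + 1] - edges[i] - 1 for i in range(n_cells)]


def labelled_no_empty_probability(n_balls, n_cells):
    """Exact no-empty-cell probability for independent tosses (inclusion-exclusion)."""
    surjections = sum(
        (-1) ** k * math.comb(n_cells, k) * (n_cells - k) ** n_balls
        for k in range(n_cells + 1)
    )
    return surjections / n_cells ** n_balls


rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 100_000
hits = sum(
    min(sample_uniform_occupancy(12, 8, rng)) > 0 for _ in range(n_trials)
)
uniform_estimate = hits / n_trials
print("uniform over patterns:", uniform_estimate)
print("independent tosses:   ", labelled_no_empty_probability(12, 8))
```

The first figure should land near the theoretical $0.0065$, while the exact value for independent tosses is about $0.093$, in line with the roughly $0.09$ that the original simulation reported.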

answered 2 hours ago

Shmuel Levinson

494

I'm just curious here. Can you explain why that mattered? The task is simply to count the number of balls randomly assigned to a cell. Where did the number on the ball come in?
– JimmyJames
42 mins ago

This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review
– Scientifica
30 mins ago

Please don't add "thank you" as an answer. Instead, accept the answer that you found most helpful. - From Review
– max_zorn
17 mins ago


up vote
-1
down vote

My answer is intended to add to the existing ones, which are already good.

A friend of mine and I independently wrote python simulations that run the experiment many times (tested up to 1,000,000) ... [and they are] ... off from the expected theoretical result. ... Have we made some wrong assumptions? Any ideas for this discrepancy?

From your description, it sounds like, in all cases, you only ran "up to" (i.e. a maximum of) 1M simulations. Did you try larger numbers, such as 2M or even up to 10M or 25M?

If we can assume for a moment that there are no issues with the simulation code, then increasing the number of experiments should get you closer to your expected theoretical results.

We used both pythons' random generator and several randomly generated lists from www.random.org. Results were similar and consistently hovering around 0.09 which is a factor of 10

The results may be similar because, from your description, you're only running up to 1M experiments, but don't appear to have tried going beyond that limit.

My response also assumes that Python's PRNGs have been used according to the documentation, as not doing so can also have an impact on the validity of your simulation.

That said, note that to verify/review the validity of your code, SO or CodeReview might be a more appropriate site than this one.

answered 4 hours ago

code_dredd

1013

4

If the true probability is 0.0065, then over a million trials the expected number of successes is 6500. If you have a Poisson process with $\lambda = 6500$, then the standard deviation is the square root of that, or slightly more than 80. The z-score for 90000 is over 1000. The idea that more trials will make this effect disappear is absurd.
– Acccumulation
3 hours ago

Indeed the difference between 100, 1,000 or 1,000,000 iterations was non-existent
– Shmuel Levinson
2 hours ago

Fair enough. I'll remove it when I get a few more minutes then.
– code_dredd
41 mins ago
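The arithmetic in the z-score comment above can be reproduced in a few lines. This is a minimal sketch using the figures quoted in the thread and the same Poisson approximation (variance equal to the mean); the variable names are illustrative.

```python
import math

n_trials = 1_000_000
p_theory = 0.0065   # theoretical probability quoted in the question
p_observed = 0.09   # rate reported by the simulation

expected_successes = n_trials * p_theory
observed_successes = n_trials * p_observed

# Poisson approximation from the comment: variance equals the mean.
std_dev = math.sqrt(expected_successes)
z_score = (observed_successes - expected_successes) / std_dev

print(round(expected_successes), round(std_dev, 1), round(z_score))
```

A gap of more than a thousand standard deviations cannot be sampling noise, so running more trials would not close it.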
