With what probability is one coin better than the other?

Asked Sep 7 at 9:54 by Thirupathi Thangavel
Let's say we have two biased coins, C1 and C2, each with a different probability of turning up heads.

We toss C1 n1 times and get H1 heads, and we toss C2 n2 times and get H2 heads. We find that the proportion of heads is higher for one coin than for the other.

With what probability can we say that one coin is better than the other? ("Better" here means a higher true probability of turning up heads.)
  • Isn't this a hypothesis-testing problem, where you have to test whether the estimates of the probability of turning up heads differ between the two coins?
    – naive, Sep 7 at 10:33
  • If you want a probability that one coin is better than another, you can do that with a Bayesian approach; all you need is a prior distribution on the probability. It's a fairly straightforward calculation.
    – Glen_b♦, Sep 7 at 12:49
  • There is no way this data alone is going to give you a probability that one coin is better than the other. You need additional information or assumptions about a prior distribution for how coins are distributed. In a Bayesian approach, the result will differ a lot based on your prior assumptions. For instance, if those coins are regular coins, then they are (a priori) very likely to be equal, and you would need a large number of trials to find a decent probability that one is 'better' than the other (because the probability is high that you got two similar coins that accidentally test differently).
    – Martijn Weterings, 2 days ago
  • @MartijnWeterings Why can I not use the uniform prior for the coin distribution, as in my answer below?
    – katosh, yesterday
  • Stated that way, it is correct. It is a 'belief', and this is not so easily equated to a 'probability'.
    – Martijn Weterings, 21 hours ago
2 Answers

Accepted answer (score 6), answered Sep 7 at 11:32 by katosh, last edited 1 hour ago
It is easy to calculate the probability of making that observation given that the two coins are equal. This can be done with Fisher's exact test. Given these observations



$$
\begin{array}{c|c|c}
 & \text{coin 1} & \text{coin 2} \\
\hline
\text{heads} & H_1 & H_2 \\
\hline
\text{tails} & n_1-H_1 & n_2-H_2
\end{array}
$$



the probability of observing these numbers while the coins are equal (in short, the p-value) is given by
$$
p = \frac{(H_1+H_2)!\,(n_1+n_2-H_1-H_2)!\,n_1!\,n_2!}{H_1!\,H_2!\,(n_1-H_1)!\,(n_2-H_2)!\,(n_1+n_2)!}.
$$
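For reference, this single-table probability is the hypergeometric term that Fisher's test sums over; base R's fisher.test computes the test directly from the 2×2 table. A minimal sketch with hypothetical counts (note that it reports the p-value summed over all tables at least as extreme, not just the term above):

H1 <- 3; n1 <- 4   # hypothetical counts for coin 1
H2 <- 1; n2 <- 4   # hypothetical counts for coin 2
tab <- matrix(c(H1, n1 - H1, H2, n2 - H2), nrow = 2,
              dimnames = list(c("heads", "tails"), c("coin 1", "coin 2")))
fisher.test(tab)   # two-sided test of equal head probabilities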



But what you are asking for is the probability that one coin is better.
Since we argue about a belief on how biased the coins are, we have to use a Bayesian approach to calculate the result. Please note that in Bayesian inference the term belief is modeled as probability, and the two terms are used interchangeably (see Bayesian probability). We call the probability that coin $i$ tosses heads $p_i$. The posterior distribution for this $p_i$ after the observation is given by Bayes' theorem:
$$
f(p_i|H_i,n_i) = \frac{f(H_i|p_i,n_i)\,f(p_i)}{f(H_i,n_i)}
$$
The probability density function (pdf) $f(H_i|p_i,n_i)$ is given by the binomial probability, since the individual tosses are Bernoulli experiments:
$$
f(H_i|p_i,n_i) = \binom{n_i}{H_i}\,p_i^{H_i}(1-p_i)^{n_i-H_i}
$$
I assume the prior knowledge on $f(p_i)$ is that $p_i$ could lie anywhere between $0$ and $1$ with equal probability, hence $f(p_i) = 1$. So the numerator is $f(H_i|p_i,n_i)\,f(p_i) = f(H_i|p_i,n_i)$.



In order to calculate $f(H_i,n_i)$ we use the fact that the integral over a pdf has to be one: $\int_0^1 f(p|H_i,n_i)\,\mathrm dp = 1$. So the denominator is just the constant factor that achieves this. There is a known pdf that differs from the numerator by only a constant factor: the beta distribution. Hence
$$
f(p_i|H_i,n_i) = \frac{1}{B(H_i+1,\,n_i-H_i+1)}\,p_i^{H_i}(1-p_i)^{n_i-H_i}.
$$
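In R this posterior density is exactly dbeta(p, H + 1, n - H + 1); a quick sketch, using hypothetical counts of H = 3 heads in n = 4 tosses:

p <- seq(0, 1, length.out = 200)
# posterior of the head probability after observing H = 3 heads in n = 4 tosses
plot(p, dbeta(p, 3 + 1, 4 - 3 + 1), type = "l", ylab = "posterior density")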



The pdf for the pair of probabilities of independent coins is
$$
f(p_1,p_2|H_1,n_1,H_2,n_2) = f(p_1|H_1,n_1)f(p_2|H_2,n_2).
$$



Now we need to integrate this over the cases in which $p_1>p_2$ in order to find out how probable it is that coin $1$ is better than coin $2$:
$$
\begin{align}
\mathbb P(p_1>p_2)
&= \int_0^1 \int_0^{p'_1} f(p'_1,p'_2|H_1,n_1,H_2,n_2)\,\mathrm dp'_2\,\mathrm dp'_1\\
&= \int_0^1 \frac{B(p'_1;\,H_2+1,\,n_2-H_2+1)}{B(H_2+1,\,n_2-H_2+1)}\,f(p'_1|H_1,n_1)\,\mathrm dp'_1
\end{align}
$$



I cannot solve this last integral analytically, but one can solve it numerically with a computer after plugging in the numbers. $B(\cdot,\cdot)$ is the beta function and $B(\cdot;\cdot,\cdot)$ is the incomplete beta function. Note that $\mathbb P(p_1=p_2) = 0$, because $p_1$ is a continuous variable and is never exactly the same as $p_2$.




Concerning the prior assumption on $f(p_i)$ and remarks on it: a good alternative for modeling many beliefs is to use a beta distribution $\mathrm{Beta}(a_i+1,\,b_i+1)$ as the prior. This would lead to a final probability
$$
\mathbb P(p_1>p_2)
= \int_0^1 \frac{B(p'_1;\,H_2+1+a_2,\,n_2-H_2+1+b_2)}{B(H_2+1+a_2,\,n_2-H_2+1+b_2)}\,f(p'_1|H_1+a_1,\,n_1+a_1+b_1)\,\mathrm dp'_1.
$$
That way one could model a strong bias towards regular coins by large but equal $a_i$, $b_i$. It would be equivalent to tossing the coin $a_i+b_i$ additional times and receiving $a_i$ heads, hence equivalent to just having more data; $a_i + b_i$ is the number of tosses we would not have to make if we include this prior.
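As a sketch of how such a prior would change the code below: the names mpPrior and PPrior and the pseudo-count arguments a1, b1, a2, b2 are my own illustration, with defaults that recover the uniform prior:

mpPrior <- function(p1, n1, H1, n2, H2, a1, b1, a2, b2) {
  f1 <- pbeta(p1, H2 + 1 + a2, n2 - H2 + 1 + b2)  # CDF of coin 2's posterior
  f2 <- dbeta(p1, H1 + 1 + a1, n1 - H1 + 1 + b1)  # density of coin 1's posterior
  return(f1 * f2)
}

PPrior <- function(n1, H1, n2, H2, a1 = 0, b1 = 0, a2 = 0, b2 = 0) {
  return(integrate(mpPrior, 0, 1, n1, H1, n2, H2, a1, b1, a2, b2))
}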



The OP stated that the two coins are both biased to an unknown degree. So I understood that all knowledge has to be inferred from the observations. This is why I opted for an uninformative prior that does not bias the result, e.g. towards regular coins.



All information can be conveyed in the form of $(H_i, n_i)$ per coin. The lack of an informative prior only means more observations are needed to decide which coin is better with high probability.




Here is the code in R that provides a function P(n1, H1, n2, H2) $=\mathbb P(p_1>p_2)$, using the uniform prior $f(p_i)=1$:



mp <- function(p1, n1, H1, n2, H2) {
  # integrand: P(p2 < p1) under coin 2's posterior, times coin 1's posterior density
  f1 <- pbeta(p1, H2 + 1, n2 - H2 + 1)
  f2 <- dbeta(p1, H1 + 1, n1 - H1 + 1)
  return(f1 * f2)
}

P <- function(n1, H1, n2, H2) {
  return(integrate(mp, 0, 1, n1, H1, n2, H2))
}



You can plot $P(p_1>p_2)$ for different experimental results and fixed $n_1$, $n_2$, e.g. $n_1=n_2=4$, with this code snippet:



library(lattice)
n1 <- 4
n2 <- 4
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")


[Figure: levelplot of P(p1 > p2) for n1 = n2 = 4]



You may need to install.packages("lattice") first.



One can see that, even with the uniform prior and a small sample size, the probability (or belief) that one coin is better can become quite solid when $H_1$ and $H_2$ differ enough. An even smaller relative difference suffices if $n_1$ and $n_2$ are larger. Here is a plot for $n_1=100$ and $n_2=200$:



[Figure: levelplot of P(p1 > p2) for n1 = 100, n2 = 200]




Martijn Weterings suggested calculating the posterior probability distribution for the difference between $p_1$ and $p_2$. This can be done by integrating the pdf of the pair over the set $S(d)=\{(p_1,p_2) : |p_1-p_2|=d\}$:
$$
\begin{align}
f(d|H_1,n_1,H_2,n_2)
&= \int_{S(d)} f(p_1,p_2|H_1,n_1,H_2,n_2)\,\mathrm d\gamma\\
&= \int_0^{1-d} f(p,p+d|H_1,n_1,H_2,n_2)\,\mathrm dp + \int_d^1 f(p,p-d|H_1,n_1,H_2,n_2)\,\mathrm dp
\end{align}
$$



Again, this is not an integral I can solve analytically, but the R code would be:



d1 <- function(p, d, n1, H1, n2, H2) {
  # integrand along the line p2 = p1 + d
  f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
  f2 <- dbeta(p + d, H2 + 1, n2 - H2 + 1)
  return(f1 * f2)
}

d2 <- function(p, d, n1, H1, n2, H2) {
  # integrand along the line p2 = p1 - d
  f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
  f2 <- dbeta(p - d, H2 + 1, n2 - H2 + 1)
  return(f1 * f2)
}

fd <- function(d, n1, H1, n2, H2) {
  if (d == 1) return(0)
  s1 <- integrate(d1, 0, 1 - d, d, n1, H1, n2, H2)
  s2 <- integrate(d2, d, 1, d, n1, H1, n2, H2)
  return(s1$value + s2$value)
}



I plotted $f(d|n_1,H_1,n_2,H_2)$ for $n_1=4$, $H_1=3$, $n_2=4$ and all values of $H_2$:



n1 <- 4
n2 <- 4
H1 <- 3
d <- seq(0, 1, length = 500)

get_f <- function(H2) sapply(d, fd, n1, H1, n2, H2)
dat <- sapply(0:n2, get_f)

matplot(d, dat, type = "l", ylab = "Density",
main = "f(d | 4, 3, 4, H2)")
legend("topright", legend = paste("H2 =", 0:n2),
col = 1:(n2 + 1), pch = "-")


[Figure: f(d | 4, 3, 4, H2) for H2 = 0, ..., 4]



You can calculate the probability that $|p_1-p_2|$ lies above a value $d$ with integrate(fd, d, 1, n1, H1, n2, H2). Mind that the double application of numerical integration comes with some numerical error; e.g. integrate(fd, 0, 1, n1, H1, n2, H2) should always equal $1$, since $d$ always takes a value between $0$ and $1$, but the result often deviates slightly.
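As a sanity check on the numerical integrals, one can approximate both quantities by sampling from the two beta posteriors directly; a minimal sketch, assuming the same uniform prior and hypothetical counts:

n1 <- 4; H1 <- 3; n2 <- 4; H2 <- 1   # hypothetical counts
set.seed(1)
s1 <- rbeta(1e6, H1 + 1, n1 - H1 + 1)  # posterior draws for coin 1
s2 <- rbeta(1e6, H2 + 1, n2 - H2 + 1)  # posterior draws for coin 2
mean(s1 > s2)              # approximates P(p1 > p2)
mean(abs(s1 - s2) > 0.25)  # approximates P(|p1 - p2| > 0.25)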






  • I don't know the actual value of p1.
    – Thirupathi Thangavel, 2 days ago
  • I am sorry for my bad notation 😅 but you do not need to plug in a fixed value for $p_1$. The (now changed) $p'_1$ in the integral is the variable you integrate over, just like you can integrate $\int_0^1 x^2\,\mathrm dx$ without having a fixed value for $x$.
    – katosh, 2 days ago
  • Got it, thanks a lot.
    – Thirupathi Thangavel, 2 days ago
  • Fisher's exact test is more specifically about the hypothesis that the coins have the same probability and the marginal totals are fixed. That is not the case in this coin problem: if you did the test again, you could observe some other total number of heads.
    – Martijn Weterings, 2 days ago
  • @ThirupathiThangavel the problem with Fisher's test is the non-fixed marginal totals. The exact-test model assumes that the probabilities of heads are the same and fixed, but also that the marginals are fixed before the experiment. That second part is not the case here, which gives a different conditional probability for extreme values. Overall, the Fisher test will be conservative: the probability of the result given the true hypothesis (i.e. fixed and equal probability of heads, but not necessarily fixed marginal totals) is smaller than calculated (you get higher p-values).
    – Martijn Weterings, 2 days ago
Second answer (score 2)
I've made a numerical simulation with R. You're probably looking for an analytical answer, but I thought this could be interesting to share.



set.seed(123)
# coin 1
N1 = 20
theta1 = 0.7

toss1 <- rbinom(n = N1, size = 1, prob = theta1)

# coin 2
N2 = 25
theta2 = 0.5

toss2 <- rbinom(n = N2, size = 1, prob = theta2)

# frequency
sum(toss1)/N1 # [1] 0.65
sum(toss2)/N2 # [1] 0.52


In this first block, I simply simulate two sets of coin tosses. Since theta1 > theta2, the observed frequency of heads for coin 1 will of course tend to be higher than for coin 2. Note the different sizes N1 and N2.



Let's see what we can do with different thetas. Note the code is not optimal. At all.



simulation <- function(N1, N2, theta1, theta2, nsim = 100) {
  count1 <- count2 <- 0

  for (i in 1:nsim) {
    toss1 <- rbinom(n = N1, size = 1, prob = theta1)
    toss2 <- rbinom(n = N2, size = 1, prob = theta2)

    if (sum(toss1)/N1 > sum(toss2)/N2) count1 = count1 + 1
    #if (sum(toss1)/N1 < sum(toss2)/N2) count2 = count2 + 1
  }

  count1/nsim
}

set.seed(123)
simulation(20, 25, 0.7, 0.5, 100)
#[1] 0.93


So 0.93 is the frequency of times (out of 100) that the first coin had the higher proportion of heads. This seems reasonable, given the theta1 and theta2 used.
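As an aside on the precision of that estimate (my own back-of-the-envelope addition, not part of the original answer), with nsim = 100 the Monte Carlo standard error of such a frequency is roughly:

sqrt(0.93 * (1 - 0.93) / 100)   # ≈ 0.026; increase nsim to shrink this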



Let's see with two vectors of thetas.



theta1_v <- seq(from = 0.1, to = 0.9, by = 0.1)
theta2_v <- seq(from = 0.9, to = 0.1, by = -0.1)

res_v <- c()
for (i in 1:length(theta1_v)) {
  res <- simulation(1000, 1500, theta1_v[i], theta2_v[i], 100)
  res_v[i] <- res
}

plot(theta1_v, res_v, type = "l")


Remember that res_v contains the frequencies (out of 100 simulations) with which coin 1 showed the higher proportion of heads.



[Figure: res_v plotted against theta1_v]



So as theta1 increases, the probability that coin 1 shows the higher proportion of heads increases, of course.



I've done some other simulations and it seems that the sizes N1,N2 are less important.
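One way to probe that claim is to rerun the simulation with the thetas fixed and the sizes varied; a small sketch reusing the simulation function above (the values are illustrative):

for (N in c(10, 50, 250)) {
  print(c(N, simulation(N, N, 0.7, 0.5, 1000)))
}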



If you're familiar with R you can use this code to shed some light on the problem. I'm aware this is not a complete analysis, and it can be improved.






  • Interesting how res_v changes continuously when the thetas meet. I understood the question as asking about the coins' intrinsic bias after making just a single observation. You seem to answer what observations one would make after knowing the bias.
    – katosh, Sep 7 at 13:30










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);






Thirupathi Thangavel is a new contributor. Be nice, and check out our Code of Conduct.









 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f365789%2fwith-what-probability-one-coin-is-better-than-the-other%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
6
down vote



accepted










It is easy to calculate the probability for making that observation, given the fact the two coins are equal. This can be done by Fishers exact test. Given these observations



$$
beginarray c
&textcoin 1 &textcoin 2 \
hline
textheads &H_1 &H_2\
hline
texttails &n_1-H_1 &n_2-H_2\endarray
$$



the probability to observe these numbers while the coins are equal, or in short p-value, is given by
$$
p = frac(H_1+H_2)!(n_1+n_2-H_1-H_2)!n_1!n_2!H_1!H_2!(n_1-H_1)!(n_2-H_2)!(n_1+n_2)!.
$$



But what you are asking for is the probability that one coin is better.
Since we argue about a believe on how biased the coins are we have to use a Bayesian approach to calculate the result. Please note, that in Bayesian inference the term belief is modeled as probability and the two terms are used interchangeably (s. Bayesian probability). We call the probability that count $i$ tosses heads $p_i$. The posterior distribution after observation, for this $p_i$ is given by Bayes' theorem:
$$
f(p_i|H_i,n_i)= fracp_i,n_i)f(p_i)f(n_i,H_i)
$$
The probability density function (pdf) $f(H_i|p_i,n_i)$ is given by the Binomial probability, since the individual tries are Bernoulli experiments:
$$
f(H_i|p_i,n_i) = binomn_iH_ip_i^H_i(1-p_i)^n_i-H_i
$$
I assume the prior knowledge on $f(p_i)$ is that $p_i$ could lie anywhere between $0$ and $1$ with equal probability, hence $f(p_i) = 1$. So the nominator is $f(H_i|p_i,n_i)f(p_i)= f(H_i|p_i,n_i)$.



In Order to calculate $f(n_i,H_i)$ we use the fact that the integral over a pdf has to be one $int_0^1f(p|H_i,n_i)mathrm dp = 1$. So the denominator will be a constant factor to achieve just that. There is a known pdf that differs from the nominator by only a constant factor, which is the beta distribution. Hence
$$
f(p_i|H_i,n_i) = frac1B(H_i+1, n_i-H_i+1)p_i^H_i(1-p_i)^n_i-H_i.
$$



The pdf for the pair of probabilities of independent coins is
$$
f(p_1,p_2|H_1,n_1,H_2,n_2) = f(p_1|H_1,n_1)f(p_2|H_2,n_2).
$$



Now we need to integrate this over the cases in which $p_1>p_2$ in order to find out how probable coin $1$ is better then coin $2$:
$$beginalign
mathbb P(p_1>p_2)
&= int_0^1 int_0^p‘_1 f(p‘_1,p‘_2|H_1,n_1,H_2,n_2)mathrm dp‘_2 mathrm dp‘_1\
&=int_0^1 fracB(p‘_1;H_2+1,n_2-H_2+1)B(H_2+1,n_2-H_2+1)
f(p‘_1|H_1,n_1)mathrm dp‘_1
endalign$$



I cannot solve this last integral analytically but one can solve it numerically with a computer after plugging in the numbers. $B(cdot,cdot)$ is the beta function and $B(cdot;cdot,cdot)$ is the incomplete beta function. Note that $mathbb P(p_1=p_2) = 0$ because $p_1$ is a continues variable and never exactly the same as $p_2$.




Concerning the prior assumption on $f(p_i)$ and remarks on it: A good alternative to model many believes is to use a beta distribution $Beta(a_i+1,b_i+1)$. This would lead to a final probability
$$
mathbb P(p_1>p_2)
=int_0^1 fracB(p‘_1;H_2+1+a_2,n_2-H_2+1+b_2)B(H_2+1+a_2,n_2-H_2+1+b_2)
f(p‘_1|H_1+a_1,n_1+a_1+b_1)mathrm dp‘_1.
$$
That way one could model a strong bias towards regular coins by large but equal $a_i$, $b_i$. It would be equivalent to tossing the coin $a_i+b_i$ additional times and receiving $a_i$ heads hence equivalent to just having more data. $a_i + b_i$ is the amount of tosses we would not have to make if we include this prior.



The OP stated that the two coins are both biased to an unknown degree. So I understood all knowledge has to be inferred from the observations. This is why I opted for an uninformative prior that dose not bias the result e.g. towards regular coins.



All information can be conveyed in the form of $(H_i, n_i)$ per coin. The lack of an informative prior only means more observations are needed to decide which coin is better with high probability.




Here is the code in R that provides a function P(n1, H1, n2, H2) $=mathbb P(p_1>p_2)$ using the uniform prior $f(p_i)=1$:



mp <- function(p1, n1, H1, n2, H2) 
f1 <- pbeta(p1, H2 + 1, n2 - H2 + 1)
f2 <- dbeta(p1, H1 + 1, n1 - H1 + 1)
return(f1 * f2)


P <- function(n1, H1, n2, H2)
return(integrate(mp, 0, 1, n1, H1, n2, H2))



You can draw $P(p_1>p_2)$ for different experimental results and fixed $n_1$, $n_2$ e.g. $n_1=n_2=4$ with this code sniped:



library(lattice)
n1 <- 4
n2 <- 4
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")


P(p1 > p2) for n1=n2=4



You may need to install.packages("lattice") first.



One can see, that even with the uniform prior and a small sample size, the probability or believe that one coin is better can become quite solid, when $H_1$ and $H_2$ differ enough. An even smaller relative difference is needed if $n_1$ and $n_2$ are even larger. Here is a plot for $n_1=100$ and $n_2=200$:



enter image description here




Martijn Weterings suggested to calculate the posterior probability distribution for the difference between $p_1$ and $p_2$. This can be done by integrating the pdf of the pair over the set $S(d)=p_1-p_2$:
$$beginalign
f(d|H_1,n_1,H_2,n_2)
&= int_S(d)f(p_1,p_2|H_1,n_1,H_2,n_2) mathrm dgamma\
&= int_0^1-d f(p,p+d|H_1,n_1,H_2,n_2) mathrm dp + int_d^1 f(p,p-d|H_1,n_1,H_2,n_2) mathrm dp\
endalign$$



Again, not an integral I can solve analytically but the R code would be:



d1 <- function(p, d, n1, H1, n2, H2) 
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p + d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


d2 <- function(p, d, n1, H1, n2, H2)
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p - d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


fd <- function(d, n1, H1, n2, H2)
if (d==1) return(0)
s1 = integrate(d1, 0, 1-d, d, n1, H1, n2, H2)
s2 = integrate(d2, d, 1, d, n1, H1, n2, H2)
return(s1$value + s2$value)



I plotted $f(d|n_1,H_1,n_2,H_2)$ for $n_1=4$, $H_1=3$, $n_2=4$ and all values of $H_2$:



n1 <- 4
n2 <- 4
H1 <- 3
d <- seq(0, 1, length = 500)

get_f <- function(H2) sapply(d, fd, n1, H1, n2, H2)
dat <- sapply(0:n2, get_f)

matplot(d, dat, type = "l", ylab = "Density",
main = "f(d | 4, 3, 4, H2)")
legend("topright", legend = paste("H2 =", 0:n2),
col = 1:(n2 + 1), pch = "-")


enter image description here



You can calculate the probability of $|p_1-p_2|$ to be above a value $d$ by integrate(fd, d, 1, n1, H1, n2, H2). Mind that the double application of the numerical integral comes with some numerical error. E.g. integrate(fd, 0, 1, n1, H1, n2, H2) should always equal $1$ since $d$ always takes a value between $0$ and $1$. But the result often deviates slightly.






share|cite|improve this answer










New contributor




katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

















  • I don't know the actual value of p1
    – Thirupathi Thangavel
    2 days ago






  • 1




    I am sorry for my bad notation 😅 but you do not need to plug in a fixed value for $p_1$. The (now changed) $p‘_1$ in the integral is the variable you integrate over. Just like you can integrate $int_0^1 x^2 mathrm dx$ without having a fixed value for $x$.
    – katosh
    2 days ago










  • Got it, thanks a lot.
    – Thirupathi Thangavel
    2 days ago






  • 1




    Fisher's exact test is more specifically about the hypothesis that the coins have the same probability and the marginal totals are fixed. This is not the case in this coin problem. If you do the test again then you could observe some other total amount of heads.
    – Martijn Weterings
    2 days ago






  • 1




    @ThirupathiThangavel the problem with Fisher's test is about the not fixed marginal totals. The exact test model assumes that the probability of heads are the same and fixed, but also marginals are fixed before the experiment. That second part is not the case. This gives a different conditional probability for extreme values. Overall the Fisher test will be conservative. The probability of the result given the TRUE hypothesis (ie. fixed and similar probability for heads, but not necessarily fixed marginal totals) is smaller than calculated (you get higher p-values).
    – Martijn Weterings
    2 days ago















up vote
6
down vote



accepted










It is easy to calculate the probability for making that observation, given the fact the two coins are equal. This can be done by Fishers exact test. Given these observations



$$
beginarray c
&textcoin 1 &textcoin 2 \
hline
textheads &H_1 &H_2\
hline
texttails &n_1-H_1 &n_2-H_2\endarray
$$



the probability to observe these numbers while the coins are equal, or in short p-value, is given by
$$
p = frac(H_1+H_2)!(n_1+n_2-H_1-H_2)!n_1!n_2!H_1!H_2!(n_1-H_1)!(n_2-H_2)!(n_1+n_2)!.
$$



But what you are asking for is the probability that one coin is better.
Since we argue about a believe on how biased the coins are we have to use a Bayesian approach to calculate the result. Please note, that in Bayesian inference the term belief is modeled as probability and the two terms are used interchangeably (s. Bayesian probability). We call the probability that count $i$ tosses heads $p_i$. The posterior distribution after observation, for this $p_i$ is given by Bayes' theorem:
$$
f(p_i|H_i,n_i)= fracp_i,n_i)f(p_i)f(n_i,H_i)
$$
The probability density function (pdf) $f(H_i|p_i,n_i)$ is given by the Binomial probability, since the individual tries are Bernoulli experiments:
$$
f(H_i|p_i,n_i) = binomn_iH_ip_i^H_i(1-p_i)^n_i-H_i
$$
I assume the prior knowledge on $f(p_i)$ is that $p_i$ could lie anywhere between $0$ and $1$ with equal probability, hence $f(p_i) = 1$. So the nominator is $f(H_i|p_i,n_i)f(p_i)= f(H_i|p_i,n_i)$.



In Order to calculate $f(n_i,H_i)$ we use the fact that the integral over a pdf has to be one $int_0^1f(p|H_i,n_i)mathrm dp = 1$. So the denominator will be a constant factor to achieve just that. There is a known pdf that differs from the nominator by only a constant factor, which is the beta distribution. Hence
$$
f(p_i|H_i,n_i) = frac1B(H_i+1, n_i-H_i+1)p_i^H_i(1-p_i)^n_i-H_i.
$$



The pdf for the pair of probabilities of independent coins is
$$
f(p_1,p_2|H_1,n_1,H_2,n_2) = f(p_1|H_1,n_1)f(p_2|H_2,n_2).
$$



Now we need to integrate this over the cases in which $p_1>p_2$ in order to find out how probable coin $1$ is better then coin $2$:
$$beginalign
mathbb P(p_1>p_2)
&= int_0^1 int_0^p‘_1 f(p‘_1,p‘_2|H_1,n_1,H_2,n_2)mathrm dp‘_2 mathrm dp‘_1\
&=int_0^1 fracB(p‘_1;H_2+1,n_2-H_2+1)B(H_2+1,n_2-H_2+1)
f(p‘_1|H_1,n_1)mathrm dp‘_1
endalign$$



I cannot solve this last integral analytically but one can solve it numerically with a computer after plugging in the numbers. $B(cdot,cdot)$ is the beta function and $B(cdot;cdot,cdot)$ is the incomplete beta function. Note that $mathbb P(p_1=p_2) = 0$ because $p_1$ is a continues variable and never exactly the same as $p_2$.




Concerning the prior assumption on $f(p_i)$ and remarks on it: A good alternative to model many believes is to use a beta distribution $Beta(a_i+1,b_i+1)$. This would lead to a final probability
$$
mathbb P(p_1>p_2)
=int_0^1 fracB(p‘_1;H_2+1+a_2,n_2-H_2+1+b_2)B(H_2+1+a_2,n_2-H_2+1+b_2)
f(p‘_1|H_1+a_1,n_1+a_1+b_1)mathrm dp‘_1.
$$
That way one could model a strong bias towards regular coins by large but equal $a_i$, $b_i$. It would be equivalent to tossing the coin $a_i+b_i$ additional times and receiving $a_i$ heads hence equivalent to just having more data. $a_i + b_i$ is the amount of tosses we would not have to make if we include this prior.



The OP stated that the two coins are both biased to an unknown degree. So I understood all knowledge has to be inferred from the observations. This is why I opted for an uninformative prior that dose not bias the result e.g. towards regular coins.



All information can be conveyed in the form of $(H_i, n_i)$ per coin. The lack of an informative prior only means more observations are needed to decide which coin is better with high probability.




Here is the code in R that provides a function P(n1, H1, n2, H2) $=mathbb P(p_1>p_2)$ using the uniform prior $f(p_i)=1$:



mp <- function(p1, n1, H1, n2, H2) 
f1 <- pbeta(p1, H2 + 1, n2 - H2 + 1)
f2 <- dbeta(p1, H1 + 1, n1 - H1 + 1)
return(f1 * f2)


P <- function(n1, H1, n2, H2)
return(integrate(mp, 0, 1, n1, H1, n2, H2))



You can draw $P(p_1>p_2)$ for different experimental results and fixed $n_1$, $n_2$ e.g. $n_1=n_2=4$ with this code sniped:



library(lattice)
n1 <- 4
n2 <- 4
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")


P(p1 > p2) for n1=n2=4



You may need to install.packages("lattice") first.



One can see, that even with the uniform prior and a small sample size, the probability or believe that one coin is better can become quite solid, when $H_1$ and $H_2$ differ enough. An even smaller relative difference is needed if $n_1$ and $n_2$ are even larger. Here is a plot for $n_1=100$ and $n_2=200$:



enter image description here




Martijn Weterings suggested to calculate the posterior probability distribution for the difference between $p_1$ and $p_2$. This can be done by integrating the pdf of the pair over the set $S(d)=p_1-p_2$:
$$beginalign
f(d|H_1,n_1,H_2,n_2)
&= int_S(d)f(p_1,p_2|H_1,n_1,H_2,n_2) mathrm dgamma\
&= int_0^1-d f(p,p+d|H_1,n_1,H_2,n_2) mathrm dp + int_d^1 f(p,p-d|H_1,n_1,H_2,n_2) mathrm dp\
endalign$$



Again, not an integral I can solve analytically but the R code would be:



d1 <- function(p, d, n1, H1, n2, H2) 
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p + d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


d2 <- function(p, d, n1, H1, n2, H2)
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p - d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


fd <- function(d, n1, H1, n2, H2)
if (d==1) return(0)
s1 = integrate(d1, 0, 1-d, d, n1, H1, n2, H2)
s2 = integrate(d2, d, 1, d, n1, H1, n2, H2)
return(s1$value + s2$value)



I plotted $f(d|n_1,H_1,n_2,H_2)$ for $n_1=4$, $H_1=3$, $n_2=4$ and all values of $H_2$:



n1 <- 4
n2 <- 4
H1 <- 3
d <- seq(0, 1, length = 500)

get_f <- function(H2) sapply(d, fd, n1, H1, n2, H2)
dat <- sapply(0:n2, get_f)

matplot(d, dat, type = "l", ylab = "Density",
main = "f(d | 4, 3, 4, H2)")
legend("topright", legend = paste("H2 =", 0:n2),
col = 1:(n2 + 1), pch = "-")


enter image description here



You can calculate the probability of $|p_1-p_2|$ to be above a value $d$ by integrate(fd, d, 1, n1, H1, n2, H2). Mind that the double application of the numerical integral comes with some numerical error. E.g. integrate(fd, 0, 1, n1, H1, n2, H2) should always equal $1$ since $d$ always takes a value between $0$ and $1$. But the result often deviates slightly.






share|cite|improve this answer










New contributor




katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

















  • I don't know the actual value of p1
    – Thirupathi Thangavel
    2 days ago






  • 1




    I am sorry for my bad notation 😅 but you do not need to plug in a fixed value for $p_1$. The (now changed) $p‘_1$ in the integral is the variable you integrate over. Just like you can integrate $int_0^1 x^2 mathrm dx$ without having a fixed value for $x$.
    – katosh
    2 days ago










  • Got it, thanks a lot.
    – Thirupathi Thangavel
    2 days ago






  • 1




    Fisher's exact test is more specifically about the hypothesis that the coins have the same probability and the marginal totals are fixed. This is not the case in this coin problem. If you do the test again then you could observe some other total amount of heads.
    – Martijn Weterings
    2 days ago






  • 1




    @ThirupathiThangavel the problem with Fisher's test is about the not fixed marginal totals. The exact test model assumes that the probability of heads are the same and fixed, but also marginals are fixed before the experiment. That second part is not the case. This gives a different conditional probability for extreme values. Overall the Fisher test will be conservative. The probability of the result given the TRUE hypothesis (ie. fixed and similar probability for heads, but not necessarily fixed marginal totals) is smaller than calculated (you get higher p-values).
    – Martijn Weterings
    2 days ago













up vote
6
down vote



accepted







up vote
6
down vote



accepted






It is easy to calculate the probability for making that observation, given the fact the two coins are equal. This can be done by Fishers exact test. Given these observations



$$
beginarray c
&textcoin 1 &textcoin 2 \
hline
textheads &H_1 &H_2\
hline
texttails &n_1-H_1 &n_2-H_2\endarray
$$



the probability to observe these numbers while the coins are equal, or in short p-value, is given by
$$
p = frac(H_1+H_2)!(n_1+n_2-H_1-H_2)!n_1!n_2!H_1!H_2!(n_1-H_1)!(n_2-H_2)!(n_1+n_2)!.
$$



But what you are asking for is the probability that one coin is better.
Since we argue about a believe on how biased the coins are we have to use a Bayesian approach to calculate the result. Please note, that in Bayesian inference the term belief is modeled as probability and the two terms are used interchangeably (s. Bayesian probability). We call the probability that count $i$ tosses heads $p_i$. The posterior distribution after observation, for this $p_i$ is given by Bayes' theorem:
$$
f(p_i|H_i,n_i)= fracp_i,n_i)f(p_i)f(n_i,H_i)
$$
The probability density function (pdf) $f(H_i|p_i,n_i)$ is given by the Binomial probability, since the individual tries are Bernoulli experiments:
$$
f(H_i|p_i,n_i) = binomn_iH_ip_i^H_i(1-p_i)^n_i-H_i
$$
I assume the prior knowledge on $f(p_i)$ is that $p_i$ could lie anywhere between $0$ and $1$ with equal probability, hence $f(p_i) = 1$. So the nominator is $f(H_i|p_i,n_i)f(p_i)= f(H_i|p_i,n_i)$.



In Order to calculate $f(n_i,H_i)$ we use the fact that the integral over a pdf has to be one $int_0^1f(p|H_i,n_i)mathrm dp = 1$. So the denominator will be a constant factor to achieve just that. There is a known pdf that differs from the nominator by only a constant factor, which is the beta distribution. Hence
$$
f(p_i|H_i,n_i) = frac1B(H_i+1, n_i-H_i+1)p_i^H_i(1-p_i)^n_i-H_i.
$$



The pdf for the pair of probabilities of independent coins is
$$
f(p_1,p_2|H_1,n_1,H_2,n_2) = f(p_1|H_1,n_1)f(p_2|H_2,n_2).
$$



Now we need to integrate this over the cases in which $p_1>p_2$ in order to find out how probable coin $1$ is better then coin $2$:
$$beginalign
mathbb P(p_1>p_2)
&= int_0^1 int_0^p‘_1 f(p‘_1,p‘_2|H_1,n_1,H_2,n_2)mathrm dp‘_2 mathrm dp‘_1\
&=int_0^1 fracB(p‘_1;H_2+1,n_2-H_2+1)B(H_2+1,n_2-H_2+1)
f(p‘_1|H_1,n_1)mathrm dp‘_1
endalign$$



I cannot solve this last integral analytically but one can solve it numerically with a computer after plugging in the numbers. $B(cdot,cdot)$ is the beta function and $B(cdot;cdot,cdot)$ is the incomplete beta function. Note that $mathbb P(p_1=p_2) = 0$ because $p_1$ is a continues variable and never exactly the same as $p_2$.




Concerning the prior assumption on $f(p_i)$ and remarks on it: A good alternative to model many believes is to use a beta distribution $Beta(a_i+1,b_i+1)$. This would lead to a final probability
$$
mathbb P(p_1>p_2)
=int_0^1 fracB(p‘_1;H_2+1+a_2,n_2-H_2+1+b_2)B(H_2+1+a_2,n_2-H_2+1+b_2)
f(p‘_1|H_1+a_1,n_1+a_1+b_1)mathrm dp‘_1.
$$
That way one could model a strong bias towards regular coins by large but equal $a_i$, $b_i$. It would be equivalent to tossing the coin $a_i+b_i$ additional times and receiving $a_i$ heads hence equivalent to just having more data. $a_i + b_i$ is the amount of tosses we would not have to make if we include this prior.



The OP stated that the two coins are both biased to an unknown degree. So I understood all knowledge has to be inferred from the observations. This is why I opted for an uninformative prior that dose not bias the result e.g. towards regular coins.



All information can be conveyed in the form of $(H_i, n_i)$ per coin. The lack of an informative prior only means more observations are needed to decide which coin is better with high probability.




Here is the code in R that provides a function P(n1, H1, n2, H2) $=mathbb P(p_1>p_2)$ using the uniform prior $f(p_i)=1$:



mp <- function(p1, n1, H1, n2, H2) 
f1 <- pbeta(p1, H2 + 1, n2 - H2 + 1)
f2 <- dbeta(p1, H1 + 1, n1 - H1 + 1)
return(f1 * f2)


P <- function(n1, H1, n2, H2)
return(integrate(mp, 0, 1, n1, H1, n2, H2))



You can draw $P(p_1>p_2)$ for different experimental results and fixed $n_1$, $n_2$ e.g. $n_1=n_2=4$ with this code sniped:



library(lattice)
n1 <- 4
n2 <- 4
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")


P(p1 > p2) for n1=n2=4



You may need to install.packages("lattice") first.



One can see, that even with the uniform prior and a small sample size, the probability or believe that one coin is better can become quite solid, when $H_1$ and $H_2$ differ enough. An even smaller relative difference is needed if $n_1$ and $n_2$ are even larger. Here is a plot for $n_1=100$ and $n_2=200$:



enter image description here




Martijn Weterings suggested to calculate the posterior probability distribution for the difference between $p_1$ and $p_2$. This can be done by integrating the pdf of the pair over the set $S(d)=p_1-p_2$:
$$beginalign
f(d|H_1,n_1,H_2,n_2)
&= int_S(d)f(p_1,p_2|H_1,n_1,H_2,n_2) mathrm dgamma\
&= int_0^1-d f(p,p+d|H_1,n_1,H_2,n_2) mathrm dp + int_d^1 f(p,p-d|H_1,n_1,H_2,n_2) mathrm dp\
endalign$$



Again, not an integral I can solve analytically but the R code would be:



d1 <- function(p, d, n1, H1, n2, H2) 
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p + d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


d2 <- function(p, d, n1, H1, n2, H2)
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p - d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


fd <- function(d, n1, H1, n2, H2)
if (d==1) return(0)
s1 = integrate(d1, 0, 1-d, d, n1, H1, n2, H2)
s2 = integrate(d2, d, 1, d, n1, H1, n2, H2)
return(s1$value + s2$value)



I plotted $f(d|n_1,H_1,n_2,H_2)$ for $n_1=4$, $H_1=3$, $n_2=4$ and all values of $H_2$:



n1 <- 4
n2 <- 4
H1 <- 3
d <- seq(0, 1, length = 500)

get_f <- function(H2) sapply(d, fd, n1, H1, n2, H2)
dat <- sapply(0:n2, get_f)

matplot(d, dat, type = "l", ylab = "Density",
main = "f(d | 4, 3, 4, H2)")
legend("topright", legend = paste("H2 =", 0:n2),
col = 1:(n2 + 1), pch = "-")


enter image description here



You can calculate the probability of $|p_1-p_2|$ to be above a value $d$ by integrate(fd, d, 1, n1, H1, n2, H2). Mind that the double application of the numerical integral comes with some numerical error. E.g. integrate(fd, 0, 1, n1, H1, n2, H2) should always equal $1$ since $d$ always takes a value between $0$ and $1$. But the result often deviates slightly.






share|cite|improve this answer










New contributor




katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









It is easy to calculate the probability for making that observation, given the fact the two coins are equal. This can be done by Fishers exact test. Given these observations



$$
beginarray c
&textcoin 1 &textcoin 2 \
hline
textheads &H_1 &H_2\
hline
texttails &n_1-H_1 &n_2-H_2\endarray
$$



the probability to observe these numbers while the coins are equal, or in short p-value, is given by
$$
p = frac(H_1+H_2)!(n_1+n_2-H_1-H_2)!n_1!n_2!H_1!H_2!(n_1-H_1)!(n_2-H_2)!(n_1+n_2)!.
$$



But what you are asking for is the probability that one coin is better.
Since we argue about a believe on how biased the coins are we have to use a Bayesian approach to calculate the result. Please note, that in Bayesian inference the term belief is modeled as probability and the two terms are used interchangeably (s. Bayesian probability). We call the probability that count $i$ tosses heads $p_i$. The posterior distribution after observation, for this $p_i$ is given by Bayes' theorem:
$$
f(p_i|H_i,n_i)= fracp_i,n_i)f(p_i)f(n_i,H_i)
$$
The probability density function (pdf) $f(H_i|p_i,n_i)$ is given by the Binomial probability, since the individual tries are Bernoulli experiments:
$$
f(H_i|p_i,n_i) = binomn_iH_ip_i^H_i(1-p_i)^n_i-H_i
$$
I assume the prior knowledge on $f(p_i)$ is that $p_i$ could lie anywhere between $0$ and $1$ with equal probability, hence $f(p_i) = 1$. So the nominator is $f(H_i|p_i,n_i)f(p_i)= f(H_i|p_i,n_i)$.



In Order to calculate $f(n_i,H_i)$ we use the fact that the integral over a pdf has to be one $int_0^1f(p|H_i,n_i)mathrm dp = 1$. So the denominator will be a constant factor to achieve just that. There is a known pdf that differs from the nominator by only a constant factor, which is the beta distribution. Hence
$$
f(p_i|H_i,n_i) = frac1B(H_i+1, n_i-H_i+1)p_i^H_i(1-p_i)^n_i-H_i.
$$



The pdf for the pair of probabilities of independent coins is
$$
f(p_1,p_2|H_1,n_1,H_2,n_2) = f(p_1|H_1,n_1)f(p_2|H_2,n_2).
$$



Now we need to integrate this over the cases in which $p_1>p_2$ in order to find out how probable coin $1$ is better then coin $2$:
$$beginalign
mathbb P(p_1>p_2)
&= int_0^1 int_0^p‘_1 f(p‘_1,p‘_2|H_1,n_1,H_2,n_2)mathrm dp‘_2 mathrm dp‘_1\
&=int_0^1 fracB(p‘_1;H_2+1,n_2-H_2+1)B(H_2+1,n_2-H_2+1)
f(p‘_1|H_1,n_1)mathrm dp‘_1
endalign$$



I cannot solve this last integral analytically but one can solve it numerically with a computer after plugging in the numbers. $B(cdot,cdot)$ is the beta function and $B(cdot;cdot,cdot)$ is the incomplete beta function. Note that $mathbb P(p_1=p_2) = 0$ because $p_1$ is a continues variable and never exactly the same as $p_2$.




Concerning the prior assumption on $f(p_i)$ and remarks on it: A good alternative to model many believes is to use a beta distribution $Beta(a_i+1,b_i+1)$. This would lead to a final probability
$$
mathbb P(p_1>p_2)
=int_0^1 fracB(p‘_1;H_2+1+a_2,n_2-H_2+1+b_2)B(H_2+1+a_2,n_2-H_2+1+b_2)
f(p‘_1|H_1+a_1,n_1+a_1+b_1)mathrm dp‘_1.
$$
That way one could model a strong bias towards regular coins by large but equal $a_i$, $b_i$. It would be equivalent to tossing the coin $a_i+b_i$ additional times and receiving $a_i$ heads hence equivalent to just having more data. $a_i + b_i$ is the amount of tosses we would not have to make if we include this prior.



The OP stated that the two coins are both biased to an unknown degree. So I understood all knowledge has to be inferred from the observations. This is why I opted for an uninformative prior that dose not bias the result e.g. towards regular coins.



All information can be conveyed in the form of $(H_i, n_i)$ per coin. The lack of an informative prior only means more observations are needed to decide which coin is better with high probability.




Here is the code in R that provides a function P(n1, H1, n2, H2) $=mathbb P(p_1>p_2)$ using the uniform prior $f(p_i)=1$:



mp <- function(p1, n1, H1, n2, H2) 
f1 <- pbeta(p1, H2 + 1, n2 - H2 + 1)
f2 <- dbeta(p1, H1 + 1, n1 - H1 + 1)
return(f1 * f2)


P <- function(n1, H1, n2, H2)
return(integrate(mp, 0, 1, n1, H1, n2, H2))



You can draw $P(p_1>p_2)$ for different experimental results and fixed $n_1$, $n_2$ e.g. $n_1=n_2=4$ with this code sniped:



library(lattice)
n1 <- 4
n2 <- 4
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")


P(p1 > p2) for n1=n2=4



You may need to install.packages("lattice") first.



One can see, that even with the uniform prior and a small sample size, the probability or believe that one coin is better can become quite solid, when $H_1$ and $H_2$ differ enough. An even smaller relative difference is needed if $n_1$ and $n_2$ are even larger. Here is a plot for $n_1=100$ and $n_2=200$:



enter image description here




Martijn Weterings suggested to calculate the posterior probability distribution for the difference between $p_1$ and $p_2$. This can be done by integrating the pdf of the pair over the set $S(d)=p_1-p_2$:
$$beginalign
f(d|H_1,n_1,H_2,n_2)
&= int_S(d)f(p_1,p_2|H_1,n_1,H_2,n_2) mathrm dgamma\
&= int_0^1-d f(p,p+d|H_1,n_1,H_2,n_2) mathrm dp + int_d^1 f(p,p-d|H_1,n_1,H_2,n_2) mathrm dp\
endalign$$



Again, not an integral I can solve analytically but the R code would be:



d1 <- function(p, d, n1, H1, n2, H2) 
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p + d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


d2 <- function(p, d, n1, H1, n2, H2)
f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)
f2 <- dbeta(p - d, H2 + 1, n2 - H2 + 1)
return(f1 * f2)


fd <- function(d, n1, H1, n2, H2)
if (d==1) return(0)
s1 = integrate(d1, 0, 1-d, d, n1, H1, n2, H2)
s2 = integrate(d2, d, 1, d, n1, H1, n2, H2)
return(s1$value + s2$value)



I plotted $f(d|n_1,H_1,n_2,H_2)$ for $n_1=4$, $H_1=3$, $n_2=4$ and all values of $H_2$:



n1 <- 4
n2 <- 4
H1 <- 3
d <- seq(0, 1, length = 500)

get_f <- function(H2) sapply(d, fd, n1, H1, n2, H2)
dat <- sapply(0:n2, get_f)

matplot(d, dat, type = "l", ylab = "Density",
main = "f(d | 4, 3, 4, H2)")
legend("topright", legend = paste("H2 =", 0:n2),
col = 1:(n2 + 1), pch = "-")


enter image description here



You can calculate the probability of $|p_1-p_2|$ to be above a value $d$ by integrate(fd, d, 1, n1, H1, n2, H2). Mind that the double application of the numerical integral comes with some numerical error. E.g. integrate(fd, 0, 1, n1, H1, n2, H2) should always equal $1$ since $d$ always takes a value between $0$ and $1$. But the result often deviates slightly.







share|cite|improve this answer










New contributor




katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|cite|improve this answer



share|cite|improve this answer








edited 1 hour ago





















New contributor




katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









answered Sep 7 at 11:32









katosh

1243




1243




New contributor




katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






katosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











  • I don't know the actual value of p1
    – Thirupathi Thangavel
    2 days ago






  • 1




    I am sorry for my bad notation 😅 but you do not need to plug in a fixed value for $p_1$. The (now changed) $p‘_1$ in the integral is the variable you integrate over. Just like you can integrate $int_0^1 x^2 mathrm dx$ without having a fixed value for $x$.
    – katosh
    2 days ago










  • Got it, thanks a lot.
    – Thirupathi Thangavel
    2 days ago






  • 1




    Fisher's exact test is more specifically about the hypothesis that the coins have the same probability and the marginal totals are fixed. This is not the case in this coin problem. If you do the test again then you could observe some other total amount of heads.
    – Martijn Weterings
    2 days ago






  • 1




    @ThirupathiThangavel the problem with Fisher's test is about the not fixed marginal totals. The exact test model assumes that the probability of heads are the same and fixed, but also marginals are fixed before the experiment. That second part is not the case. This gives a different conditional probability for extreme values. Overall the Fisher test will be conservative. The probability of the result given the TRUE hypothesis (ie. fixed and similar probability for heads, but not necessarily fixed marginal totals) is smaller than calculated (you get higher p-values).
    – Martijn Weterings
    2 days ago

















up vote
2
down vote

I've made a numerical simulation in R. You're probably looking for an analytical answer, but I thought this could be interesting to share.



set.seed(123)

# coin 1: true probability of heads 0.7
N1 <- 20
theta1 <- 0.7
toss1 <- rbinom(n = N1, size = 1, prob = theta1)

# coin 2: true probability of heads 0.5
N2 <- 25
theta2 <- 0.5
toss2 <- rbinom(n = N2, size = 1, prob = theta2)

# observed head frequencies
sum(toss1)/N1 # [1] 0.65
sum(toss2)/N2 # [1] 0.52


In this first block I simply simulate two sets of coin tosses. You can see that if theta1 > theta2, the observed frequency of heads for coin 1 will tend to be higher than for coin 2. Note the different sample sizes N1 and N2.



Let's see what we can do with different thetas. Note the code is not optimal. At all.



simulation <- function(N1, N2, theta1, theta2, nsim = 100) {
  count1 <- count2 <- 0

  for (i in 1:nsim) {
    toss1 <- rbinom(n = N1, size = 1, prob = theta1)
    toss2 <- rbinom(n = N2, size = 1, prob = theta2)

    # count the runs in which coin 1 shows the higher head frequency
    if (sum(toss1)/N1 > sum(toss2)/N2) count1 <- count1 + 1
    # if (sum(toss1)/N1 < sum(toss2)/N2) count2 <- count2 + 1
  }

  count1/nsim
}

set.seed(123)
simulation(20, 25, 0.7, 0.5, 100)
#[1] 0.93


So 0.93 is the frequency of times (out of 100 simulations) that the first coin showed the higher proportion of heads. This seems reasonable, given the theta1 and theta2 used.



Let's see what happens with two vectors of thetas.



theta1_v <- seq(from = 0.1, to = 0.9, by = 0.1)
theta2_v <- seq(from = 0.9, to = 0.1, by = -0.1)

res_v <- c()
for (i in 1:length(theta1_v)) {
  res_v[i] <- simulation(1000, 1500, theta1_v[i], theta2_v[i], 100)
}

plot(theta1_v, res_v, type = "l")


Remember that res_v holds the frequencies, out of 100 simulations each, with which the head ratio of coin 1 exceeded that of coin 2.



[Figure: res_v plotted against theta1_v; the frequency rises from 0 to 1, with the transition around theta1 = theta2 = 0.5]



So as theta1 increases (and theta2 decreases), the probability that coin 1 shows the higher head frequency increases, as expected.



I've done some other simulations and it seems that the sizes N1,N2 are less important.



If you're familiar with R you can use this code to shed some light on the problem. I'm aware this is not a complete analysis, and it can be improved.
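A faster, equivalent sketch (my own variation, not part of the answer above) draws the head counts directly as binomials instead of looping over individual tosses:

# vectorized version: nsim head counts per coin in one call
simulation_vec <- function(N1, N2, theta1, theta2, nsim = 10000) {
  H1 <- rbinom(nsim, size = N1, prob = theta1)
  H2 <- rbinom(nsim, size = N2, prob = theta2)
  mean(H1 / N1 > H2 / N2)   # share of runs in which coin 1 leads
}

set.seed(123)
simulation_vec(20, 25, 0.7, 0.5)  # comparable to the loop's 0.93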






answered Sep 7 at 12:17 – RLave

  • Interesting how res_v changes continuously when the thetas meet. I understood the question as asking about the coins' intrinsic bias after making just a single observation. You seem to answer what observations one would make after knowing the bias.
    – katosh
    Sep 7 at 13:30













