How do frequentists address this paradox of hypothesis testing?
2 votes
Suppose we sample a person from the population. They are a member of US Congress. We define the null hypothesis $H_0$ as "the person is American". We calculate the $p$-value: $P[\text{member of Congress} \mid \text{American}] \ll 0.05$. Since if the null hypothesis holds, the person is very unlikely to be a member of Congress, we reject the null hypothesis and decide that the person is very likely not an American. This conclusion is obviously very wrong, as all members of Congress are American.
Which assumptions of the hypothesis testing did I violate here? In other words, if I encounter a similar (but more obscure) application where this methodology is also not appropriate, how do I identify it?
hypothesis-testing bayesian frequentist
"The person is an American" is not a null hypothesis: it is a prediction about the value of a random variable. It cannot possibly have a p-value. You aren't doing hypothesis testing at all--and since you haven't explained how you obtained your purported "p-value," it isn't evident what you're doing or what your "methodology" might possibly be. Could you edit your post to explain it?
– whuber♦, 59 mins ago
asked 9 hours ago by rinspy
3 Answers
5 votes
Frequentist statistics is meant to make inferences about populations using samples, not about individuals. You first define a population (which you have not done), take a sample, and make inferences about the population from the sample, taking the uncertainty into account.
You have treated your sampled individual as if it were your population and tried to make an inference about that individual. Frequentist statistics does not apply here: you cannot repeat the sampling process with a population of size 1.
But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
– rinspy, 4 hours ago
In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
– rinspy, 4 hours ago
The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter; in your earlier example it revolved around one individual (one teacup by analogy).
– Knarpie, 4 hours ago
The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
– rinspy, 4 hours ago
@Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
– Rohan, 3 hours ago
5 votes
To make this test apply to the population we could change the hypotheses slightly to
H0: The sample is drawn from a population in the US
H1: The sample is drawn from a population not in the US
As far as I can tell there's nothing wrong with this hypothesis test. For a test to have significance level 0.05 (for example), you need the probability that it rejects the null hypothesis, when the null hypothesis is true, to be at most 0.05.
In this example, think about repeated sampling: if the null hypothesis is true, then we will choose people from the US. Out of those people, only a very small fraction (far less than 0.05) are expected to be members of Congress, so you would reject the null hypothesis for less than 5% of them.
So if the test is correct, why does it seem so paradoxical? While it technically satisfies the criterion for a hypothesis test, for any fixed significance level we typically want to choose the rejection criterion that maximizes the power of the test, that is, the probability of rejecting the null hypothesis when it is false. In your case the test is terrible at this: it will never reject the null hypothesis when it is false.
The paradox depends on the rejection criterion being impossible under the alternative hypothesis, or less likely under the alternative hypothesis than under the null. Any such test will have zero or very low power.
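To make the size-versus-power point concrete, here is a minimal Python sketch. The population figures are rough assumptions for illustration only (about 535 members of Congress among about 330 million US residents), and the helper name `rejection_probability` is hypothetical.

```python
# Illustrative figures only (assumptions, not from the post): roughly 535
# members of Congress in a US population of roughly 330 million.
US_POPULATION = 330_000_000
CONGRESS_MEMBERS = 535


def rejection_probability(p_congress: float) -> float:
    """P(reject H0) for the rule 'reject if the sampled person is in Congress'.

    With a single draw, this is just the probability that the draw is a
    member of Congress in whichever population is being sampled.
    """
    return p_congress


# Size: probability of rejecting when H0 (sampling from the US) is true.
size = rejection_probability(CONGRESS_MEMBERS / US_POPULATION)

# Power: probability of rejecting when H1 (sampling from outside the US) is
# true. That population contains no members of Congress, so this is 0.
power = rejection_probability(0.0)

print(f"size  = {size:.2e}")   # far below 0.05, so the level constraint holds
print(f"power = {power:.2e}")  # zero: the test can never detect a false H0
```

Under these assumptions the test is valid at the 0.05 level, but its power is below its size, which is the situation described above.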
Intuitively, the assumption that "the rejection criterion must be (much) more likely under the alternate hypothesis than under the null" seems key. In fact, the more I think about it, the more it seems like an implicit assumption in this kind of statistical testing.
– rinspy, 4 hours ago
Agreed. I think it's so implicit because it's usually not a problem, the null hypothesis is typically quite specific and the alternative quite broad (e.g. a specific value of a parameter vs all others or independence vs dependence), but it's still probably not mentioned enough.
– Rohan, 3 hours ago
I've altered my answer slightly to take into account the other answer which I think raises a valid point. But the idea hasn't changed.
– Rohan, 3 hours ago
2 votes
I'd say there are at least two additional problems with your "paradox" (I don't even think it is a valid testing problem):
First, "Member of Congress = 1" is an invalid test statistic for your "H0", as it does not measure deviation from H0. A person who is not American automatically has "Member of Congress = 0", which also applies to most Americans. Let me expand on that. What values can the test statistic take? It is 1 if the person is a member of Congress and American, and 0 if the person is either American and not a member of Congress, or non-American. That means the test statistic can take two distinct values (0 and 1) if the null is true, and both values carry information in favour of H0 (0 for Americans who are not members of Congress, 1 for American members of Congress). But one of these values (0) also carries information against H0. So what does one learn about H0 from whether the test statistic is 0 or not? Thus the test you describe appears invalid.
Second, the p-value is defined as the probability, assuming the null is true, of observing a test statistic that speaks as strongly or more strongly against H0 than the value observed in the sample. In other words, it is a tail probability of the distribution of the test statistic under the null. I have difficulty matching your p-value, which seems to be simply a conditional probability, to that definition. Not every conditional probability that conditions on "American" is automatically the correct p-value; for that, one would have to work out the correct null distribution of the test statistic you propose.
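One common way to write that definition, as a sketch: let $T$ be the test statistic and $t_{\text{obs}}$ its observed value (both symbols are introduced here for illustration), and assume that larger values of $T$ speak more strongly against $H_0$. Then

$$p = P(T \ge t_{\text{obs}} \mid H_0).$$

The formula presupposes an ordering of the possible outcomes by how strongly they speak against $H_0$. Whether "Member of Congress = 1" belongs at the extreme end of that ordering is exactly what the first point calls into question.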
First answer above: answered 6 hours ago by Knarpie
1
But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
â rinspy
4 hours ago
1
In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
â rinspy
4 hours ago
1
The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
â Knarpie
4 hours ago
1
The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
â rinspy
4 hours ago
@Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
â Rohan
3 hours ago
 |Â
show 1 more comment
1
But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
â rinspy
4 hours ago
1
In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
â rinspy
4 hours ago
1
The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
â Knarpie
4 hours ago
1
The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
â rinspy
4 hours ago
@Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
â Rohan
3 hours ago
1
1
But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
â rinspy
4 hours ago
But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
â rinspy
4 hours ago
1
1
In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
â rinspy
4 hours ago
In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
â rinspy
4 hours ago
1
1
The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
â Knarpie
4 hours ago
The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
â Knarpie
4 hours ago
1
1
The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
â rinspy
4 hours ago
The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
â rinspy
4 hours ago
@Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
â Rohan
3 hours ago
@Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
â Rohan
3 hours ago
 |Â
show 1 more comment
up vote
5
down vote
To make this test apply to the population we could change the hypotheses slightly to
H0: The sample is drawn from a population in the US
H1: The sample is drawn from a population not in the US
As far as I can tell there's nothing wrong with this hypothesis test. For a hypothesis test with significance level 0.05 (for example) is a test you need that if the null hypothesis is true then the probability that the test will reject it is less than 0.05.
In this example if you think about repeatedly sampling, if the null hypothesis then we will choose people from the US. And out of those people, only a very small fraction (less than 0.05) are expected to be members of congress, so you would only reject the null hypothesis for less than 5% of them.
So if the test is correct, why does it seem so paradoxical? While it technically satisfies the criterion for a hypothesis test, for any fixed significance level we typically want to choose the rejection criterion which maximizes the power of the test - that is it maximizes the probability of rejecting the null hypothesis if it is false. In your case the test is terrible at this, it will never reject the null hypothesis even if it is false.
The paradox depends on the rejection criterion being impossible under the alternative hypothesis or more unlikely than under the alternate hypothesis that under the null. Any such test will have zero or very low power.
2
Intuitively, the assumption that "the rejection criterion must be (much) more likely under the alternate hypothesis than under the null" seems key. In fact, the more I think about it, the more it seems like an implicit assumption in this kind of statistical testing.
â rinspy
4 hours ago
Agreed. I think it's so implicit because it's usually not a problem, the null hypothesis is typically quite specific and the alternative quite broad (e.g. a specific value of a parameter vs all others or independence vs dependence), but it's still probably not mentioned enough.
â Rohan
3 hours ago
1
I've altered my answer slightly to take into account the other answer which I think raises a valid point. But the idea hasn't changed.
â Rohan
3 hours ago
add a comment |Â
up vote
5
down vote
To make this test apply to the population we could change the hypotheses slightly to
H0: The sample is drawn from a population in the US
H1: The sample is drawn from a population not in the US
As far as I can tell there's nothing wrong with this hypothesis test. For a hypothesis test with significance level 0.05 (for example) is a test you need that if the null hypothesis is true then the probability that the test will reject it is less than 0.05.
In this example if you think about repeatedly sampling, if the null hypothesis then we will choose people from the US. And out of those people, only a very small fraction (less than 0.05) are expected to be members of congress, so you would only reject the null hypothesis for less than 5% of them.
So if the test is correct, why does it seem so paradoxical? While it technically satisfies the criterion for a hypothesis test, for any fixed significance level we typically want to choose the rejection criterion which maximizes the power of the test - that is it maximizes the probability of rejecting the null hypothesis if it is false. In your case the test is terrible at this, it will never reject the null hypothesis even if it is false.
The paradox depends on the rejection criterion being impossible under the alternative hypothesis or more unlikely than under the alternate hypothesis that under the null. Any such test will have zero or very low power.
2
Intuitively, the assumption that "the rejection criterion must be (much) more likely under the alternate hypothesis than under the null" seems key. In fact, the more I think about it, the more it seems like an implicit assumption in this kind of statistical testing.
â rinspy
4 hours ago
Agreed. I think it's so implicit because it's usually not a problem, the null hypothesis is typically quite specific and the alternative quite broad (e.g. a specific value of a parameter vs all others or independence vs dependence), but it's still probably not mentioned enough.
â Rohan
3 hours ago
1
I've altered my answer slightly to take into account the other answer which I think raises a valid point. But the idea hasn't changed.
â Rohan
3 hours ago
add a comment |Â
up vote
5
down vote
up vote
5
down vote
To make this test apply to the population we could change the hypotheses slightly to
H0: The sample is drawn from a population in the US
H1: The sample is drawn from a population not in the US
As far as I can tell there's nothing wrong with this hypothesis test. For a hypothesis test with significance level 0.05 (for example) is a test you need that if the null hypothesis is true then the probability that the test will reject it is less than 0.05.
In this example if you think about repeatedly sampling, if the null hypothesis then we will choose people from the US. And out of those people, only a very small fraction (less than 0.05) are expected to be members of congress, so you would only reject the null hypothesis for less than 5% of them.
So if the test is correct, why does it seem so paradoxical? While it technically satisfies the criterion for a hypothesis test, for any fixed significance level we typically want to choose the rejection criterion which maximizes the power of the test - that is it maximizes the probability of rejecting the null hypothesis if it is false. In your case the test is terrible at this, it will never reject the null hypothesis even if it is false.
The paradox depends on the rejection criterion being impossible under the alternative hypothesis or more unlikely than under the alternate hypothesis that under the null. Any such test will have zero or very low power.
edited 3 hours ago
answered 5 hours ago
Rohan
813
up vote
2
down vote
I'd say there are at least two additional problems with your "paradox" (I don't even think it is a valid testing problem):
First, "Member of Congress = 1" is an invalid test statistic for your H0 because it does not measure deviation from H0. A person who is not American automatically has "Member of Congress = 0", which is also true of most Americans. To expand on that: what values can the test statistic take? It is 1 if the person is a member of Congress and American, and 0 if the person is either American and not a member of Congress, or non-American. So the test statistic can take two distinct values (0 and 1) when the null is true, and both are consistent with H0 (0 for Americans who are not in Congress, 1 for Americans who are). But one of these values (0) is also consistent with the alternative. So what does one learn about H0 from observing 0 rather than 1? The test you describe therefore appears invalid.
Second, the p-value is defined as the probability of observing a test statistic that speaks as strongly or more strongly against H0 than the value observed in the sample. In other words, it is a tail probability of the distribution of the test statistic, computed under the assumption that the null is true. I have difficulty matching your p-value, which seems to be simply a conditional probability, to that definition. Not every probability conditional on "American" is automatically the correct p-value; for that, one would have to work out the correct null distribution of the test statistic you propose.
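To make the second point concrete, here is a minimal sketch of a p-value computed as "null probability of outcomes at least as extreme as the one observed". Ranking "extreme" by the likelihood ratio of alternative to null is one common convention and is an assumption here, not something stated in the question; the `p_value` helper and the value of `eps` are likewise hypothetical.

```python
from fractions import Fraction


def p_value(observed, null_pmf, alt_pmf):
    """Null probability of all outcomes at least as extreme as `observed`.

    'Extreme' is ranked by the likelihood ratio alt/null (assumed convention):
    a larger ratio means the outcome speaks more strongly against the null.
    """
    def ratio(t):
        return alt_pmf[t] / null_pmf[t]

    threshold = ratio(observed)
    return sum(p for t, p in null_pmf.items() if ratio(t) >= threshold)


# Hypothetical P(member of Congress | American); the figures are illustrative.
eps = Fraction(535, 330_000_000)

# Distribution of the statistic T = "member of Congress" under each hypothesis.
null_pmf = {0: 1 - eps, 1: eps}            # H0: the person is American
alt_pmf = {0: Fraction(1), 1: Fraction(0)}  # H1: the person is not American

p_congress = p_value(1, null_pmf, alt_pmf)  # observed a member of Congress
p_other = p_value(0, null_pmf, alt_pmf)     # observed a non-member

print(float(p_congress), float(p_other))
```

With this ordering, observing a member of Congress is the outcome least unfavourable to H0, so its p-value is the largest possible rather than a tiny conditional probability.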
edited 49 mins ago
answered 2 hours ago
Momo
7,18423654