How do frequentists address this paradox of hypothesis testing?

Suppose we sample a person from the population. They are a member of the US Congress. We define the null hypothesis $H_0$ as "the person is American". We calculate the $p$-value: $P[\text{member of Congress} \mid \text{American}] \ll 0.05$. Since the person is very unlikely to be a member of Congress if the null hypothesis holds, we reject the null hypothesis and decide that the person is very likely not an American. This conclusion is obviously very wrong, as all members of Congress are American.

Which assumptions of hypothesis testing did I violate here? In other words, if I encounter a similar (but more obscure) application where this methodology is also not appropriate, how do I identify it?










  • 2




    "The person is an American" is not a null hypothesis: it is a prediction about the value of a random variable. It cannot possibly have a p-value. You aren't doing hypothesis testing at all--and since you haven't explained how you obtained your purported "p-value," it isn't evident what you're doing or what your "methodology" might possibly be. Could you edit your post to explain it?
    – whuber♦
    59 mins ago
















hypothesis-testing bayesian frequentist






asked 9 hours ago









rinspy

3 Answers

Frequentist statistics is meant to make inference on populations using samples, not on individuals. You first define a population (which you have not done), take a sample, and make inference on the population using the sample, taking into account the uncertainty.



You have used your sampled individual as if it were your population, and tried to make inference on him. But frequentist statistics does not apply here: you cannot repeat the sampling process with a population of size 1.






answered 6 hours ago

Knarpie
















  • 1




    But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
    – rinspy
    4 hours ago






  • 1




    In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
    – rinspy
    4 hours ago






  • 1




    The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested there revolves around this parameter; in your example it revolves around one individual (one teacup, by analogy).
    – Knarpie
    4 hours ago







  • 1




    The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
    – rinspy
    4 hours ago










  • @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
    – Rohan
    3 hours ago
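The contrast drawn in these comments can be made concrete with a small sketch. It uses the simplified model from Knarpie's comment (each cup guessed independently, with probability 0.5 under "no ability"), which is not Fisher's original design. The number of cups and the success rate under "ability" are assumptions chosen only for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

N_CUPS = 8        # assumed number of cups
P_NULL = 0.5      # guessing probability without the ability (from the comment)
P_ABILITY = 0.9   # hypothetical success rate for someone who has the ability

# Rejection rule: reject "no ability" if every cup is identified correctly.
p_value = binomial_upper_tail(N_CUPS, N_CUPS, P_NULL)   # chance of this under H0
power = binomial_upper_tail(N_CUPS, N_CUPS, P_ABILITY)  # chance of this under H1

print(f"P(all correct | no ability) = {p_value:.5f}")
print(f"P(all correct | ability)    = {power:.5f}")
```

Here the rejection event is rare under the null but much more likely under the alternative, which is exactly what fails in the Congress example.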

















To make this test apply to the population we could change the hypotheses slightly to



H0: The sample is drawn from a population in the US



H1: The sample is drawn from a population not in the US



As far as I can tell there's nothing wrong with this hypothesis test. For a test to have significance level 0.05 (for example), the probability that it rejects the null hypothesis when the null hypothesis is true must be no more than 0.05.

In this example, if you think about repeated sampling, then under the null hypothesis we will only ever choose people from the US. Out of those people, only a very small fraction (far less than 0.05) are members of Congress, so you would reject the null hypothesis for fewer than 5% of them.

So if the test is correct, why does it seem so paradoxical? While it technically satisfies the criterion for a hypothesis test, for any fixed significance level we typically want to choose the rejection criterion that maximizes the power of the test, that is, the probability of rejecting the null hypothesis when it is false. In your case the test is terrible at this: it will never reject the null hypothesis when it is false.

The paradox depends on the rejection criterion being impossible under the alternative hypothesis, or at least less likely under the alternative than under the null. Any such test will have zero or very low power.
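To put rough numbers on this, here is a minimal sketch of the size and power of the rule "reject H0 if the sampled person is a member of Congress". The population figures are round numbers assumed purely for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

# Round numbers assumed purely for illustration.
N_AMERICANS = 330_000_000        # assumed size of the US population
N_CONGRESS = 535                 # assumed number of members of Congress
N_NON_AMERICANS = 7_000_000_000  # assumed; its value does not affect the result
ALPHA = Fraction(5, 100)

def rejection_probability(n_congress_in_population, population_size):
    """P(reject) when one person is drawn uniformly from the given population."""
    return Fraction(n_congress_in_population, population_size)

# Size: probability of rejecting when H0 (sampling from the US) is true.
size = rejection_probability(N_CONGRESS, N_AMERICANS)

# Power: probability of rejecting when H1 (sampling from outside the US) is true.
# No member of Congress is non-American, so the count is 0.
power = rejection_probability(0, N_NON_AMERICANS)

print(f"size  = {float(size):.2e} (at most alpha: {size <= ALPHA})")
print(f"power = {float(power)}")
```

Under these assumptions the size is far below 0.05, so the test is valid, but its power is exactly zero, which is below even its size.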






  • 2




    Intuitively, the assumption that "the rejection criterion must be (much) more likely under the alternate hypothesis than under the null" seems key. In fact, the more I think about it, the more it seems like an implicit assumption in this kind of statistical testing.
    – rinspy
    4 hours ago










  • Agreed. I think it's so implicit because it's usually not a problem, the null hypothesis is typically quite specific and the alternative quite broad (e.g. a specific value of a parameter vs all others or independence vs dependence), but it's still probably not mentioned enough.
    – Rohan
    3 hours ago







  • 1




    I've altered my answer slightly to take into account the other answer which I think raises a valid point. But the idea hasn't changed.
    – Rohan
    3 hours ago

















I'd say there are at least two additional problems with your "paradox" (I don't even think it is a valid testing problem):

First, "Member of Congress = 1" is an invalid test statistic for your "H0", as it does not measure deviation from H0. A person who is not American automatically has "Member of Congress = 0", which also applies to most Americans. Let me expand on that. What values can the test statistic take? It is 1 if the person is a member of Congress AND American, and 0 if the person is either American AND not a member of Congress, OR non-American. That means the test statistic can take TWO distinct values (0, 1) if the null is true, and both values carry information in favour of H0 (0 for Americans who are not members of Congress, 1 for American members of Congress). But one of these values (0) also carries information against H0. So what does one learn about H0 when the test statistic is 0 or not 0? Thus the test you describe appears invalid.

Second, the p-value is defined as the probability of observing a test statistic that speaks as strongly or more strongly against H0 than the value you observed in the sample. In other words, it is a tail probability of the distribution of the test statistic, computed under the assumption that the null is true. I have difficulty matching your p-value, which seems to simply be a conditional probability, to that definition. Not every probability conditional on "American" is automatically the correct p-value; for that, one would have to work out the correct null distribution of the test statistic you propose.
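In symbols, the definition above can be sketched as follows. The notation is an assumption added here, not the answerer's: $T$ is the test statistic, $t_{\text{obs}}$ its observed value, and larger values of $T$ are taken to speak more strongly against $H_0$.

$$p = P(T \ge t_{\text{obs}} \mid H_0)$$

With $T = 1$ for a member of Congress and $T = 0$ otherwise, the number in the question equals $P(T \ge 1 \mid H_0) = P(T = 1 \mid H_0)$ only if $T = 1$ is declared the outcome that speaks most strongly against $H_0$. That ordering is hard to justify here, since

$$P(T = 1 \mid H_1) = 0 < P(T = 1 \mid H_0),$$

so observing $T = 1$ is, if anything, evidence for $H_0$.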






    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    5
    down vote













    Frequentist statistics is meant to make inference on populations using samples, not on individuals. You first define a population (which you have not done), take a sample, and make inference on the population using the sample, taking into account the uncertainty.



    You have used your sampled individual as if it were your population, and try to make inference on him. But frequentist statistics do not apply here, you cannot repeat the sampling processs with a population size of 1.






    share|cite|improve this answer
















    • 1




      But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
      – rinspy
      4 hours ago






    • 1




      In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
      – rinspy
      4 hours ago






    • 1




      The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
      – Knarpie
      4 hours ago







    • 1




      The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
      – rinspy
      4 hours ago










    • @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
      – Rohan
      3 hours ago














    up vote
    5
    down vote













    Frequentist statistics is meant to make inference on populations using samples, not on individuals. You first define a population (which you have not done), take a sample, and make inference on the population using the sample, taking into account the uncertainty.



    You have used your sampled individual as if it were your population, and try to make inference on him. But frequentist statistics do not apply here, you cannot repeat the sampling processs with a population size of 1.






    share|cite|improve this answer
















    • 1




      But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
      – rinspy
      4 hours ago






    • 1




      In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
      – rinspy
      4 hours ago






    • 1




      The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
      – Knarpie
      4 hours ago







    • 1




      The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
      – rinspy
      4 hours ago










    • @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
      – Rohan
      3 hours ago












    up vote
    5
    down vote










    up vote
    5
    down vote









    Frequentist statistics is meant to make inference on populations using samples, not on individuals. You first define a population (which you have not done), take a sample, and make inference on the population using the sample, taking into account the uncertainty.



    You have used your sampled individual as if it were your population, and try to make inference on him. But frequentist statistics do not apply here, you cannot repeat the sampling processs with a population size of 1.






    share|cite|improve this answer












    Frequentist statistics is meant to make inference on populations using samples, not on individuals. You first define a population (which you have not done), take a sample, and make inference on the population using the sample, taking into account the uncertainty.



    You have used your sampled individual as if it were your population, and try to make inference on him. But frequentist statistics do not apply here, you cannot repeat the sampling processs with a population size of 1.







    share|cite|improve this answer












    share|cite|improve this answer



    share|cite|improve this answer










    answered 6 hours ago









    Knarpie

    1,196418




    1,196418







    • 1




      But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
      – rinspy
      4 hours ago






    • 1




      In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
      – rinspy
      4 hours ago






    • 1




      The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
      – Knarpie
      4 hours ago







    • 1




      The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
      – rinspy
      4 hours ago










    • @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
      – Rohan
      3 hours ago












    • 1




      But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
      – rinspy
      4 hours ago






    • 1




      In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
      – rinspy
      4 hours ago






    • 1




      The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
      – Knarpie
      4 hours ago







    • 1




      The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
      – rinspy
      4 hours ago










    • @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
      – Rohan
      3 hours ago







    1




    1




    But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
    – rinspy
    4 hours ago




    But is my example not fundamentally the same in this sense as Fisher's lady drinking tea? He had a particular lady, and he wanted to find out if she has "the ability" or not based on how many cups of tea she guessed. So the way I see it, his argument was "If I were to sample from the population of 'no ability' outcomes, a sample with all cups guessed correctly would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'no ability' population".
    – rinspy
    4 hours ago




    1




    1




    In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
    – rinspy
    4 hours ago




    In my case, the argument is "If I were to sample from the population of Americans, a sample that is a member of Congress would be very unlikely. Therefore, it is unlikely that the sample I am looking at came from the 'Americans' distribution".
    – rinspy
    4 hours ago




    1




    1




    The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
    – Knarpie
    4 hours ago





    The population in the case of the lady drinking tea is the cups of tea she could possibly drink (which is infinite), not the lady herself. If she does not have the ability, the probability that she guesses right is 0.5. The null hypothesis tested here revolves around this parameter, in your earlier example it revolved around one individual (one teacup by analogy).
    – Knarpie
    4 hours ago





    1




    1




    The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
    – rinspy
    4 hours ago




    The only difference I can see is that in Fisher's example, we implicitly assume that the rare sample is much more likely to have been sampled from some population that is not the null hypothesis population (it doesn't matter which specific population). In my case this assumption does not hold - there is no population that is not the null hypothesis population in which the sample "Member of Congress" is likely.
    – rinspy
    4 hours ago












    @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
    – Rohan
    3 hours ago




    @Knarpie I agree with this answer and it is something I overlooked in my answer (and was therefore sloppy about certain things in mine). However, maybe we could consider H0: The population we are sampling from is from the US and H1: The population we are sampling from is not from the US. With the same rejection criterion "Reject if the selected person is a member of congress" I think we still get the same paradox.
    – Rohan
    3 hours ago












    up vote
    5
    down vote













    To make this test apply to the population we could change the hypotheses slightly to



    H0: The sample is drawn from a population in the US



    H1: The sample is drawn from a population not in the US



    As far as I can tell there's nothing wrong with this hypothesis test. For a hypothesis test with significance level 0.05 (for example) is a test you need that if the null hypothesis is true then the probability that the test will reject it is less than 0.05.



    In this example if you think about repeatedly sampling, if the null hypothesis then we will choose people from the US. And out of those people, only a very small fraction (less than 0.05) are expected to be members of congress, so you would only reject the null hypothesis for less than 5% of them.



    So if the test is correct, why does it seem so paradoxical? While it technically satisfies the criterion for a hypothesis test, for any fixed significance level we typically want to choose the rejection criterion which maximizes the power of the test - that is it maximizes the probability of rejecting the null hypothesis if it is false. In your case the test is terrible at this, it will never reject the null hypothesis even if it is false.



    The paradox depends on the rejection criterion being impossible under the alternative hypothesis or more unlikely than under the alternate hypothesis that under the null. Any such test will have zero or very low power.
























    • 2




      Intuitively, the assumption that "the rejection criterion must be (much) more likely under the alternate hypothesis than under the null" seems key. In fact, the more I think about it, the more it seems like an implicit assumption in this kind of statistical testing.
      – rinspy
      4 hours ago










    • Agreed. I think it's so implicit because it's usually not a problem: the null hypothesis is typically quite specific and the alternative quite broad (e.g. a specific value of a parameter vs. all others, or independence vs. dependence). It's still probably not mentioned enough.
      – Rohan
      3 hours ago







    • 1




      I've altered my answer slightly to take into account the other answer which I think raises a valid point. But the idea hasn't changed.
      – Rohan
      3 hours ago






















    edited 3 hours ago

























    answered 5 hours ago









    Rohan

    813














    up vote
    2
    down vote













    I'd say there are at least two additional problems with your "paradox" (I don't even think it is a valid testing problem):



    First, "Member of Congress = 1" is an invalid test statistic for your "H0" because it does not measure deviation from the H0. A person who is not American automatically has "Member of Congress = 0", which also applies to most Americans. Let me expand on that. What values can the test statistic take? It is 1 if the person is a member of Congress AND American, and 0 if the person is either American AND not a member of Congress, OR non-American. That means the test statistic can take TWO distinct values (0 and 1) if the null is true! Both values carry information in favour of the H0 (0 for Americans who are not members of Congress, 1 for Americans who are). But one of these values (0) also carries information against the H0, since it is the only value a non-American can produce. So what does one learn about H0 when the test statistic is 0, or when it is not 0? Thus the test you describe appears invalid.



    Second, the p-value is defined as the probability of observing a test statistic that speaks as strongly or more strongly against the H0 than the value you observed in the sample. In other words, it is a tail probability of the distribution of the test statistic, computed under the assumption that the null is true. I have difficulty matching your p-value, which seems to simply be a conditional probability, to that definition. Not every conditional probability that conditions on "American" is automatically the correct p-value; for that, one would have to work out the correct null distribution of the test statistic you propose.
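For contrast, here is a minimal sketch of a p-value that does match this definition. The setting (a coin flipped 20 times, H0: the coin is fair, larger head counts speak more strongly against H0) and the function name `binomial_upper_tail_p_value` are assumptions chosen purely for illustration; they do not come from the question.

```python
from math import comb


def binomial_upper_tail_p_value(n, observed, p0):
    """P(T >= observed | H0), where T ~ Binomial(n, p0) under H0.

    Larger values of T are taken to speak more strongly against H0, so the
    p-value sums the null probabilities of every outcome at least as
    extreme as the one observed.
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(observed, n + 1)
    )


# Assumed example: 15 heads in 20 flips, null hypothesis of a fair coin.
p_value = binomial_upper_tail_p_value(n=20, observed=15, p0=0.5)
print(f"p-value = {p_value:.4f}")
```

The important part is the ordering of outcomes: the test statistic ranks results by how strongly they contradict H0, and the p-value is computed from its null distribution. A lone conditional probability such as P[member of Congress | American] has no such ordering behind it.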








































        edited 49 mins ago

























        answered 2 hours ago









        Momo

        7,18423654



