Do you need to calculate sample size to evaluate a new diagnostic test?



I am writing a grant application to evaluate a new diagnostic test. The test is intended to predict whether a patient with lung fibrosis will remain stable or progress. I am using an existing cohort of patients with lung fibrosis to evaluate it, and I plan to report the test's performance as the AUC together with sensitivity and specificity.



The cohort has 308 patients: 155 are known to have developed progressive disease and 153 are known to have remained stable.



Based on previous studies, I expect the test to achieve an AUC of at least 0.75, and I would like the study to have 80% power at α = 0.05.



I have been told that I may be asked about power calculations and sample sizes at interview.

Given that this would be a post-hoc calculation, does it even make sense to do a power calculation? Would "Is this cohort adequately powered to detect an AUC of 0.75?" be a valid question to ask?



I apologise for this very basic question, but I am having great difficulty getting a straight answer (perhaps because I am asking the wrong question?).







power-analysis auc sensitivity-specificity














edited 1 hour ago









mdewey











asked 2 hours ago









GhostRider












  • I have edited your choice of tags, since the diagnostic tag is about regression diagnostics, not diagnostic tests in healthcare.
    – mdewey
    1 hour ago


























2 Answers

It is realistic for them to ask you whether your sample size is adequate, and for what. Take sensitivity, for example. It is just a simple proportion, so you could choose a range of sensitivities you think plausible, say 0.70, 0.75, 0.80, 0.85 and 0.90, and then, for your sample of known cases (155), calculate what the confidence interval around each estimated proportion would be. This would show the reviewers how precise your estimate is going to be. You could do the same for specificity (which is, of course, just another, independent, proportion, based on the 153 stable patients). With rather more work, you could also do something similar for the AUC.
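As a rough sketch of that precision calculation, assuming a 95% Wilson score interval (my choice; no interval method is specified above) and reusing the same plausible values for specificity as for sensitivity (also an assumption), the expected confidence intervals could be tabulated like this. The constant names are illustrative only.

```python
import math
from statistics import NormalDist


def wilson_interval(p_hat: float, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion p_hat estimated from n subjects."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for a 95% interval
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


N_PROGRESSIVE = 155  # known progressors: denominator for sensitivity
N_STABLE = 153       # known stable patients: denominator for specificity
PLAUSIBLE = (0.70, 0.75, 0.80, 0.85, 0.90)  # planning values, not results

if __name__ == "__main__":
    for label, n in (("sensitivity", N_PROGRESSIVE), ("specificity", N_STABLE)):
        for p in PLAUSIBLE:
            lo, hi = wilson_interval(p, n)
            print(f"{label} {p:.2f} (n={n}): 95% CI {lo:.3f} to {hi:.3f}, "
                  f"width {hi - lo:.3f}")
```

Under these assumptions a sensitivity of 0.80 estimated from 155 cases should give an interval of roughly 0.73 to 0.86. Planning values such as 0.70 of 155 are not whole counts, so treat the output as approximate planning figures rather than exact intervals.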



The main advantage of doing this is that it shows you are serious about what you are doing. You cannot change the size of the cohort or alter disease prevalence, so all the reviewers can do is decide whether they want to buy that level of precision (that width of confidence interval) or not.

















        answered 1 hour ago









        mdewey


It's a post-hoc power calculation if you have already evaluated your diagnostic test on this cohort of 308 patients. In that case there is no reason to perform a power calculation, even though some may nevertheless demand such useless exercises; see this page for some discussion. If the evaluation is already done, you have whatever results you have about the value of the diagnostic test.



If you have not yet evaluated your diagnostic test on this cohort of 308 patients, and doing so is the aim of your grant application, then I think the reviewers have an obligation to ask whether your sample size is adequate to document the value of your test, as @mdewey notes in another answer. There is at least one page on this site describing power calculations for AUC/ROC, and at least one R package to help with the calculations.



If you are limited to these 308 patients, then report what AUC you could detect as significant with that sample size (for example, with 80% power at α = 0.05 in a two-sided test), based on your experience with, or expectations about, the reliability of the test, and discuss whether that AUC could form the basis of a clinically useful test. The sensitivity and specificity, of course, depend on your particular choice of cutoff for whatever continuous measure varies along your ROC curve; you should be prepared to discuss the clinical significance of the trade-off between false-positive and false-negative classifications as you vary that cutoff.
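One way to turn "what AUC could you detect" into an actual number is sketched below, under assumptions that are mine rather than part of this answer: the 155 progressive patients are treated as positives and the 153 stable patients as negatives, the null hypothesis is AUC = 0.5 (no discrimination), the test is a two-sided z-test, and the standard error of the AUC uses the Hanley and McNeil (1982) approximation with its exponential-model expressions for Q1 and Q2. The function names are hypothetical helpers, not from any package.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC estimate (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)          # P(two random positives both score above one negative)
    q2 = 2 * auc**2 / (1 + auc)   # P(one random positive scores above two negatives)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)


def power_vs_chance(auc1: float, n_pos: int, n_neg: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test of H0: AUC = 0.5 when the true AUC is auc1.

    The negligible rejection probability in the opposite tail is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se0 = hanley_mcneil_se(0.5, n_pos, n_neg)   # SE under the null
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)  # SE under the alternative
    return NormalDist().cdf((abs(auc1 - 0.5) - z_alpha * se0) / se1)


def min_detectable_auc(n_pos: int, n_neg: int, alpha: float = 0.05,
                       power: float = 0.80, tol: float = 1e-6) -> float:
    """Smallest AUC above 0.5 that reaches the target power, found by bisection."""
    lo, hi = 0.5, 0.999
    if power_vs_chance(hi, n_pos, n_neg, alpha) < power:
        raise ValueError("target power is not reachable with this sample size")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_vs_chance(mid, n_pos, n_neg, alpha) >= power:
            hi = mid  # enough power: try a smaller AUC
        else:
            lo = mid  # not enough power: need a larger AUC
    return hi


if __name__ == "__main__":
    n_pos, n_neg = 155, 153  # progressive vs stable patients in the cohort
    print(f"Minimum detectable AUC (80% power, two-sided alpha 0.05): "
          f"{min_detectable_auc(n_pos, n_neg):.3f}")
    print(f"Power to detect AUC = 0.75: {power_vs_chance(0.75, n_pos, n_neg):.4f}")
    se = hanley_mcneil_se(0.75, n_pos, n_neg)
    z = NormalDist().inv_cdf(0.975)
    print(f"Approximate 95% CI if the observed AUC is 0.75: "
          f"{0.75 - z * se:.3f} to {0.75 + z * se:.3f}")
```

Under these assumptions the minimum detectable AUC for this cohort should come out at roughly 0.59, and the power to detect an AUC of 0.75 against 0.5 is essentially 1. The more informative figure for reviewers may therefore be the expected confidence interval around an AUC of 0.75 (roughly 0.70 to 0.80 here). The Hanley and McNeil formula assumes a particular shape for the score distributions, so these are planning approximations only.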



For the final evaluation of the model, you should use a proper scoring rule rather than sensitivity or accuracy, as discussed on this page. The AUC is certainly better than sensitivity or accuracy, but it too has limitations. For demonstrating expected statistical power, however, the AUC is easy to explain and is a reasonable choice.
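To illustrate one such limitation, here is a small sketch using the Brier score and the logarithmic score, two standard proper scoring rules (my choice of examples; no specific rule is named above). The probabilities are made-up toy values, not study data: predictions B are a monotone squashing of predictions A toward 0.5, so both rank patients identically and have the same AUC, yet the scoring rules still tell them apart.

```python
import math


def auc(scores, outcomes):
    """Concordance (AUC): share of positive/negative pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(probs, outcomes):
    """Mean squared error of predicted probabilities against 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_score(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes (lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


# Hypothetical toy data: 1 = progressed, 0 = remained stable.
outcomes = [1, 1, 1, 0, 0, 0]
probs_a = [0.8, 0.7, 0.4, 0.5, 0.2, 0.1]
# Same ranking as probs_a, squashed toward 0.5 (a monotone transform).
probs_b = [0.5 + 0.2 * (p - 0.5) for p in probs_a]

if __name__ == "__main__":
    for name, probs in (("A", probs_a), ("B", probs_b)):
        print(f"{name}: AUC={auc(probs, outcomes):.3f}  "
              f"Brier={brier_score(probs, outcomes):.3f}  "
              f"log score={log_score(probs, outcomes):.3f}")
```

Because the AUC depends only on the ranking of predictions, it cannot distinguish A from B; the Brier and log scores also reward probabilities that are closer to the observed outcomes.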

















            answered 24 mins ago









            EdM












  • Many thanks for this. Could you clarify one sentence: "If you are limited to these 308 patients, then report what AUC you could detect... based on your experience with or expectations about the reliability of the test"? Is this an actual calculation I can make? It's a new test, so we have no experience with it, but we would hope to achieve an AUC of at least 0.75.
    – GhostRider
    17 mins ago















