Do you need to calculate sample size to evaluate a new diagnostic test?
I am writing a grant application for a study that will evaluate a new diagnostic test. The test will predict whether a patient with lung fibrosis will remain stable or progress. I will use an existing cohort of patients with lung fibrosis to evaluate the new test, and I plan to report the results as AUC, sensitivity and specificity.
The cohort has 308 patients.
155 are known to have developed progressive disease
153 are known to have remained stable
Based on previous studies, I would expect the diagnostic test to achieve an AUC of at least 0.75 (80% power, α = 0.05).
I have been told that I may be asked at interview about power calculations and sample sizes. I have a few questions:
Although this would be a post-hoc sample size calculation, does it even make sense to do a power calculation? Would the question "is the cohort adequately powered to detect an AUC of 0.75?" be a valid question to ask?
I apologise for this very basic question, but I am having great difficulty getting a straight answer (perhaps because my question is incorrect?)
power-analysis auc sensitivity-specificity
I have edited your choice of tags since diagnostic is about regression diagnostics, not diagnostic tests in healthcare. – mdewey
2 Answers
It is realistic for them to ask you about whether your sample size is adequate and for what. Suppose we take the sensitivity. This is just a simple proportion so you could determine a range of sensitivities which you think are plausible, say 0.7, 0.75, 0.8, 0.85, 0.90, and then for your sample of known cases (155) calculate what the confidence interval would be about the estimated proportion. This would let them know how precise your estimate is going to be. You could do the same for specificity (of course that is just another, independent, proportion). You could also with rather more work do something similar for AUC.
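For example, here is a minimal sketch of that calculation in Python (not from the original answer): the counts of 155 progressors and 153 stable patients come from the question, the grid of plausible sensitivities is the one suggested above, and the Wilson score interval is one common choice of confidence interval for a proportion.

```python
# Precision argument for a fixed cohort: with 155 known progressors,
# how wide would the 95% CI around each plausible sensitivity be?
from math import sqrt
from scipy.stats import norm

def wilson_ci(p_hat, n, alpha=0.05):
    """Wilson score interval for a binomial proportion."""
    z = norm.ppf(1 - alpha / 2)
    centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

n_cases = 155     # patients known to have progressed (from the question)
n_controls = 153  # patients known to have remained stable

for sens in (0.70, 0.75, 0.80, 0.85, 0.90):  # plausible sensitivities
    lo, hi = wilson_ci(sens, n_cases)
    print(f"sensitivity {sens:.2f}: 95% CI ({lo:.3f}, {hi:.3f})")

# The same function with n_controls gives the precision of the
# specificity estimate, since specificity is an independent proportion.
```

With 155 cases, an estimated sensitivity of about 0.80 would come with a 95% confidence interval of roughly ±0.06, which is the kind of precision statement reviewers can judge directly.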
The main advantage of doing this is that it shows you are serious about what you are doing. You cannot change the size of the cohort or alter the disease prevalence, so all the reviewers can decide is whether that level of precision is acceptable.
– mdewey
It's a post-hoc power calculation if you have already evaluated your diagnostic test on this cohort of 308 patients. In that case, there is no reason to perform a power calculation, even though some may nevertheless demand such useless exercises; see this page for some discussion. If the evaluation is already done, you got whatever results you got about the value of the diagnostic test.
If evaluating your diagnostic test on this cohort of 308 patients has not yet been done and is the aim of your grant application, then I think that the reviewers have an obligation to ask whether your sample size is adequate to document the value of your test, as @mdewey notes in another answer. There is at least one page on this site describing power calculations for AUC/ROC, and at least one R package to help with the calculations.
If you are limited to these 308 patients, then report what AUC you could detect as significant with that sample size (for example, with 80% power at α = 0.05, 2-sided test) based on your experience with or expectations about the reliability of the test, and discuss whether that AUC could form the basis of a clinically useful test. The sensitivity and specificity of course depend on your particular choice of cutoff for whatever continuous measure varies along your ROC curve; you should be prepared to discuss the clinical significance of the underlying tradeoffs of positive and negative misclassifications as you vary that cutoff.
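One concrete way to do that calculation, sketched here in Python rather than with the (unnamed) R package mentioned above, is to use the Hanley & McNeil (1982) approximation for the variance of the AUC. The group sizes of 155 and 153 and the expected AUC of 0.75 are taken from the question; the null value of 0.5 (an uninformative test) is a standard assumption, and this is an illustrative approximation rather than the only valid approach.

```python
# Approximate power for testing H0: AUC = 0.5 against an expected AUC,
# using the Hanley & McNeil (1982) variance approximation.
from math import sqrt
from scipy.stats import norm

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC under the Hanley-McNeil approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return sqrt(var)

n_pos, n_neg = 155, 153                  # progressors / stable patients (from the question)
auc_alt, auc_null, alpha = 0.75, 0.50, 0.05

se_null = hanley_mcneil_se(auc_null, n_pos, n_neg)  # SE if the test were uninformative
se_alt = hanley_mcneil_se(auc_alt, n_pos, n_neg)    # SE at the expected AUC
z_crit = norm.ppf(1 - alpha / 2)

power = norm.cdf((auc_alt - auc_null - z_crit * se_null) / se_alt)
ci_half_width = z_crit * se_alt

print(f"power to detect AUC {auc_alt} vs {auc_null}: {power:.3f}")
print(f"expected 95% CI around AUC {auc_alt}: +/- {ci_half_width:.3f}")
```

Under this approximation the 308-patient cohort has power well above 80% to distinguish an AUC of 0.75 from the chance value of 0.5, so the more informative statement for reviewers may be the expected precision: a 95% confidence interval of roughly ±0.05 around an observed AUC of 0.75.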
For ultimately evaluating the model you should use a proper scoring rule rather than sensitivity or accuracy, as discussed on this page. AUC is certainly better than sensitivity or accuracy, but even it has limitations. For purposes of demonstrating expected statistical power, however, the AUC is easy to explain and is a reasonable choice.
– EdM
Many thanks for this. May I ask if you could clarify one sentence: "If you are limited to these 308 patients, then report what AUC you could detect ... based on your experience with or expectations about the reliability of the test". Is this an actual calculation I can make? It's a new test, so we have no experience, but we would hope to achieve at least an AUC of 0.75. – GhostRider