Metrics to determine K in K-fold cross-validation

Consider a scenario where the dataset at hand is quite large, say 50,000 samples (fairly well balanced between two classes). What metrics can be used to decide the value of K in K-fold cross-validation? In other words, is 5-fold CV enough, or should I go for 10-fold CV?



The rule of thumb is: the higher K, the better. But, putting computational costs aside, what can be used to decide the value of K? Should we look at the overall performance, e.g. average accuracy? That is, if accuracy(5-fold CV) ≈ accuracy(10-fold CV), can we opt for 5-fold CV? Is the standard deviation of the performance across folds important, i.e. the lower the better?
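
For concreteness, the comparison I have in mind would look something like this (a minimal sketch assuming scikit-learn; the synthetic dataset and classifier are just placeholders for my actual data and model):

    # Minimal sketch: compare mean accuracy and fold-to-fold spread for 5- vs 10-fold CV.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Roughly the scenario above: 50,000 samples, two balanced classes.
    X, y = make_classification(n_samples=50_000, n_features=20, weights=[0.5, 0.5], random_state=0)
    clf = LogisticRegression(max_iter=1000)

    for k in (5, 10):
        scores = cross_val_score(clf, X, y, cv=k, scoring="accuracy")
        print(f"{k:>2}-fold CV: mean accuracy = {scores.mean():.4f}, std across folds = {scores.std():.4f}")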










      cross-validation accuracy performance






asked 1 hour ago by NCL (312)




















3 Answers






First of all, choosing K is basically a heuristic; it depends on the data and the model. Most of the time, 5 is a good choice in my opinion: it doesn't need too much computation power or time, but you need to try and see which value works better for your data. There is no free lunch!

I would suggest another CV idea for you. For example, if you use 5-fold CV (without stratifying and shuffling), you basically divide your data into 5 equal folds. "Equal" here means every fold has the same shape, but each fold can still have a different distribution. So you can choose your folds manually: plot the distribution of the target variable and try to make the folds follow the same pattern.

You can also select among models trained with different K based on a criterion, for example AIC.
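
A minimal sketch of the fold-balancing idea, assuming scikit-learn (StratifiedKFold matches the target distribution across folds automatically; the synthetic data is a placeholder):

    # Build folds whose class distribution matches the overall one, then verify per fold.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=50_000, weights=[0.5, 0.5], random_state=0)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for i, (train_idx, test_idx) in enumerate(skf.split(X, y)):
        # Each held-out fold should show roughly the same positive-class rate (~0.5 here).
        print(f"fold {i}: positive rate = {y[test_idx].mean():.3f}, size = {len(test_idx)}")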







answered 53 mins ago by silverstone (765)




















You should ask yourself why we do cross-validation in the first place. It is not to get better accuracy; it is to get a better estimate of the accuracy (or another metric) on unseen data. You want to know how well the model generalizes.

If you try to grid-search for the "best K", you will either waste some data or get a worse estimate of the metric.

Wasting data: you split your data into two sets, grid-search on one of them, and then do cross-validation (with the "best K") on the second set. Don't do this.

Getting a worse estimate: you grid-search for the "best K" and choose the one that gives the best result according to your chosen metric. But now you have brought in information you should not have, and your estimate is too optimistic. That is the exact opposite of what you wanted when you started with cross-validation. Don't do this either.

So what should you do? Pick the largest K that makes sense for the problem you are trying to solve. Don't put the computational cost aside; the computational cost should determine K.
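
As a rough back-of-the-envelope illustration of that last point (my own sketch, assuming training time grows roughly linearly with the number of training samples):

    # With K folds you fit K models, each on n*(K-1)/K samples, so for a roughly linear-time
    # learner the total cost grows roughly like (K-1) full-data fits.
    n = 50_000
    for k in (5, 10, 20, n):  # k = n is leave-one-out
        train_size = n * (k - 1) // k
        relative_cost = k * train_size / n  # in units of "one fit on the full dataset"
        print(f"K={k:>5}: {k} fits on {train_size} samples each "
              f"(~{relative_cost:.0f}x the cost of a single full-data fit)")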







answered 39 mins ago by ExabytE (111)




"The rule of thumb is the higher K, the better."

I think a better rule of thumb is: the larger your dataset, the less important $k$ is.

However, it is useful to have a general understanding of the impact of $k$ on the performance estimator (leaving computational costs aside):

• Increasing $k$ decreases the bias, because the training sets better represent the full data.

• Increasing $k$ increases the variance of the estimator, because the training sets become more similar to each other.

Also note that there is no unbiased estimator of the variance of $k$-fold CV. Together this means there is no metric that can tell you the best $k$ if you leave computational costs aside. Some empirical studies suggest that 10 is a reasonable default.

And to be clear, $k$ is not a hyper-parameter you want to tune to find the best accuracy. If you find yourself performing $k_2$-fold CV to find the best $k_1$, something should feel wrong.
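
A small sketch of the variability point, assuming scikit-learn: repeated $k$-fold CV on synthetic placeholder data shows how much the estimate itself moves between repetitions. This does not estimate the true variance (which, as noted, has no unbiased estimator), but it gives a feel for how stable the estimate is for a given $k$.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
    clf = LogisticRegression(max_iter=1000)

    n_repeats = 10
    for k in (5, 10):
        cv = RepeatedStratifiedKFold(n_splits=k, n_repeats=n_repeats, random_state=0)
        scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
        rep_means = scores.reshape(n_repeats, k).mean(axis=1)  # one CV estimate per repetition
        print(f"k={k:>2}: mean accuracy = {scores.mean():.4f}, "
              f"std of the {n_repeats} repeated estimates = {rep_means.std():.4f}")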







answered 31 mins ago by oW_ (2,707)



























                                         
