Unbalanced training data for different classes

What precautions do I need to take when developing a CNN for image classification if there is much more training data for one label than for the others? For example:



label1 : 1000 images
label2 : 100 images
label3 : 100 images
label4 : 100 images


The numbers will become larger later, but the proportions are likely to stay the same.



Thanks for your insight.

Tags: neural-network convnet image-classification image-recognition

Asked by rnso

2 Answers

Answer by Danny (2 votes)













You can duplicate the images and add them, and you can use data augmentation techniques for the labels that have fewer images. The code below is for Keras.



from keras.preprocessing.image import ImageDataGenerator

# Randomly transform images of the under-represented labels
datagen = ImageDataGenerator(
    rotation_range=40,        # random rotations of up to 40 degrees
    width_shift_range=0.2,    # horizontal shifts of up to 20% of the width
    height_shift_range=0.2,   # vertical shifts of up to 20% of the height
    shear_range=0.2,          # shearing transformations
    zoom_range=0.2,           # random zooms
    horizontal_flip=True,     # randomly flip images horizontally
    fill_mode='nearest')      # fill pixels exposed by the transforms
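
As a rough sketch of how this generator might be applied to one of the smaller labels (the folder names and the number of copies below are placeholders, not part of the original answer), you can stream each image through datagen.flow and write the augmented variants to disk:

import os
from keras.preprocessing.image import img_to_array, load_img

src_dir = 'data/label2'        # hypothetical folder holding a minority class
out_dir = 'data/label2_aug'    # augmented copies are written here
os.makedirs(out_dir, exist_ok=True)

for fname in os.listdir(src_dir):
    img = load_img(os.path.join(src_dir, fname))   # load as a PIL image
    x = img_to_array(img)                          # array of shape (height, width, 3)
    x = x.reshape((1,) + x.shape)                  # turn it into a batch of one
    flow = datagen.flow(x, batch_size=1,
                        save_to_dir=out_dir,
                        save_prefix='aug',
                        save_format='jpeg')
    for i, _ in enumerate(flow):                   # the iterator loops forever
        if i >= 9:                                 # keep roughly 10 variants per image
            break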


I hope this helps. You should not be worried about one label having more data; rather, think about how to increase the data for the other labels.

• Is vertical_flip also useful? Can it be done with this function? Also, what do you think of the augmentation technique given in this post: datascience.stackexchange.com/questions/38795/… ? – rnso

• I think it will work if you change the (8 x 8) according to your image size, but it's always better to define a new function to suit your needs. – Danny

• You should first train your model on the unbalanced training set and check your results. These may serve as a baseline for further optimization. You can also try different settings, like making sure that your batches contain at least one example of each class. What I mean is: first check whether the unbalanced classes are in fact a problem before trying to solve it. – id-2205
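
As a hedged illustration of that baseline check (model, x_val and y_val are hypothetical names for an already-trained classifier and a held-out validation set, not something given in this thread), per-class metrics make it easy to see whether the majority class is dominating the predictions:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# model, x_val, y_val are assumed to exist: a trained Keras classifier and an
# integer-labelled validation set drawn from the same four classes.
y_pred = np.argmax(model.predict(x_val), axis=1)

print(confusion_matrix(y_val, y_pred))      # shows which classes get confused
print(classification_report(
    y_val, y_pred,
    target_names=['label1', 'label2', 'label3', 'label4']))  # per-class precision/recall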


















Answer by thanatoz (2 votes)













In this dataset, the large majority of the training data (1000 of 1300 images, roughly 77%) belongs to a single class, and that will strongly affect your results. Such an imbalance produces what are known as skewed classes. Skewed classes bias your predictions, and the learned model can end up simply predicting the majority class.



          In order to overcome this problem, you can do the following:




• Sampling: over-sample the minority classes or under-sample the majority class so that every class is equally represented.


• Discarding excess data: if the other classes have enough data, simply discard some data from the dominating class.


• Weighting: many training algorithms accept per-class weights that put more emphasis on the under-represented classes, which can help with skewed classes (see the sketch below).

This answer is based on this article; refer to it for a detailed explanation.
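
For the weighting option above, here is a minimal sketch (not part of the original answer) of how per-class weights could be derived from the counts in the question and passed to Keras; model, x_train, y_train, x_val and y_val are assumed to already exist:

# Hypothetical example: weight each class inversely to its frequency so that
# errors on the 100-image classes count about 10x more than errors on label1.
counts = {0: 1000, 1: 100, 2: 100, 3: 100}        # images per class index
total = sum(counts.values())
n_classes = len(counts)

class_weight = {c: total / (n_classes * n) for c, n in counts.items()}
# -> {0: 0.325, 1: 3.25, 2: 3.25, 3: 3.25}

model.fit(x_train, y_train,
          epochs=20,
          batch_size=32,
          class_weight=class_weight,   # Keras scales each sample's loss by its class weight
          validation_data=(x_val, y_val))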






• This answer seems promising, but have a look at this post in order to improve your answer. Look especially at the "Provide context for links" section. You can summarize the main points of the articles you shared in case the links are removed. – BrunoGL

• Thanks Bruno, I'll keep that in mind. – thanatoz









