Unbalanced training data for different classes
What precautions should I take when developing a CNN for image classification if there is much more training data for one label than for the others? For example:
label1 : 1000 images
label2 : 100 images
label3 : 100 images
label4 : 100 images
The numbers will grow later, but the proportions are likely to stay the same.
Thanks for your insight.
neural-network convnet image-classification image-recognition
asked by rnso
2 Answers
You can duplicate the images in the smaller classes and add them, and you can use data augmentation techniques for the labels that have fewer images. The code below is for Keras.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,        # random rotations of up to 40 degrees
    width_shift_range=0.2,    # horizontal shifts of up to 20% of the width
    height_shift_range=0.2,   # vertical shifts of up to 20% of the height
    shear_range=0.2,          # random shearing
    zoom_range=0.2,           # random zooming by up to 20%
    horizontal_flip=True,     # random horizontal flips
    fill_mode='nearest')      # fill newly exposed pixels with the nearest values
I hope this helps. You should not worry about one label having more data; instead, think about how to increase the data for the other labels.
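As a usage illustration, here is a minimal sketch of how such a generator might be used to produce extra images for a minority class. The names x_minority and target_count are placeholders (not from the answer above), assuming the images of one minority class have already been loaded as a NumPy array of shape (n, height, width, channels):

import numpy as np

# Hypothetical inputs: x_minority holds the images of one minority class,
# target_count is the size to reach (e.g. 1000, to match label1).
augmented = [x_minority]                      # keep the originals as well
n_images = len(x_minority)
for batch in datagen.flow(x_minority, batch_size=32, shuffle=True):
    augmented.append(batch)
    n_images += len(batch)
    if n_images >= target_count:              # stop once the class is large enough
        break
x_balanced = np.concatenate(augmented)[:target_count]

In newer versions of Keras the generator can also be passed directly to model.fit, which augments on the fly instead of materializing the extra images.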
Is vertical_flip also useful? Can it be done with this function? Also, what do you think of the augmentation technique given in this post: datascience.stackexchange.com/questions/38795/… ?
– rnso
1 hour ago
I think it will work if you change the (8 x 8) according to your image size, but it's always better to define a new function to suit your needs.
– Danny
1 hour ago
You should first train your model on the unbalanced training set and check your results; these may serve as a baseline for further optimization. You can also try different settings, like making sure that your batches contain at least one example of each class. What I mean is: first check whether the unbalanced classes are in fact a problem before trying to solve it.
– id-2205
1 hour ago
In the dataset you describe, roughly three quarters (about 77%) of the training data belongs to a single class, and this will greatly affect your results. Such imbalance tends to produce what are known as skewed classes. The presence of skewed classes influences your predictions, and the learned model can degenerate into one that simply predicts the majority class.
In order to overcome this problem, you can do the following:
Sampling: Up-sample or down-sample your dataset to ensure equal representation of the classes.
Discarding excess data: If the data in the other classes is sufficient, simply discard some data from the dominating class.
Weighting: Some training algorithms accept class weights that put more emphasis on the under-represented classes, which helps with skewed classes (a sketch follows below).
This answer is based on this article; refer to it for a detailed explanation.
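To illustrate the weighting option, here is a minimal sketch using Keras class weights. The model, x_train, and y_train are assumed to already exist, and the class counts are taken from the question:

# Inverse-frequency class weights: rarer classes contribute more to the loss.
counts = {0: 1000, 1: 100, 2: 100, 3: 100}    # images per label, as in the question
total = sum(counts.values())
class_weight = {c: total / (len(counts) * n) for c, n in counts.items()}
# -> {0: 0.325, 1: 3.25, 2: 3.25, 3: 3.25}

# Keras scales each sample's contribution to the loss by its class weight.
model.fit(x_train, y_train, epochs=10, class_weight=class_weight)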
This answer seems promising, but have a look at this post in order to improve your answer. Look especially at the "Provide context for links" section. You can summarize the main points of the articles you shared in case the links are removed.
– BrunoGL
42 mins ago
Thanks Bruno, I'll keep that in mind.
– thanatoz
36 mins ago