Can we remove features that have zero correlation with the target/label?
So I drew a pairplot/heatmap of the feature correlations of a dataset and noticed a set of features that bears zero correlation with:
- every other feature, and
- the target/label.

Reference code snippet in Python:

import seaborn as sns

corr = df.corr()   # df: pandas DataFrame holding the features and the target
sns.heatmap(corr)  # visually inspect how each feature correlates with the others (incl. the target)
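For concreteness, a minimal sketch of how such near-zero-correlation features could be listed programmatically (the column name "target" and the 0.05 threshold are illustrative assumptions, not part of the original snippet):

import pandas as pd

def near_zero_corr_features(df: pd.DataFrame, target: str = "target", tol: float = 0.05):
    """Return feature names whose absolute Pearson correlation with the target is below tol."""
    corr_with_target = df.corr()[target].drop(target)  # correlation of each feature with the target
    return corr_with_target[corr_with_target.abs() < tol].index.tolist()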
- Can I drop these features to improve the accuracy of my classification problem?
- And can I still drop them if it is explicitly given that they are derived features?
Tags: classification, scikit-learn, pandas, seaborn
asked 1 hour ago by karthiks (edited 10 mins ago)
3 Answers
Accepted answer (score 2), answered 16 mins ago by Neil Slater:
Can I drop these features to improve the accuracy of my classification problem?
If you are using a simple linear classifier, such as logistic regression, then yes. That is because your plots give you a direct visualisation of how the model could make use of the data.
As soon as you start to use a non-linear classifier that can combine features inside the learning model, it is not so straightforward. Your plots cannot exclude a complex relationship that such a model might be able to exploit. Generally the only way to proceed is to train and test the model (using some form of cross-validation) with and without the feature, as sketched below.
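A rough sketch of that procedure (assuming X is a pandas DataFrame of features, y the labels, and col the candidate feature; the random-forest estimator is just an illustrative non-linear choice):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_with_without(X, y, col):
    """Mean cross-validated accuracy with and without one feature column."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    with_feature = cross_val_score(model, X, y, cv=5).mean()
    without_feature = cross_val_score(model, X.drop(columns=[col]), y, cv=5).mean()
    return with_feature, without_feature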
A plot might visually show a strong non-linear relationship despite zero linear correlation. For example, a complete bell curve of feature versus target would have close to zero linear correlation, yet it suggests that something interesting is going on that would be useful in a predictive model. If you see plots like this, you can either try to turn them into linear relationships with some feature engineering, or treat them as evidence that you should use a non-linear model.
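A quick synthetic illustration of that point (the data here are made up for the demo): a symmetric quadratic relationship is fully deterministic, yet its Pearson correlation is near zero:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x ** 2  # y is perfectly predictable from x, but the relationship is symmetric about 0

print(np.corrcoef(x, y)[0, 1])  # close to 0: linear correlation misses the structure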
That clears the air. Thanks. I've also added a follow-up question. Do you mind answering it as well? Thanks in advance.
– karthiks, 8 mins ago
Answer (score 3), answered 21 mins ago by DmytroSytro:
These uncorrelated features might still be important for the target in combination with other features, so it might not be a good idea to remove them, especially if your model is a complex one.
It might, however, be a good idea to remove one of a pair of non-target features that are highly correlated with each other, because they are likely redundant.
Still, it might be better to use dimensionality-reduction techniques like PCA: PCA maximizes variance without removing whole features, instead folding them into the principal components.
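A minimal sketch of that idea (assuming X is the feature matrix with the target column excluded; standardizing first is the usual practice, and n_components is illustrative):

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def reduce_features(X, n_components=5):
    """Standardize, then project onto the directions of maximal variance."""
    # Each principal component blends every original feature rather than dropping columns outright.
    return make_pipeline(StandardScaler(), PCA(n_components=n_components)).fit_transform(X)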
For ordinal or binary features, correlation won't tell you much. So I guess the best way to test whether a feature is important when it is not correlated with the target is to directly compare the performance of a model with and without the feature. Still, different features might have different importance for different algorithms.
Answer (score 1), answered 22 mins ago by Atani, a new contributor (edited 3 mins ago):
If I understand you correctly, you are asking whether you can remove features that have zero correlation either:
- with other features, or
- with the label you want to predict.

Those are two different cases:
1. We usually recommend removing features that are correlated with each other (it stabilizes the model). If they are zero-correlated, you cannot conclude anything here; only by training your model will you see whether the feature is worthwhile. Don't drop those.
2. If a feature is strongly correlated with your label, a linear function (or model) should be able to predict the label well. Even if it is not correlated, that doesn't mean a non-linear model couldn't perform well using this feature. Don't drop this one either!
I hope I answered your question.
Modified the question for clarity. I meant a set of features bearing zero correlation with all other features, including the target/label. Hope that clarifies.
– karthiks, 13 mins ago
Thank you for your clarification. I've edited my answer accordingly.
– Atani, 2 mins ago