Improving spam classification with TensorFlow logistic regression
I would like to classify mail (spam = 1, ham = 0) using logistic regression. My implementation is similar to this implementation and uses TensorFlow.
A mail is represented as a bag-of-words vector, where each entry counts how often a term appears in the mail. The idea is to multiply that vector with a weight vector, apply the sigmoid function, and threshold the output to turn regression into classification: $$\hat{y} = \sigma(x_i^T \theta),$$ with $\sigma(x) = \frac{1}{1 + e^{-x}}$. To calculate the loss, I am using the L2 loss (squared loss). Since I have a lot of training data, regularization does not seem necessary (training and testing accuracy are always very close). Still, I only get a maximum accuracy of about 90% (both training and testing). How can I improve this?
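For reference, the core of my current setup looks roughly like the sketch below (TensorFlow 1.x style; the vocabulary size and learning rate are just illustrative values, and the session/training loop is omitted):

```python
import tensorflow as tf

n_features = 10000  # illustrative vocabulary size of the bag-of-words representation

x = tf.placeholder(tf.float32, [None, n_features])  # term counts per mail
y = tf.placeholder(tf.float32, [None, 1])           # 1 = spam, 0 = ham

theta = tf.Variable(tf.zeros([n_features, 1]))

# y_predicted = sigma(x^T theta)
y_predicted = tf.sigmoid(tf.matmul(x, theta))

# L2 (squared) loss, which is what I currently minimize
loss = tf.reduce_mean(tf.square(y_predicted - y))

train_step = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)
```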
I already tried the following:
- Use regularization (L1 and L2) with different strengths (it did not seem necessary)
- Use different learning rates
- Use full-batch gradient descent, stochastic gradient descent, and mini-batch gradient descent (the hope is to avoid local minima in the loss function by introducing more variance with stochastic/mini-batch updates)
- Create more training data using SMOTE (the classes were imbalanced, roughly 80/20 spam/ham)
Things that I could still try:
- Use a different loss function
Any other suggestions?
logistic classification accuracy loss-functions tensorflow
asked Aug 18 at 13:14 by User12547645 · edited Aug 18 at 13:50 by Sycorax
1 Answer
L2 loss for logistic regression is not convex, but the cross entropy loss is. I'd recommend making the switch because convexity is a really nice property to have during optimization. Convexity implies that you don't have to worry about spurious local minima, because any local minimum of a convex function is also a global minimum.
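Concretely, with $\hat{y}_i = \sigma(x_i^T\theta)$, the cross entropy (log) loss is $$-\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right],$$ which is convex in $\theta$, while the squared loss $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$ is not.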
A nice discussion of the mathematics comparing the convexity of log loss to the non-convexity of L2 loss can be found here: What is happening here, when I use squared loss in logistic regression setting?
The textbook way to estimate logistic regression coefficients is called Newton-Raphson updating, but I don't believe that it is implemented in TensorFlow since second-order methods are not generally used for neural networks. However, you might improve the rate of convergence if you use SGD + classical momentum or SGD + Nesterov momentum. Nesterov momentum is especially appealing in this case: since your problem is convex, the problem is more-or-less locally quadratic, and that is the use case where Nesterov momentum really shines.
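In TensorFlow the switch is essentially a two-line change, sketched below using the same bag-of-words setup as in the question (the vocabulary size, learning rate, and momentum are placeholder values, not a tuned configuration):

```python
import tensorflow as tf

n_features = 10000  # placeholder vocabulary size

x = tf.placeholder(tf.float32, [None, n_features])
y = tf.placeholder(tf.float32, [None, 1])
theta = tf.Variable(tf.zeros([n_features, 1]))

# Raw linear score x^T theta; sigmoid_cross_entropy_with_logits applies the
# sigmoid internally, so pass the logits rather than sigmoid(logits).
logits = tf.matmul(x, theta)

# Cross entropy (log) loss instead of the squared loss
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))

# SGD with Nesterov momentum instead of plain gradient descent
train_step = tf.train.MomentumOptimizer(
    learning_rate=0.01, momentum=0.9, use_nesterov=True).minimize(loss)
```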
answered Aug 18 at 13:44 by Sycorax · edited Aug 18 at 21:07
Thank you very much for the suggestion. I will have a look into it and then report how good a result it gives me.
– User12547645
Aug 18 at 15:12
Thank you again! I am now at more than 98% accuracy for training and testing, with training still going.
– User12547645
Aug 18 at 20:58
That sounds like a pretty nice improvement, though. Almost 10%! -- in your post, you said you were getting 90% accuracy.
– Sycorax
Aug 18 at 21:08
Yes, it is very impressive indeed! And it seems as though I still do not need any regularization, since training and testing accuracy are fairly close together.
– User12547645
Aug 19 at 9:52