Improving spam classification with TensorFlow logistic regression

I would like to classify a mail (spam = 1 / ham = 0) using logistic regression. My implementation is similar to this implementation and uses TensorFlow.

A mail is represented as a bag-of-words vector, in which each entry counts how often a term appeared in the mail. The idea is to multiply that vector with a parameter vector $\theta$ and threshold the sigmoid output (equivalently, take the sign of $x_i^T\theta$) to turn the regression into a classification: $$y_\text{predicted} = \sigma(x_i^T\theta),$$ with $\sigma(x) = \frac{1}{1 + e^{-x}}$. To calculate the loss, I am using the L2 loss (squared loss). Since I have a lot of training data, regularization does not seem necessary (training and testing accuracy are always very close). Still, I only get a maximum accuracy of about 90% (both training and testing). How can I improve this?
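
To make the setup concrete, here is a minimal sketch of the model described above; it assumes the TensorFlow 1.x graph API, and the vocabulary size and variable names are placeholders rather than my exact code:

    import tensorflow as tf  # TensorFlow 1.x graph API assumed

    num_features = 10000  # placeholder vocabulary size

    # Bag-of-words input and binary label (spam = 1, ham = 0)
    x = tf.placeholder(tf.float32, shape=[None, num_features])
    y = tf.placeholder(tf.float32, shape=[None, 1])

    # Model parameters
    theta = tf.Variable(tf.zeros([num_features, 1]))
    bias = tf.Variable(tf.zeros([1]))

    # y_predicted = sigma(x^T theta)
    y_predicted = tf.sigmoid(tf.matmul(x, theta) + bias)

    # Squared (L2) loss, as described above
    loss = tf.reduce_mean(tf.square(y_predicted - y))
    train_step = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

    # Classification: threshold the sigmoid output at 0.5
    prediction = tf.cast(y_predicted > 0.5, tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y), tf.float32))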



I already tried the following:



  • Use regularization, L1 and L2 with different strengths (it seems unnecessary, since training and testing accuracy stay close).

  • Use different learning rates.

  • Use (full-batch) gradient descent, stochastic gradient descent, and mini-batch gradient descent, hoping to avoid local minima in the loss function by introducing more variance through the stochastic/mini-batch updates.

  • Create more training data to balance the classes (they were imbalanced 80/20 spam/ham) using SMOTE (see the sketch after this list).
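
The SMOTE step looks roughly like the following sketch (using the imbalanced-learn package; the toy arrays below only stand in for my real bag-of-words matrix and labels, and in older imbalanced-learn versions the resampling method is called fit_sample rather than fit_resample):

    import numpy as np
    from imblearn.over_sampling import SMOTE  # requires the imbalanced-learn package

    # Toy stand-in for the bag-of-words matrix and labels (80/20 imbalance)
    X_train = np.random.randint(0, 3, size=(1000, 50)).astype(float)
    y_train = np.array([1] * 800 + [0] * 200)

    # Oversample the minority class so both classes are equally represented
    X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_train, y_train)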


Things that I could still try:



  • Use a different loss function

Any other suggestions?







asked Aug 18 at 13:14 by User12547645; edited Aug 18 at 13:50 by Sycorax




















1 Answer

















The L2 loss for logistic regression is not convex, but the cross-entropy loss is. I'd recommend making the switch, because convexity is a really nice property to have during optimization: you don't have to worry about local minima, since any local minimum of a convex function is also a global minimum.
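
As a rough sketch of what that switch looks like in TensorFlow (the placeholder and variable names below are illustrative stand-ins for whatever your model already defines, and this assumes the 1.x graph API):

    import tensorflow as tf  # TensorFlow 1.x graph API assumed

    num_features = 10000  # illustrative vocabulary size
    x = tf.placeholder(tf.float32, shape=[None, num_features])
    y = tf.placeholder(tf.float32, shape=[None, 1])   # labels: spam = 1, ham = 0
    theta = tf.Variable(tf.zeros([num_features, 1]))
    bias = tf.Variable(tf.zeros([1]))

    # Keep the raw logits; the loss applies the sigmoid internally, which is
    # also more numerically stable than taking a sigmoid and then a log.
    logits = tf.matmul(x, theta) + bias

    # Cross-entropy (log) loss instead of squared error
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))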



A nice discussion of the mathematics, comparing the convexity of the log loss to the non-convexity of the L2 loss, can be found here: What is happening here, when I use squared loss in logistic regression setting?
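
In the notation of the question, with $p_i = \sigma(x_i^T\theta)$ and labels $y_i \in \{0, 1\}$, the two losses being compared are

$$L_\text{squared}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - p_i\bigr)^2, \qquad L_\text{CE}(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\Bigl[y_i \log p_i + (1 - y_i)\log(1 - p_i)\Bigr],$$

and $L_\text{CE}$ is convex in $\theta$, while $L_\text{squared}$ composed with the sigmoid is not in general.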



The textbook way to estimate logistic regression coefficients is Newton-Raphson updating, but I don't believe it is implemented in TensorFlow, since second-order methods are not generally used for neural networks. However, you might improve the rate of convergence with SGD + classical momentum or SGD + Nesterov momentum. Nesterov momentum is especially appealing here: since your problem is convex, it is more or less locally quadratic, and that is the use case where Nesterov momentum really shines.
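
A sketch of that optimizer choice in TensorFlow 1.x (the stand-in loss and the hyperparameter values below are only illustrative, not tuned):

    import tensorflow as tf  # TensorFlow 1.x graph API assumed

    # Stand-in loss so the snippet runs on its own; in practice this would be
    # the cross-entropy loss of your logistic regression model.
    theta = tf.Variable(tf.zeros([3, 1]))
    loss = tf.reduce_mean(tf.square(theta - 1.0))

    # SGD with momentum; use_nesterov switches between classical (heavy-ball)
    # momentum and Nesterov momentum.
    optimizer = tf.train.MomentumOptimizer(
        learning_rate=0.1,
        momentum=0.9,
        use_nesterov=True)
    train_step = optimizer.minimize(loss)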






answered Aug 18 at 13:44 by Sycorax; edited Aug 18 at 21:07
  • Thank you very much for the suggestion. I will have a look into it and then report how good a result it gave me.
    – User12547645, Aug 18 at 15:12

  • Thank you again! I am now at more than 98% accuracy for training and testing, with training still going.
    – User12547645, Aug 18 at 20:58

  • That sounds like a pretty nice improvement, though. Almost 10%! In your post, you said you were getting 90% accuracy.
    – Sycorax, Aug 18 at 21:08

  • Yes, it is very impressive indeed! And it seems as though I still do not need any regularization, since training and testing accuracy are fairly close together.
    – User12547645, Aug 19 at 9:52









