Adding more layers decreases accuracy
I have an ANN trained on the MNIST dataset. The input layer has 784 neurons and the single hidden layer has 128 neurons; this gave me an accuracy of 94%. However, when I added one more hidden layer with 64 neurons, the accuracy dropped significantly, to 35%. What could be the reason behind this?
Edit: the activation function is sigmoid, trained for 521 epochs.
machine-learning neural-network deep-learning mlp
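For reference, a minimal sketch of the setup described above, assuming a Keras implementation (the question does not say which framework was used). The layer sizes and sigmoid activations come from the question; the optimizer, loss, batch size, and epoch count here are placeholders:

```python
# Hypothetical reconstruction of the two networks described in the question
# (assumed Keras setup; the original post does not name a framework).
import tensorflow as tf
from tensorflow.keras import layers, models

# MNIST: 28x28 grayscale digits, flattened to 784-dimensional vectors.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

def build_mlp(hidden_sizes):
    """Sigmoid MLP: 784 inputs -> hidden layers -> 10-way softmax output."""
    model = models.Sequential([layers.Input(shape=(784,))])
    for n in hidden_sizes:
        model.add(layers.Dense(n, activation="sigmoid"))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer="sgd",  # placeholder optimizer, not from the question
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

one_hidden = build_mlp([128])      # reportedly reaches ~94% accuracy
two_hidden = build_mlp([128, 64])  # reportedly drops to ~35% accuracy
two_hidden.fit(x_train, y_train, epochs=20, batch_size=128,
               validation_data=(x_test, y_test))
```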
What is the activation function you are using?
– DuttaA
2 hours ago
@DuttaA sigmoid
– Pink
55 mins ago
asked 5 hours ago, edited 1 hour ago – Pink
2 Answers
The reason is that by adding more layers you've added more trainable parameters to your model, so you have to train it longer. Also consider that MNIST is a very easy-to-learn dataset: you can use two hidden layers with far fewer neurons in each. Try $10$ neurons per layer to ease the learning process; you can reach $100\%$ accuracy.
– Media, answered 4 hours ago, edited 3 hours ago
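A minimal sketch of this suggestion, under the same assumed Keras setup as the sketch beneath the question; only the $10$-neuron hidden layers and the longer training come from the answer, the rest is a placeholder:

```python
# Sketch of the answer's suggestion: a much smaller two-hidden-layer sigmoid net.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

small = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(10, activation="sigmoid"),  # 10 neurons per hidden layer,
    layers.Dense(10, activation="sigmoid"),  # as suggested in the answer
    layers.Dense(10, activation="softmax"),
])
small.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# The answer's other point: after changing the architecture, train for more epochs.
small.fit(x_train, y_train, epochs=100, batch_size=128,
          validation_data=(x_test, y_test))
```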
It's also a very small dataset.
– Matthieu Brucher
35 mins ago
Yes! $50$ thousand samples is very small for deep-learning purposes.
– Media
28 mins ago
The problem in your case (as I suspected earlier) is the sigmoid activation function, which suffers from several issues. Your performance drop is most likely due to two of them:
- Vanishing gradients
- A high learning rate
The vanishing gradient problem traps your neural net in a non-optimal solution, and a high learning rate keeps it trapped there: after a few oscillations, a high learning rate pushes the network into saturation.
Solution:
- The best fix is to use the ReLU activation function, with perhaps only the last layer kept as sigmoid.
- Use an adaptive optimizer such as AdaGrad, Adam or RMSProp.
- Alternatively, decrease the learning rate to $10^{-6}$ to $10^{-7}$, but compensate by increasing the number of epochs to $10^6$ to $10^7$.
– DuttaA, answered 12 mins ago
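A sketch of these fixes, again under an assumed Keras setup rather than anything taken from the answer itself: ReLU hidden layers plus the Adam adaptive optimizer. Note that the output layer here uses softmax, the usual choice for 10-class MNIST, instead of the sigmoid mentioned above:

```python
# Sketch of the suggested fixes (assumed Keras setup): ReLU hidden layers to
# avoid saturating gradients, plus an adaptive optimizer (Adam).
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),    # ReLU: gradient is 1 for positive inputs
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # multi-class output layer
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=20, batch_size=128,
          validation_data=(x_test, y_test))
```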