Adding more layers decreases accuracy
























I have an ANN trained on the MNIST dataset. The input layer has 784 neurons and the hidden layer has 128 neurons. This gave me an accuracy of 94%. However, when I added one more hidden layer with 64 neurons, the accuracy dropped significantly to 35%. What could be the reason behind this?



Edit: activation function: sigmoid; trained for 521 epochs.
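
The question does not include any code, so here is a minimal sketch of the setup described above. Keras, the softmax output layer, plain SGD, the learning rate and the epoch count in the sketch are assumptions; only the 784/128/64 layer sizes and the sigmoid hidden activations come from the post.

```python
# Minimal sketch of the described setup -- NOT the asker's actual code.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

def build_model(hidden_sizes):
    """784-input MLP with sigmoid hidden layers, as in the question."""
    layers = [tf.keras.Input(shape=(784,))]
    layers += [tf.keras.layers.Dense(n, activation="sigmoid") for n in hidden_sizes]
    layers += [tf.keras.layers.Dense(10, activation="softmax")]  # assumed output layer
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),  # assumed
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One hidden layer (reported ~94%) vs. an extra 64-neuron layer (reported ~35%).
for sizes in ([128], [128, 64]):
    model = build_model(sizes)
    model.fit(x_train, y_train, epochs=20, batch_size=128, verbose=0)
    print(sizes, model.evaluate(x_test, y_test, verbose=0))
```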




















































machine-learning neural-network deep-learning mlp














asked 5 hours ago by Pink (edited 1 hour ago)















  • What is the activation function you are using?
    – DuttaA
    2 hours ago










  • @DuttaA sigmoid
    – Pink
    55 mins ago


























2 Answers




































The reason is that by adding more layers you've added more trainable parameters to your model, so you have to train it longer. Also keep in mind that MNIST is a very easy-to-learn dataset: two hidden layers with far fewer neurons in each are enough. Try $10$ neurons per layer to make the learning process easier; you can still reach close to $100\%$ accuracy.






answered 4 hours ago by Media (edited 3 hours ago)
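
A minimal sketch of the smaller architecture suggested in this answer. Keras, the softmax output, the SGD settings and the epoch count are assumptions; only the two ~10-neuron sigmoid hidden layers come from the answer.

```python
# Sketch of the suggestion: two small sigmoid hidden layers, trained for longer.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

small_model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="softmax"),   # assumed output layer
])
small_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),  # assumed
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])

# Far fewer parameters per layer, so each epoch does more useful work;
# sigmoid activations still tend to need many epochs.
small_model.fit(x_train, y_train, epochs=50, batch_size=128, verbose=2)
```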






















  • It's also a very small dataset.
    – Matthieu Brucher
    35 mins ago










  • Yes! $50$ thousand is very small for deep-learning purposes.
    – Media
    28 mins ago






























The problem in your case (as I suspected earlier) is the sigmoid activation function. It suffers from many problems, and your performance decrease is likely due to two of them:



  • Vanishing gradients

  • A high learning rate

The vanishing gradient problem traps your neural net in a non-optimal solution, and the high learning rate keeps it trapped there: after a few oscillations it pushes the network into saturation.



Solution (a sketch follows after this answer):



  • The best fix is to use the ReLU activation function, with perhaps the last layer as sigmoid.

  • Use an adaptive optimizer such as AdaGrad, Adam or RMSProp.

  • Alternatively, decrease the learning rate to $10^{-6}$ to $10^{-7}$, but to compensate, increase the number of epochs to $10^6$ to $10^7$.





answered 12 mins ago by DuttaA
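
A minimal sketch of the first two suggestions (ReLU hidden layers plus an adaptive optimizer), assuming Keras. The softmax output used here in place of the sigmoid last layer mentioned above, and the 1e-3 learning rate, are substitutions of mine, not part of the answer.

```python
# Sketch of the suggested fix: ReLU hidden layers + an adaptive optimizer (Adam).
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),  # ReLU avoids saturating gradients
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed output layer
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=128,
          validation_data=(x_test, y_test))
```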

















































