1x1 Convolution. How does the math work?

So I stumbled upon Andrew Ng's course on 1x1 convolutions. There he explains that you can use a 1x1x192 convolution to shrink a 28x28x192 volume.

But when I do

input_ = torch.randn([28, 28, 192])
filter = torch.zeros([1, 1, 192])

out = torch.mul(input_, filter)

I obviously get a 28x28x192 matrix. So how should I be able to shrink it? Just add up the result of every 1x1x192 * 1x1x192 kernel product? So I'd get a 28x28x1 matrix?

convnet
asked 3 hours ago

Mihkel L.
2 Answers

Let's go back to normal convolution: say you have a 28x28x3 image (3 = R, G, B).

I use Keras rather than Torch, but the same principle applies.

When you apply a 2D convolution and pass the filter size, for example 3x3, the framework expands your filter from 3x3 to 3x3x3! The last 3 comes from the depth of the image.

The same happens deeper in the network: after a first convolutional layer with 100 filters you obtain a volume of size 28x28x100, and at the second convolutional layer you again specify only the first two dimensions of the filter, say 4x4. The framework actually applies a filter of shape 4x4x100!

So, to answer your question: if you apply a 1x1 convolution to 28x28x100 with k filters, you obtain an activation map (the result) of shape 28x28xk.

And that's the shrink suggested by Ng.

Finally, to fully answer your question, the math is simple: just apply the usual convolution with 3D filters, i.e. the sum of the products of the overlapping elements of the filter and the image.
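A minimal PyTorch sketch of this idea (my own, not from the answer; `nn.Conv2d` expects a leading batch dimension and a channels-first layout), using the 28x28x100 example above with k = 32 filters:

```python
import torch
import torch.nn as nn

# Channels-first layout: (batch, channels, height, width)
x = torch.randn(1, 100, 28, 28)

# We only specify kernel_size=1; the framework expands each of the
# 32 filters to shape 1x1x100 to match the input depth.
conv = nn.Conv2d(in_channels=100, out_channels=32, kernel_size=1)

out = conv(x)
print(out.shape)  # torch.Size([1, 32, 28, 28])
```

The spatial size is untouched; only the channel dimension shrinks from 100 to 32.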







answered 1 hour ago

Francesco Pegoraro
In your example you use one filter; the video suggests using 32 filters instead.

Let's take a deeper look: each 1x1 convolutional filter acts only locally, taking into account one pixel with its 192 channels. Its output is one value per pixel, effectively reducing the dimension to 28x28x1. Now you don't want just this one channel but 32 of them, and the solution is simply to take 32 filters, where each channel of the resulting 28x28x32 matrix corresponds to one 1x1 conv. filter.

Mathematically: let $M\in\mathbb{R}^{28\times28\times192}$ be the pixel matrix. We can define a 1x1 convolutional filter, call it $\phi:\mathbb{R}^{28\times28\times192}\to\mathbb{R}^{28\times28\times1}$, with $\phi:x_{i,j}\mapsto g(x_{i,j})$ for some $g:\mathbb{R}^{192}\to\mathbb{R}$. I do not think $g$ needs to be linear, but I guess it depends on how you want to see the convolution; I would not assume linearity. Consequently we can define $\psi:\mathbb{R}^{28\times28\times192}\to\mathbb{R}^{28\times28\times32}$ by $\psi:x\mapsto\left(\phi_1(x),\dots,\phi_{32}(x)\right)^T$. If we now apply $\psi$ to $M$, we achieve the desired dimensionality reduction to $28\times28\times32$.
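As a sketch (my own, not from the answer): in the standard framework implementation each $g$ is in fact a learned linear map, so each $\phi_k$ is a per-pixel dot product of the 192 channel values with the k-th weight vector. This can be checked against `nn.Conv2d` directly:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)                 # M, channels-first
conv = nn.Conv2d(192, 32, kernel_size=1, bias=False)
out = conv(x)                                    # psi(M): (1, 32, 28, 28)

# The same result by hand: conv.weight has shape (32, 192, 1, 1),
# i.e. one 192-dimensional weight vector (one g) per output channel.
w = conv.weight.view(32, 192)
manual = torch.einsum('bchw,kc->bkhw', x, w)     # dot product per pixel

print(torch.allclose(out, manual, atol=1e-5))
```

So the 1x1 convolution is nothing more than 32 weighted sums over the channel axis, applied independently at every pixel.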







edited 1 hour ago
answered 2 hours ago

André




                    4209











• Can you show me the math on this? Obviously I don't understand the wording of this problem. =D
  – Mihkel L.
  2 hours ago










• I edited my answer :)
  – André
  1 hour ago










• Okay, I'll show you my thinking. When I multiply 1x1x192 with 1x1x192 I get 1x1x192. I'm not seeing how I can multiply a 1x1x192 matrix with 1x1 and not get the third dimension to be 192. The desired math you showed is what I want, but I don't get what's inside of g.
  – Mihkel L.
  37 mins ago











• Could g just be adding the 192 results together, or should there be more smarts behind it?
  – Mihkel L.
  30 mins ago
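The multiply-then-sum from the comments can be sketched for a single pixel (toy numbers of my own). The only extra "smarts" in the standard case is that the 192 products use learned weights before being added, plus an optional bias:

```python
import torch

# One pixel with 192 channels, and one 1x1 filter's 192 learned weights.
pixel = torch.randn(192)
weights = torch.randn(192)
bias = torch.tensor(0.5)

# g(pixel): element-wise multiply, then sum, then add the bias.
g_value = (weights * pixel).sum() + bias

print(g_value.dim())  # 0 -- a single scalar per pixel, hence 28x28x1 per filter
```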















