1x1 Convolution. How does the math work?
So I stumbled upon Andrew Ng's course segment on 1x1 convolutions. There he explains that you can use a 1x1x192 convolution to shrink a 28x28x192 volume. But when I do

input_ = torch.randn([28, 28, 192])
filter = torch.zeros([1, 1, 192])
out = torch.mul(input_, filter)

I obviously get a 28x28x192 matrix. So how am I supposed to shrink it? Do I just add up the result of every 1x1x192 * 1x1x192 kernel multiplication, so that I'd get a 28x28x1 matrix?
convnet
asked 3 hours ago
Mihkel L.
2 Answers
Let's go back to normal convolution: say you have a 28x28x3 image (3 = R, G, B).
I don't use torch, I use Keras, but I think the principle applies either way.
When you apply a 2D convolution and pass only the spatial size of the filter, for example 3x3, the framework expands your filter from 3x3 to 3x3x3, where the last 3 comes from the depth of the image.
The same happens later: after a first convolution layer with 100 filters you obtain an image of size 28x28x100, and at the second convolution layer you again specify only the first two dimensions of the filter, say 4x4. The framework actually applies a filter of dimension 4x4x100.
So, to answer your question: if you apply a 1x1 convolution to 28x28x100 and pass a number of filters k, you obtain an activation map (result) of dimension 28x28xk.
And that's the shrinking Ng suggests.
To fully answer your question, the math is simple: just apply the theory of convolution with 3D filters, i.e. the sum of the products of the overlapping elements of filter and image.
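Since the question uses torch, here is a minimal PyTorch sketch of the same idea. Note that PyTorch expects channels-first (N, C, H, W) tensors rather than the HxWxC layout used in the question:

```python
import torch
import torch.nn as nn

# One image, 192 channels, 28x28 spatial size (channels-first layout).
x = torch.randn(1, 192, 28, 28)

# A 1x1 convolution with k = 32 filters; each filter actually has shape
# 1x1x192, because the framework extends it to the full depth of the input.
conv1x1 = nn.Conv2d(in_channels=192, out_channels=32, kernel_size=1)

out = conv1x1(x)
print(out.shape)  # torch.Size([1, 32, 28, 28]), i.e. 28x28x32 in HxWxC terms
```

This is exactly the shrink from 192 channels down to 32 while leaving the 28x28 spatial dimensions untouched.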
answered 1 hour ago by Francesco Pegoraro
In your example you use one filter; the video suggests using 32 filters instead.
Let's take a deeper look: each 1x1 convolutional filter acts only locally, taking into account one pixel with its 192 channels. Its output is one value per pixel, thus effectively reducing the dimension to 28x28x1. Now you don't want just this one channel but 32, so the solution is simply to take 32 filters, where each channel dimension in the resulting 28x28x32 matrix corresponds to one 1x1 conv filter.
Mathematically: let $M \in \mathbb{R}^{28\times 28\times 192}$ be the pixel matrix. We can define a 1x1 convolutional filter, call it $\phi: \mathbb{R}^{28\times 28\times 192} \to \mathbb{R}^{28\times 28\times 1}$, with $\phi: x_{i,j} \mapsto g(x_{i,j})$ for some $g: \mathbb{R}^{192} \to \mathbb{R}$. I do not think $g$ needs to be linear, but I guess it depends on how you want to see the convolution; I would not assume linearity. Consequently we can now define $\psi: \mathbb{R}^{28\times 28\times 192} \to \mathbb{R}^{28\times 28\times 32}$ by $\psi: x \mapsto \left(\phi_1(x), \dots, \phi_{32}(x)\right)^T$. If we now apply $\psi$ to $M$, we achieve the desired dimensionality reduction to $28\times 28\times 32$.
edited 1 hour ago, answered 2 hours ago by André
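To make this construction concrete: in a standard bias-free 1x1 convolution each $\phi_k$ is in fact a linear map, a dot product of the 192 channel values with that filter's weights. A small PyTorch check of that reading (assuming linearity, which the answer deliberately leaves open):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)  # M, in channels-first layout
conv = nn.Conv2d(192, 32, kernel_size=1, bias=False)

# The conv weight has shape (32, 192, 1, 1); viewed as a 32x192 matrix W,
# psi(x)[k, i, j] = sum_c W[k, c] * x[c, i, j] -- g applied pixel by pixel.
W = conv.weight.view(32, 192)
manual = torch.einsum('kc,bchw->bkhw', W, x)

# The hand-written per-pixel linear map matches the 1x1 convolution exactly.
assert torch.allclose(conv(x), manual, atol=1e-5)
```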
Can you show me the math on this? Because obviously I don't understand the wording of this problem. =D
– Mihkel L.
2 hours ago
I edited my answer :)
– André
1 hour ago
Okay, I'll show you my thinking. When I multiply 1x1x192 with 1x1x192 I get 1x1x192. I'm not seeing how I can multiply a 1x1x192 matrix with a 1x1 and not get 192 as the third dimension. The desired math you showed is what I want, but I don't get what's inside of g.
– Mihkel L.
37 mins ago
Could g just be adding the 192 results together, or should there be more smarts behind it?
– Mihkel L.
30 mins ago
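As a sanity check on that last comment: if every weight of a single 1x1 filter is set to 1, then g is exactly "adding the 192 results together". In general the network learns a weighted sum instead (typically followed by a bias and a nonlinearity). A quick sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)
conv = nn.Conv2d(192, 1, kernel_size=1, bias=False)

# All-ones weights: each output pixel is the plain sum of its 192 channel values.
with torch.no_grad():
    conv.weight.fill_(1.0)

assert torch.allclose(conv(x), x.sum(dim=1, keepdim=True), atol=1e-4)
```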