Is a convolutional neural network (CNN) a special case of a multilayer perceptron (MLP)? And why not use an MLP for everything?

If a convolution can be expressed as a matrix multiplication (example), can we say that a convolutional neural network (CNN) is a special case of a multilayer perceptron (MLP)?



If so, why don't people use a big enough MLP for everything and let the computer learn the convolution by itself?







asked Aug 26 at 6:37 by hxd1011 (edited Aug 26 at 6:45)
1 Answer
A convolution can be expressed as a matrix multiplication, but the same matrix is multiplied with a patch around every position in the image separately. You go to position (1,1), extract a patch, and multiply it by the weight matrix; then you do the same thing at position (1,2), and so forth. Because the weights are shared across positions, there are fewer degrees of freedom than when applying an MLP directly to the whole image. Many people regard an MLP as a special case of a convolution whose spatial dimensions are 1x1.
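To make this concrete, here is a minimal NumPy sketch (my own illustration, not part of the original answer) of a single-channel "valid" convolution written as one matrix multiplication over extracted patches; the function name `conv2d_as_matmul` and the 5x5/3x3 sizes are arbitrary choices:

```python
import numpy as np

def conv2d_as_matmul(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most deep learning
    libraries) written as a single matrix multiplication over image patches."""
    H, W = image.shape
    kH, kW = kernel.shape
    out_h, out_w = H - kH + 1, W - kW + 1

    # im2col: extract every kH x kW patch and flatten it into one row.
    patches = np.empty((out_h * out_w, kH * kW))
    for i in range(out_h):
        for j in range(out_w):
            patches[i * out_w + j] = image[i:i + kH, j:j + kW].ravel()

    # The *same* flattened kernel multiplies every patch: this is the weight
    # sharing that distinguishes a convolution from a fully connected layer.
    return (patches @ kernel.ravel()).reshape(out_h, out_w)

image = np.random.randn(5, 5)   # toy single-channel "image"
kernel = np.random.randn(3, 3)  # toy 3x3 filter
print(conv2d_as_matmul(image, kernel).shape)  # -> (3, 3)
```

The key point is that one small, shared kernel is reused at every position, rather than a separate weight for every input-output pair.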



Edit Start



Regarding the view of the MLP as a special case of the CNN: some commenters do not share this opinion. Yann LeCun, who can be counted among the inventors of CNNs, made a similar point on Facebook: https://www.facebook.com/yann.lecun/posts/10152820758292143



He said that in CNNs there is no such thing as a "fully connected" layer; there are only layers with 1x1 spatial extent and kernels with 1x1 spatial extent. If one can "convert" FC layers, the individual layers of an MLP, into convolutional layers, then one can obviously also convert an entire MLP into a CNN by interpreting the input as a vector that has only a channel dimension.



An example: if I have an image of size $H\times W\times C$ ($C$ channels) and I apply a single layer of an MLP to it, then I transform the input into a vector $x$ of size $V=HWC$. I then apply a matrix $W\in \mathbb{R}^{U\times V}$ to it, thereby creating $U$ hidden activations. I could interpret the input vector $x$ as an image with only one pixel but $V$ "channels", $x\in\mathbb{R}^{1\times 1\times V}$, and the weight matrix as a kernel with only one pixel of area but $U$ filters taking in $V$ channels each, $W\in\mathbb{R}^{U\times 1\times 1\times V}$. I can then call some Conv2D function that carries out the operation and computes exactly the same thing as the MLP.
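As a hedged sketch of this equivalence (my own illustration in plain NumPy rather than an actual Conv2D call, with small made-up values for $U$ and $V$):

```python
import numpy as np

U, V = 4, 12                      # hypothetical sizes: U hidden units, V = H*W*C
x = np.random.randn(V)            # flattened MLP input
W = np.random.randn(U, V)         # MLP weight matrix, W in R^{U x V} (no bias)

mlp_out = W @ x                   # ordinary fully connected layer

# Reinterpret x as a 1x1 "image" with V channels, and W as U kernels that each
# cover one pixel and take in V channels.
x_img = x.reshape(1, 1, V)        # shape (H=1, W=1, C=V)
kernels = W.reshape(U, 1, 1, V)   # shape (U, 1, 1, V)

# Applying each kernel at the single spatial position is the same dot product
# that the fully connected layer computes.
conv_out = np.array([(x_img * kernels[u]).sum() for u in range(U)])

print(np.allclose(mlp_out, conv_out))  # -> True
```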



Edit End




"If so, why don't people use a big enough MLP for everything and let the computer learn the convolution by itself?"




That is a nice idea (and probably worth researching), but it is simply not practical:

1. The MLP has too many degrees of freedom, so it is likely to overfit (see the rough parameter-count comparison at the end of this answer).

2. In addition to learning the weights, you would have to learn their dependency structure, i.e. which weights should be tied together.

Since most deep learning research is closely tied to NLP, speech processing, and computer vision, people are eager to solve their problems and perhaps less eager to investigate whether a function space more general than that of a CNN would constrain itself to that particular function space on its own. In my opinion it is certainly interesting to think about, though.
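To make point 1 concrete, here is the rough parameter-count comparison mentioned above; the image size, hidden-layer width, and filter count are hypothetical values chosen only for illustration:

```python
# Hypothetical sizes chosen only for illustration.
H, W, C = 224, 224, 3              # input image: 224 x 224 RGB
hidden = 4096                      # width of one fully connected hidden layer
k, filters = 3, 64                 # one 3x3 convolutional layer with 64 filters

fc_params = H * W * C * hidden     # ~617 million weights for a single FC layer
conv_params = k * k * C * filters  # 1,728 weights for the conv layer

print(f"fully connected layer: {fc_params:,} weights")
print(f"convolutional layer:   {conv_params:,} weights")
```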






answered Aug 26 at 8:19 by jmaxx (edited Aug 26 at 15:54)
• I am not sure if that is the case ("people regard an MLP as a special case of a convolution"). Another equally valid way of looking at it is that a CNN is a special case of an MLP where only local connections have a weight different from zero, and the weights of those local connections are shared. That is certainly how I was introduced to the concept of CNNs after learning about fully connected networks. – Neil Slater, Aug 26 at 11:50

• @NeilSlater I've edited the post and tried to clarify my view on that matter; I hope it's now easier to understand. If you could take the time to share your point of view, that would be very much appreciated, as I think the question is quite interesting. Thank you! – jmaxx, Aug 26 at 15:56

• Yann LeCun's comment also leads to this Q&A on Stack Exchange: datascience.stackexchange.com/questions/12830/… – Neil Slater, Aug 26 at 16:05

• @NeilSlater Thanks for the link; I hadn't seen that there is an answer on another Stack Exchange site. As far as I can tell, that answer gives a similar perspective to mine, in that it shows how any MLP can be computed as a $1\times 1$ convolution, i.e. the two are equivalent. – jmaxx, Aug 26 at 16:09









