L1 and L2 penalty vs L1 and L2 norms
I understand the uses of the L1 and L2 norms, but I am unsure how the L1 and L2 penalties are used when building models.
From what I understand:
L1: Laplace prior, L2: Gaussian prior
are two of the penalty terms. I have tried to read about these, but there is surprisingly little discussion of them; searching always leads to Lasso and Ridge, which I understand.
Can someone help me bridge the gap: what do these penalties refer to, and if they are related to the L1 and L2 norms in the end, how?
Thanks for your help.
regularization
Could you be even more explicit regarding what you do not understand? When you refer to these, do you mean penalty terms or priors, or what?
– Richard Hardy
4 hours ago
Hi @RichardHardy, I meant that I want to know more about the L1 and L2 penalties and how they differ from the L1 and L2 norms.
– power.puffed
2 hours ago
1 Answer
In mathematics, a norm is a function that measures the "length" or "size" of a vector. Among the popular norms are the $L_1$, $L_2$ and $L_p$ norms, defined as
$$\begin{align}
\|\boldsymbol{x}\|_1 &= \sum_i | x_i | \\
\|\boldsymbol{x}\|_2 &= \sqrt{\sum_i x_i^2} \\
\|\boldsymbol{x}\|_p &= \left( \sum_i | x_i |^p \right)^{1/p}
\end{align}$$
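To make the definitions concrete, here is a small self-contained sketch of my own (not part of the original answer) that evaluates the three norms for a toy vector with NumPy and checks the results against `np.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# L1 norm: sum of absolute values
l1 = np.sum(np.abs(x))

# L2 norm: square root of the sum of squares
l2 = np.sqrt(np.sum(x ** 2))

# General Lp norm for p >= 1
def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(l1, np.linalg.norm(x, ord=1))             # both 8.0
print(l2, np.linalg.norm(x, ord=2))             # both sqrt(26) ~= 5.10
print(lp_norm(x, 3), np.linalg.norm(x, ord=3))  # both the L3 norm
```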
In machine learning, we often want to predict some $y$ using a function $f$ of $\mathbf{x}$, parametrized by a vector of parameters $\boldsymbol\beta$. To achieve this, we minimize a loss function $\mathcal{L}$. We sometimes also want to penalize the parameters by forcing them to take smaller values; the rationale for this is described, for example, here, here, or here. One way of achieving this is to add a regularization term, e.g. the $L_2$ norm of the vector of weights, and minimize the whole thing:
$$
\underset{\boldsymbol\beta}{\operatorname{arg\,min}} \; \mathcal{L}\big(y, \, f(\mathbf{x}; \boldsymbol\beta)\big) + \lambda \, \|\boldsymbol\beta\|_2
$$
where $\lambda \ge 0$ is a hyperparameter. So basically, the norm is used here to measure the "size" of the model weights. By adding the size of the weights to the loss function, we force the minimization algorithm to seek a solution that, along with minimizing the loss, keeps the "size" of the weights small. The $\lambda$ hyperparameter controls how strong this effect is.
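As an illustration (a minimal sketch of my own, not taken from the original answer), here is what adding an $L_1$ or $L_2$ penalty to an ordinary squared-error loss looks like for linear regression, optimized by plain (sub)gradient descent. The `lam` parameter plays the role of $\lambda$; note that, as in ridge regression, the $L_2$ penalty below uses the squared norm:

```python
import numpy as np

def fit_penalized(X, y, penalty=None, lam=0.1, lr=0.01, n_iter=5000):
    """Minimize ||y - X @ beta||^2 / n + lam * penalty(beta) by (sub)gradient descent."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ beta - y)   # gradient of the squared-error loss
        if penalty == "l2":
            grad += lam * 2.0 * beta            # gradient of lam * ||beta||_2^2
        elif penalty == "l1":
            grad += lam * np.sign(beta)         # subgradient of lam * ||beta||_1
        beta -= lr * grad
    return beta

# Toy data: only the first two features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=200)

print("no penalty:      ", fit_penalized(X, y, lam=0.0))
print("L2 (ridge-like): ", fit_penalized(X, y, penalty="l2", lam=0.5))
print("L1 (lasso-like): ", fit_penalized(X, y, penalty="l1", lam=0.5))
```

Increasing `lam` shrinks the fitted weights toward zero; with the $L_1$ penalty the irrelevant coefficients tend to be pushed to (near) zero, while the $L_2$ penalty shrinks all of them smoothly.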
Indeed, using the $L_2$ penalty may be seen as equivalent to using Gaussian priors for the parameters, while using the $L_1$ norm is equivalent to using Laplace priors (but in practice you need much stronger priors; see e.g. the paper Shrinkage priors for Bayesian penalized regression by van Erp et al.).
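To see where this correspondence comes from (a standard derivation, sketched here for completeness rather than quoted from the original answer): when the loss is proportional to the negative log-likelihood, maximizing the posterior is the same as minimizing the loss plus the negative log-prior,

$$
\hat{\boldsymbol\beta}_{\text{MAP}}
= \underset{\boldsymbol\beta}{\operatorname{arg\,max}} \; \log p(y \mid \mathbf{x}, \boldsymbol\beta) + \log p(\boldsymbol\beta)
= \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \; \mathcal{L}\big(y, f(\mathbf{x}; \boldsymbol\beta)\big) - \log p(\boldsymbol\beta),
$$

and for independent Laplace$(0, b)$ priors $-\log p(\boldsymbol\beta) = \tfrac{1}{b}\|\boldsymbol\beta\|_1 + \text{const}$, while for independent $\mathcal{N}(0, \sigma^2)$ priors $-\log p(\boldsymbol\beta) = \tfrac{1}{2\sigma^2}\|\boldsymbol\beta\|_2^2 + \text{const}$, so the prior scale plays the role of $\lambda$.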
For more details, check e.g. the Why L1 norm for sparse models, Why does the Lasso provide Variable Selection?, or When should I use lasso vs ridge? threads.
I think the OP would also benefit from a discussion of the distinction between MSE and MAE minimization.
– generic_user
1 hour ago
@generic_user This has already been described in a number of places on this site; I gave several links that discuss it.
– Tim♦
1 hour ago