What Base Should Be Used For Negative Log Likelihood?
When calculating the negative log likelihood loss, what base of log are we supposed to use?
machine-learning loss-function
edited 11 mins ago
duckmayr
asked 8 hours ago
Brandon Lavigne
3 Answers
accepted
Typically it is implemented as the natural logarithm, base e. Other bases can be used for the same effect though.
answered 5 hours ago
JahKnows
(+1) for "Typically it is implemented as the natural logarithm, base e. Other bases can be used". However "for the same effect" may be slightly misleading -- there's a reason the natural logarithm is usually used: For many distributions, it makes the math convenient. Using some other base, while convenient in some cases, would not be as convenient as often as the natural logarithm.
– duckmayr
2 hours ago
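As one illustration of the convenience this comment mentions, here is a worked example that is not part of the original thread. It assumes a Gaussian density with mean $\mu$ and standard deviation $\sigma$; these symbols are introduced here for illustration only. With the natural logarithm, the exponential in the density cancels cleanly:

$
-\ln p(x \mid \mu, \sigma) = \dfrac{(x-\mu)^2}{2\sigma^2} + \ln\left(\sigma\sqrt{2\pi}\right)
$

With any other base $b$, the same expression carries an extra constant factor:

$
-\log_b p(x \mid \mu, \sigma) = \dfrac{1}{\ln b}\left[\dfrac{(x-\mu)^2}{2\sigma^2} + \ln\left(\sigma\sqrt{2\pi}\right)\right]
$

The minimizer in $\mu$ is the same in both cases, but the factor $1/\ln b$ would have to be carried through every derivative.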
A change of base is equivalent to multiplying the function by a constant, so it does not change which parameters minimize the loss; it only rescales the loss value.
$
\log_b(x) = \dfrac{1}{\log_e(b)} \cdot \log_e(x)
$
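The identity above can be checked numerically. The following is a minimal sketch, not taken from the answer itself: the Bernoulli data (7 successes in 10 trials), the grid of candidate parameters, and the helper names `nll` and `bernoulli_nll` are all assumptions made for illustration.

```python
import math

def nll(probs, base=math.e):
    """Negative log likelihood of the observed-outcome probabilities in a given log base."""
    return -sum(math.log(p, base) for p in probs)

# Hypothetical data: 7 successes out of 10 Bernoulli trials.
successes, trials = 7, 10

def bernoulli_nll(theta, base=math.e):
    """NLL of the hypothetical data under success probability theta."""
    probs = [theta] * successes + [1.0 - theta] * (trials - successes)
    return nll(probs, base)

# Candidate parameter values 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

# The minimizing theta is found separately for each base.
best_e = min(grid, key=bernoulli_nll)
best_2 = min(grid, key=lambda t: bernoulli_nll(t, base=2))
best_10 = min(grid, key=lambda t: bernoulli_nll(t, base=10))

# The base-2 loss is the base-e loss times the constant 1 / ln(2).
ratio = bernoulli_nll(0.3, base=2) / bernoulli_nll(0.3)

print(best_e, best_2, best_10)
print(ratio, 1 / math.log(2))
```

All three bases select the same parameter on the grid, and the ratio of the two losses matches the constant in the change-of-base formula.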
answered 5 hours ago
Anshul G.
Generally, when the log likelihood is calculated, it is used as a loss function, that is, a quantity to be optimized. Changing the base multiplies the log by a constant. As long as the bases are either both greater than one or both less than one, this constant is positive (note that "negative log likelihood" can be interpreted as taking the log with a base less than one), and multiplying a function by a positive constant doesn't affect which inputs optimize that function. In other words, it doesn't matter. Changing the base is basically a change of units: log base $2$ gives units of bits, log base $256$ gives units of bytes, and log base $e$ gives units of nits (more commonly called nats). So it's like asking "Okay, we're trying to minimize the amount of wire that we're using ... but are we minimizing the amount of wire in feet, or the amount of wire in meters?"
The natural base $e$ is often used because it makes some of the math easier, but base $2$ is also used in some contexts because it allows reporting the log likelihood in bits. In cases where the absolute, rather than relative, value of the log likelihood is important, the base should be indicated either by explicitly naming it or by giving the units (e.g. bits, nits, etc.).
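The change-of-units view can be sketched in a few lines. The probabilities below are hypothetical values chosen so the arithmetic is easy to follow; they are not from the original answer.

```python
import math

# Hypothetical probabilities a model assigned to the outcomes that actually occurred.
probs = [0.5, 0.25, 0.125]

nll_nats = -sum(math.log(p) for p in probs)        # base e
nll_bits = -sum(math.log2(p) for p in probs)       # base 2
nll_bytes = -sum(math.log(p, 256) for p in probs)  # base 256

# Converting between units is multiplication by a fixed constant,
# just like converting feet to meters.
bits_from_nats = nll_nats / math.log(2)
bytes_from_bits = nll_bits / 8  # 256 = 2**8, so one byte is 8 bits

print(nll_nats, nll_bits, nll_bytes)
print(bits_from_nats, bytes_from_bits)
```

Because the three numbers describe the same quantity in different units, a reported log likelihood is only interpretable in absolute terms once its base or unit is stated.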
answered 13 mins ago
Acccumulation